Data Analytics • SQL • Python • RAG • LLM Apps • Machine Learning
I build AI and data systems that are structured, testable, and production-aware.
With 13+ years of technical systems experience, I approach AI, ML, and analytics with a strong engineering mindset — focusing on reliability, evaluation, performance, and clean data modeling.
My work spans:
- Generative AI & RAG systems
- Machine Learning pipelines
- SQL-based analytics platforms
- Data cleaning & performance modeling
I specialize in turning raw data and human intent into scalable, measurable solutions.
I treat prompt engineering as a system design discipline — not trial and error.
- Prompt engineering (Zero-Shot, Few-Shot, Chain-of-Thought, ReAct)
- Retrieval-Augmented Generation (RAG)
- LLM evaluation & benchmarking
- Guardrails & fallback logic
- Structured prompt validation pipelines
- API-based LLM integrations
A structured case study demonstrating how evaluation loops and guardrails improved LLM reliability and reduced hallucinations by 30–40%.
Tech: Python · OpenAI API · JSON · Regex · Evaluation Frameworks
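As an illustration of the guardrail-plus-fallback idea, here is a minimal sketch that validates a model reply with regex and JSON parsing and falls back to a safe default when validation keeps failing. It is not the case study's code: the schema (`answer`, `confidence`), the `call_model` callable, and the `FALLBACK` payload are all hypothetical.

```python
import json
import re
from typing import Callable

# Hypothetical schema: the reply must contain {"answer": str, "confidence": number in [0, 1]}.
FALLBACK = {"answer": "I could not produce a reliable answer.", "confidence": 0.0}


def extract_json(text: str):
    """Pull the outermost {...} span out of a model reply and parse it, or return None."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None


def is_valid(payload) -> bool:
    """Check the parsed payload against the assumed schema."""
    if not isinstance(payload, dict):
        return False
    answer = payload.get("answer")
    confidence = payload.get("confidence")
    if not isinstance(answer, str) or not answer.strip():
        return False
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return False
    return 0.0 <= confidence <= 1.0


def guarded_call(call_model: Callable[[str], str], prompt: str, max_attempts: int = 2) -> dict:
    """Retry until a reply passes validation; otherwise return the fallback payload."""
    for _ in range(max_attempts):
        payload = extract_json(call_model(prompt))
        if is_valid(payload):
            return payload
    return dict(FALLBACK)
```

In practice `call_model` would wrap an API client; keeping it as a plain callable makes the guardrail easy to unit test with canned replies.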
Built an end-to-end analytics pipeline to model user performance and engagement patterns.
- Designed staging → clean schema
- Developed leaderboard, streak, and rolling 7-day metrics
- Modeled question difficulty (avg / median / negative-rate)
- Optimized queries using indexing
Tech: PostgreSQL · Advanced SQL · Window Functions · KPI Modeling
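The project itself uses SQL window functions; the sketch below only illustrates the logic behind the rolling 7-day and streak metrics in plain Python. The function names and the input shapes (a per-day score dictionary, a set of active dates) are assumptions made for this example.

```python
from datetime import date, timedelta


def rolling_7day_totals(daily_scores: dict[date, int]) -> dict[date, int]:
    """For each recorded day, sum scores over that day and the 6 calendar days before it."""
    totals = {}
    for day in sorted(daily_scores):
        window = (day - timedelta(days=offset) for offset in range(7))
        totals[day] = sum(daily_scores.get(d, 0) for d in window)
    return totals


def longest_streak(active_days: set[date]) -> int:
    """Length of the longest run of consecutive calendar days with activity."""
    best = 0
    for day in active_days:
        if day - timedelta(days=1) in active_days:
            continue  # not the first day of a run
        length = 1
        while day + timedelta(days=length) in active_days:
            length += 1
        best = max(best, length)
    return best
```

The window here is calendar-based, so days with no activity count as zero rather than being skipped, which is usually the intended meaning of a "rolling 7-day" metric.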
Designed a complete SQL-based analytics system for transactional e-commerce data.
- Built revenue, AOV, LTV, and retention metrics
- Implemented cohort and trend analysis
- Created reusable reporting views
- Modeled business performance KPIs
Tech: PostgreSQL · SQL · Aggregations · Window Functions
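To make the AOV and cohort metrics concrete, here is a small Python sketch of the underlying arithmetic (the project implements these in SQL). The row layout `(customer_id, "YYYY-MM", order_total)` and both function names are hypothetical.

```python
from collections import defaultdict


def average_order_value(orders):
    """AOV = total revenue divided by the number of orders."""
    if not orders:
        return 0.0
    return sum(total for _, _, total in orders) / len(orders)


def cohort_retention(orders):
    """Map cohort month -> {month -> share of that cohort's customers who ordered}."""
    months_by_customer = defaultdict(set)
    for customer, month, _ in orders:
        months_by_customer[customer].add(month)

    # A customer's cohort is the month of their first order.
    # "YYYY-MM" strings sort chronologically, so min() finds it.
    cohorts = defaultdict(set)
    for customer, months in months_by_customer.items():
        cohorts[min(months)].add(customer)

    retention = {}
    for cohort_month, members in cohorts.items():
        active = defaultdict(int)
        for customer in members:
            for month in months_by_customer[customer]:
                active[month] += 1
        retention[cohort_month] = {
            month: count / len(members) for month, count in sorted(active.items())
        }
    return retention
```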
Developed a structured data-cleaning pipeline to transform raw CSV sales data into analytics-ready datasets.
- Implemented staging → clean workflow
- Standardized inconsistent categorical fields
- Recomputed missing/mismatched totals
- Applied deduplication and validation logic
Tech: SQL Server · T-SQL · Data Cleaning · ETL Concepts
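The staging → clean steps above can be sketched in Python as follows. This is an illustration, not the project's T-SQL: the column names, the `CATEGORY_ALIASES` map, and the choice to keep the first valid row per `order_id` are all assumptions.

```python
# Hypothetical mapping from messy category spellings to canonical labels.
CATEGORY_ALIASES = {
    "electronics": "Electronics",
    "electronic": "Electronics",
    "home": "Home",
}


def clean_rows(raw_rows):
    """Validate, deduplicate, standardize categories, and recompute totals."""
    seen_ids = set()
    clean = []
    for row in raw_rows:
        order_id = row["order_id"].strip()
        if order_id in seen_ids:
            continue  # deduplicate: keep the first valid row per order_id
        try:
            quantity = int(row["quantity"])
            unit_price = float(row["unit_price"])
        except (TypeError, ValueError):
            continue  # validation: drop rows with unparseable numbers
        if quantity <= 0 or unit_price < 0:
            continue  # validation: drop impossible values
        seen_ids.add(order_id)
        category_key = row["category"].strip().lower()
        clean.append({
            "order_id": order_id,
            "category": CATEGORY_ALIASES.get(category_key, "Unknown"),
            "quantity": quantity,
            "unit_price": unit_price,
            # Recompute the total instead of trusting the raw value.
            "total": round(quantity * unit_price, 2),
        })
    return clean
```

Recomputing `total` from quantity and unit price treats those two fields as the source of truth; a real pipeline might instead flag mismatches for review.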
Built an end-to-end ML pipeline for churn prediction using telecom data.
- Data cleaning & feature engineering
- Model training & evaluation
- Business-driven retention insights
Tech: Python · Pandas · scikit-learn · Classification
Developed a fraud detection model addressing severe class imbalance.
- Precision/Recall optimization
- Confusion matrix & F1-score evaluation
- False positive minimization strategy
Tech: Python · Pandas · scikit-learn
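With heavy class imbalance, accuracy is misleading, so evaluation centers on precision, recall, and F1. The sketch below computes them from confusion-matrix counts and picks a decision threshold that maximizes precision subject to a recall floor. It is a standard-library illustration, not the project's scikit-learn code, and the `min_recall` selection rule is one assumed strategy for limiting false positives.

```python
def confusion_counts(y_true, y_score, threshold):
    """Return (tp, fp, fn, tn), flagging a case as positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, y_score):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_threshold(y_true, y_score, min_recall):
    """Among thresholds meeting min_recall, return (threshold, precision, recall)
    with the highest precision, or None if no threshold qualifies."""
    best = None
    for threshold in sorted(set(y_score)):
        tp, fp, fn, _ = confusion_counts(y_true, y_score, threshold)
        precision, recall, _ = precision_recall_f1(tp, fp, fn)
        if recall >= min_recall and (best is None or precision > best[1]):
            best = (threshold, precision, recall)
    return best
```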
- Built a rule-based chatbot with TF-IDF & cosine similarity
- Integrated REST API backend
- Implemented preprocessing & lemmatization
Tech: Python · Flask · NLTK · HTML · CSS
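The retrieval step of such a chatbot can be sketched with TF-IDF vectors and cosine similarity using only the standard library. This is an illustration rather than the project's code: it skips lemmatization (which the project does with NLTK), the smoothed IDF formula is one common choice, and `answer`, `min_score`, and the FAQ layout are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (no lemmatization in this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(questions):
    """Return (idf, vectors): IDF weights and one TF-IDF vector per question."""
    docs = [Counter(tokenize(q)) for q in questions]
    n_docs = len(docs)
    # Iterating a Counter yields each distinct term once, giving document frequency.
    doc_freq = Counter(term for doc in docs for term in doc)
    # Smoothed IDF so every term keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: count * idf[t] for t, count in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def answer(query, faq, min_score=0.2):
    """Return the answer for the most similar FAQ question, or a fallback message."""
    questions = list(faq)
    idf, vectors = build_index(questions)
    counts = Counter(t for t in tokenize(query) if t in idf)
    query_vec = {t: count * idf[t] for t, count in counts.items()}
    scores = [cosine(query_vec, vec) for vec in vectors]
    best = max(range(len(scores)), key=scores.__getitem__, default=None)
    if best is None or scores[best] < min_score:
        return "Sorry, I don't have an answer for that."
    return faq[questions[best]]
```

For simplicity the index is rebuilt on every call; a served version would build it once at startup.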
Python · SQL
PostgreSQL · SQL Server
LLMs · Prompt Engineering · RAG · OpenAI API · Evaluation Frameworks
scikit-learn · Classification · Clustering · Model Evaluation
Data Modeling · Window Functions · KPI Design · Cohort Analysis · ETL
Git · GitHub · Flask · REST APIs · Debugging
- Advanced RAG pipelines with evaluation scoring
- ML monitoring & drift detection
- SQL performance optimization techniques
- Production-ready AI system design
I’m continuously learning, building, and refining practical AI and data systems.
Let’s connect if you’re interested in structured AI engineering, clean data modeling, or analytics-driven system design.
- 🧠 GitHub: https://github.com/sarahsair25
- LinkedIn: https://www.linkedin.com/in/sarahsair
⭐ If you find my projects interesting, feel free to explore, fork, or star them!