sarahsair25/README.md
Tech portfolio for AI and data

Sarah Sair

AI / GenAI Engineer & Data Scientist

Data Analytics • SQL • Python • RAG • LLM Apps • Machine Learning

Hi, I’m Sarah 👋

I build AI and data systems that are structured, testable, and production-aware.

With 13+ years of technical systems experience, I approach AI, ML, and analytics with a strong engineering mindset — focusing on reliability, evaluation, performance, and clean data modeling.

My work spans:

  • Generative AI & RAG systems
  • Machine Learning pipelines
  • SQL-based analytics platforms
  • Data cleaning & performance modeling

I specialize in turning raw data and human intent into scalable, measurable solutions.


🤖 GenAI & LLM Engineering

I treat prompt engineering as a system design discipline — not trial and error.

Core Focus:

  • Prompt engineering (Zero-Shot, Few-Shot, Chain-of-Thought, ReAct)
  • Retrieval-Augmented Generation (RAG)
  • LLM evaluation & benchmarking
  • Guardrails & fallback logic
  • Structured prompt validation pipelines
  • API-based LLM integrations

📄 Featured Work: Prompt Engineering Case Study

A structured case study demonstrating how evaluation loops and guardrails improved LLM reliability and reduced hallucinations by 30–40%.

Tech: Python · OpenAI API · JSON · Regex · Evaluation Frameworks
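A guardrail with fallback logic of the kind described above could look roughly like the following. This is a minimal sketch, not the case study's actual code: the required keys (`answer`, `confidence`), the `FALLBACK` payload, and the function names are all hypothetical and chosen only for illustration.

```python
import json
import re

# Hypothetical schema for illustration: the model must return these keys.
REQUIRED_KEYS = {"answer", "confidence"}
FALLBACK = {"answer": None, "confidence": 0.0, "fallback": True}


def extract_json(raw):
    """Parse the span from the first '{' to the last '}' as a JSON object, or return None."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def validate(raw):
    """Guardrail: accept well-formed output, otherwise return the fallback payload."""
    parsed = extract_json(raw)
    if parsed is None or not REQUIRED_KEYS.issubset(parsed):
        return dict(FALLBACK)
    confidence = parsed["confidence"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return dict(FALLBACK)
    if not 0.0 <= confidence <= 1.0:
        return dict(FALLBACK)
    return parsed


def pass_rate(outputs):
    """Evaluation loop: fraction of raw outputs that pass the guardrail."""
    if not outputs:
        return 0.0
    passed = sum(1 for raw in outputs if not validate(raw).get("fallback", False))
    return passed / len(outputs)
```

Tracking `pass_rate` across prompt revisions is one simple way to turn "trial and error" into a measurable loop; a real evaluation suite would likely also score answer correctness, which this sketch does not attempt.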


📊 Data Analytics & SQL Projects

🧠 SQL Mentor — User Performance Analysis (PostgreSQL)

Built an end-to-end analytics pipeline to model user performance and engagement patterns.

  • Designed staging → clean schema
  • Developed leaderboard, streak, and rolling 7-day metrics
  • Modeled question difficulty (avg / median / negative-rate)
  • Optimized queries using indexing

Tech: PostgreSQL · Advanced SQL · Window Functions · KPI Modeling
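The streak and rolling 7-day metrics listed above are implemented in SQL in the project itself. As a rough illustration of the same logic, here is a plain-Python sketch; the function names and the choice to count submissions per day are assumptions, not the project's actual definitions.

```python
from datetime import date, timedelta


def longest_streak(active_days):
    """Length of the longest run of consecutive calendar days with activity."""
    best = current = 0
    previous = None
    for day in sorted(set(active_days)):
        if previous is not None and day - previous == timedelta(days=1):
            current += 1
        else:
            current = 1  # gap found (or first day): start a new run
        best = max(best, current)
        previous = day
    return best


def rolling_7_day_counts(submission_days):
    """Map each active day to the number of submissions in the 7-day window ending on it."""
    per_day = {}
    for day in submission_days:
        per_day[day] = per_day.get(day, 0) + 1
    result = {}
    for day in sorted(per_day):
        window_start = day - timedelta(days=6)
        result[day] = sum(n for d, n in per_day.items() if window_start <= d <= day)
    return result


days = [
    date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3),
    date(2024, 1, 5), date(2024, 1, 5), date(2024, 1, 10),
]
print(longest_streak(days))                          # 3
print(rolling_7_day_counts(days)[date(2024, 1, 5)])  # 5
```

In SQL the same results are typically obtained with window functions, which is presumably why they appear in the tech list for this project.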


🛒 SQL-Only E-Commerce Analytics Platform

Designed a complete SQL-based analytics system for transactional e-commerce data.

  • Built revenue, AOV, LTV, and retention metrics
  • Implemented cohort and trend analysis
  • Created reusable reporting views
  • Modeled business performance KPIs

Tech: PostgreSQL · SQL · Aggregations · Window Functions


📁 Sales Data Cleaning (SQL Server / SSMS)

Developed a structured data-cleaning pipeline to transform raw CSV sales data into analytics-ready datasets.

  • Implemented staging → clean workflow
  • Standardized inconsistent categorical fields
  • Recomputed missing/mismatched totals
  • Applied deduplication and validation logic

Tech: SQL Server · T-SQL · Data Cleaning · ETL Concepts
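The project does this work in T-SQL. Purely as an illustration of the staging → clean steps above, the sketch below applies the same three ideas (standardize categories, recompute totals, deduplicate) in Python. The column names (`order_id`, `category`, `quantity`, `unit_price`, `total`) are hypothetical and not taken from the actual dataset.

```python
def clean_sales(rows):
    """Return cleaned rows: normalized category, recomputed total, duplicates dropped."""
    seen = set()
    cleaned = []
    for row in rows:
        # Standardize inconsistent categorical values (whitespace and casing).
        category = row["category"].strip().lower()
        quantity = int(row["quantity"])
        unit_price = round(float(row["unit_price"]), 2)
        # Ignore the supplied total and recompute it from quantity and price.
        total = round(quantity * unit_price, 2)
        # Deduplicate on the normalized business key.
        key = (row["order_id"], category, quantity, unit_price)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({
            "order_id": row["order_id"],
            "category": category,
            "quantity": quantity,
            "unit_price": unit_price,
            "total": total,
        })
    return cleaned


raw_rows = [
    {"order_id": "A1", "category": " Electronics ", "quantity": "2", "unit_price": "10.50", "total": ""},
    {"order_id": "A1", "category": "electronics", "quantity": "2", "unit_price": "10.5", "total": "20"},
    {"order_id": "B7", "category": "TOYS", "quantity": "1", "unit_price": "4.00", "total": "4.00"},
]
print(len(clean_sales(raw_rows)))  # 2
```

Normalizing before building the deduplication key matters: the first two rows only collapse into one because their categories and prices compare equal after cleaning.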


🧠 Machine Learning Projects

📊 Customer Churn Prediction

Built an end-to-end ML pipeline for churn prediction using telecom data.

  • Data cleaning & feature engineering
  • Model training & evaluation
  • Business-driven retention insights

Tech: Python · Pandas · scikit-learn · Classification


💳 Credit Card Fraud Detection

Developed a fraud detection model addressing severe class imbalance.

  • Precision/Recall optimization
  • Confusion matrix & F1-score evaluation
  • False positive minimization strategy

Tech: Python · Pandas · scikit-learn
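To make the evaluation metrics above concrete, here is a small dependency-free sketch that derives precision, recall, and F1 from confusion-matrix counts. The project itself uses scikit-learn; the function names and toy labels below are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 marks fraud."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1, returning 0.0 where a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy imbalanced sample: 2 fraud cases out of 10 transactions.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.5, 0.5, 0.5)
```

On this sample the accuracy is 80%, yet half of the fraud cases are missed, which is why accuracy alone is a poor guide under severe class imbalance.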


🛠 Application Development

🤖 NLP Chatbot (Flask App)

  • Built a rule-based chatbot with TF-IDF & cosine similarity
  • Integrated REST API backend
  • Implemented preprocessing & lemmatization

Tech: Python · Flask · NLTK · HTML · CSS
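The TF-IDF and cosine-similarity matching mentioned above can be sketched without any dependencies. This is a simplified illustration, not the app's code: it uses a whitespace tokenizer in place of NLTK preprocessing and lemmatization, and the patterns, responses, and fallback message are made up.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (stand-in for real preprocessing)."""
    return text.lower().split()


def build_idf(documents):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are dropped."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_response(query, patterns, responses, fallback="Sorry, I did not understand that."):
    """Return the response whose pattern is most similar to the query."""
    idf = build_idf(patterns)
    query_vec = tfidf_vector(query, idf)
    scores = [cosine(query_vec, tfidf_vector(p, idf)) for p in patterns]
    best = max(range(len(patterns)), key=scores.__getitem__)
    return responses[best] if scores[best] > 0.0 else fallback


patterns = ["what are your opening hours", "how do i reset my password"]
responses = ["We are open 9 to 5.", "Use the reset link."]
print(best_response("reset password please", patterns, responses))  # Use the reset link.
```

Returning a fallback when every score is zero keeps the bot from answering with an arbitrary pattern when the query shares no vocabulary with the corpus.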


🔧 Technical Stack

Languages

Python · SQL

Databases

PostgreSQL · SQL Server

AI / GenAI

LLMs · Prompt Engineering · RAG · OpenAI API · Evaluation Frameworks

Machine Learning

scikit-learn · Classification · Clustering · Model Evaluation

Data Analytics

Data Modeling · Window Functions · KPI Design · Cohort Analysis · ETL

Tools

Git · GitHub · Flask · REST APIs · Debugging


📈 Currently Building

  • Advanced RAG pipelines with evaluation scoring
  • ML monitoring & drift detection
  • SQL performance optimization techniques
  • Production-ready AI system design

I’m continuously learning, building, and refining practical AI and data systems.

Let’s connect if you're interested in structured AI engineering, clean data modeling, or analytics-driven system design.

🤝 Let's Connect


⭐ If you find my projects interesting, feel free to explore, fork, or star them!


Pinned

  1. Python-Voice-Assistant-Hands-Free-App-Launcher (Public)

    A voice-controlled assistant built in Python that listens to natural speech commands and instantly opens websites or system applications — no keyboard needed.

    Python

  2. SQL-Mentor-User-Performance-Analysis-SQL-Power-BI (Public)

    An end-to-end SQL analytics project analyzing learner submission behavior, scoring trends, engagement patterns, and achievement classification on the SQL Mentor platform.

  3. Student-Performance-Prediction-Sysytem (Public)

    Built a multi-page Streamlit ML application with a dedicated model documentation page explaining features, metrics, and system design.

    Python

  4. AI-Powered-Chatbot-Transformers- (Public)

    An interactive command-line AI chatbot built using Hugging Face Transformers and the GPT-2 language model. This project demonstrates how transformer-based models can generate human-like text respon…

    Python

  5. Ola-Data-analyst-Project (Public)

  6. Python-NLP-Chatbot (Public)

    A simple yet effective rule-based chatbot built using Python and NLTK. This project demonstrates the fundamentals of Natural Language Processing (NLP) by enabling conversational interactions through…

    Python