Master of Data Science student at UC Irvine (graduating Fall 2026), actively seeking data science internships.
I work across the data science stack, from statistical modeling, A/B testing, and machine learning to LLM applications. I build in Python and R, with a focus on turning analysis into decisions.
Statistics
- Cookie Cats A/B Testing — Frequentist and Bayesian A/B testing with retention and engagement analysis.
- Bike Sharing Demand Forecasting (Poisson) — Count regression with Poisson/NB2 and overdispersion diagnostics.
- Bike Sharing Demand Forecasting (OLS) — Demand forecasting with OLS, Ridge, and Lasso regression, plus diagnostic analysis.
Machine Learning
- Two-Tower Retrieval for Recommendation — Two-Tower retrieval model for movie recommendations with systematic embedding tuning.
- Multi-Class Skill Classification in StarCraft II — StarCraft II player skill classification with class imbalance strategies.
- Customer Churn Prediction — Churn prediction with distribution shift analysis and model comparison.
RAG
- UCI Dataset Assistant (RAG) — RAG chatbot for UCI ML Repository dataset recommendations.
Dev Tools
- PR Description Generator — Turns git commits into professional PR descriptions.
- Expanded underrepresented scene data using GAN-based augmentation to reduce false positives in scene classification, improving precision by 5%.
- Applied PCA and t-SNE to visualize gaps in the scene distribution, which showed that a single general model struggled across scene types; grouped scenes by distribution and fine-tuned a model per group, improving accuracy by 15%.
- Designed an AI agent pipeline using Microsoft AutoGen to automate end-to-end data processing and model training, replacing manual intervention between steps with a single-trigger workflow.
