ShengPeiWilliam/README.md

Hi, I'm William

Master of Data Science student at UC Irvine (graduating Fall 2026), actively seeking data science internships.

I work across the data science stack, from statistical modeling and A/B testing to machine learning and LLM applications. I build in Python and R, with a focus on turning analysis into decisions.

Portfolio Resume LinkedIn GitHub


🛠️ Projects

Statistics

Machine Learning

RAG

Dev Tools

💼 Experience

  • Expanded underrepresented scene data using GAN-based augmentation to reduce false positives in scene classification, improving precision by 5%.
  • Applied PCA and t-SNE to visualize scene distribution gaps, revealing that a single general model struggled across different scene types; grouped scenes by distribution and fine-tuned per group, improving accuracy by 15%.
  • Designed an AI agent pipeline using Microsoft AutoGen to automate end-to-end data processing and model training, replacing manual intervention between steps with a single-trigger workflow.

Pinned

  1. bayesian-ab-testing Public

    A/B testing on Cookie Cats with Frequentist and Bayesian comparison across retention and engagement metrics.

    R

  2. telecom-churn-ml Public

    End-to-end customer churn prediction in telecom using Logistic Regression and Random Forest, with EDA, feature selection, and actionable business insights.

    R

  3. movierec-two-towers Public

    A retrieval-stage movie recommendation system built with a Two-Tower model, trained on the MovieLens 100K dataset.

    Python

  4. skillcraft-ml Public

    Multinomial classification of StarCraft II player skill levels with class imbalance strategies.

    R

  5. bikerental-ml Public

    Demand forecasting for bike rentals using OLS, Ridge, and Lasso regression with diagnostic analysis.

    R

  6. bikerental-poisson Public

    Daily bike rental demand forecasting using Poisson and Negative Binomial (NB2) regression with overdispersion diagnostics and rolling-origin cross-validation.

    R
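
For readers unfamiliar with the Bayesian side of the comparison mentioned under bayesian-ab-testing, here is a minimal sketch of one common approach: a Beta-Binomial model that estimates the probability that one variant's retention rate exceeds the other's. It is not taken from the repository (which is written in R). The function name `prob_b_beats_a`, the uniform Beta(1, 1) prior, and the counts are all assumptions made for illustration.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A).

    Assumes independent Beta(1, 1) priors on each arm's rate, so each
    posterior is Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical counts for illustration only, not Cookie Cats results.
p = prob_b_beats_a(conv_a=440, n_a=1000, conv_b=470, n_b=1000)
print(f"Estimated P(B > A): {p:.3f}")
```

A frequentist test on the same counts would report a p-value for the null hypothesis of equal rates, whereas this sketch reports a direct posterior probability that B is better.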