🫠
focusmaxxing
  • SOPHiA Genetics
  • Boston, MA
  • 21:02 (UTC -04:00)
  • LinkedIn in/thekaitran


Kai Tran

From Saigon. Training machines in Boston.

MS in Business Analytics (Data & Methods) @ Boston University Questrom · BS in Finance & Business Analytics + CS Minor · First-gen student


Right now

AI Research Assistant @ BU — Benchmarking LLM guardrail security across 5 open-weight models. Building a 3-layer defense pipeline tested against 30 adversarial attack scenarios for interview avatar systems.

Sales Enablement Intern @ SOPHiA GENETICS — Automating CRM pipeline management and building data visualizations for a genomics company that analyzes tumor data for 1,000+ hospitals worldwide.


Recent ML Projects

Childcare Market Segmentation · UMAP KMeans PCA

Clustered 2,500+ U.S. counties into distinct childcare markets. Silhouette score rose from 0.24 to 0.47: a UMAP-based pipeline separated markets that KMeans and Ward's clustering couldn't resolve on their own. Built to help policymakers target subsidy allocation.
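The silhouette score behind the 0.24 → 0.47 figure compares, for each point, its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b): s = (b - a) / max(a, b), averaged over all points, so values run from -1 to 1. Below is an illustrative pure-Python version of that standard definition, not code from the project; the function names and toy points are hypothetical, and in practice `sklearn.metrics.silhouette_score` does the same job.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [euclidean(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # singleton cluster: scored 0 by convention
            continue
        a = sum(own) / len(own)  # cohesion: mean distance within own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            sum(euclidean(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Toy 2-D points (hypothetical): two well-separated groups.
tight = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(silhouette_score(tight, [0, 0, 1, 1]))  # close to 1: compact, well-separated clusters
print(silhouette_score(tight, [0, 1, 0, 1]))  # negative: each cluster mixes near and far points
```

Relabeling the same points so each cluster mixes a near point with a far one drives the score below zero, which is the kind of gap the metric uses to separate a weak segmentation from a strong one.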

Predicting 30-Day Hospital Readmission · ClinicalBERT Keras PySpark GCP

Catches 70% of patients who'd be readmitted within 30 days (70% recall). Multimodal neural network trained on 50K+ ICU admissions from MIMIC-IV; tested 13 model variants across 3 encoders. Best AUC: 0.703. Aimed at the $17B annual cost of avoidable readmissions.
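"Catches 70% of patients" reads as a recall figure: of the patients who were actually readmitted, the share the model flagged at some decision threshold. AUC, by contrast, summarizes ranking quality across all thresholds. A minimal sketch, assuming a binary risk score and hypothetical cutoffs; the data and function names are illustrative, not from the MIMIC-IV project.

```python
def recall_at_threshold(y_true, probs, threshold):
    """Share of truly readmitted patients (label 1) whose predicted risk >= threshold."""
    flagged = [p >= threshold for p in probs]
    true_positives = sum(1 for t, f in zip(y_true, flagged) if t == 1 and f)
    positives = sum(y_true)
    return true_positives / positives if positives else 0.0


def flag_rate(probs, threshold):
    """Share of all patients flagged, i.e. the follow-up workload the cutoff creates."""
    return sum(p >= threshold for p in probs) / len(probs)


# Toy data (hypothetical): 1 = readmitted within 30 days.
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]

for cutoff in (0.5, 0.3):
    print(cutoff, recall_at_threshold(y_true, probs, cutoff), flag_rate(probs, cutoff))
```

In this toy data, lowering the cutoff from 0.5 to 0.3 raises recall from 0.6 to 0.8, but the share of patients flagged also rises from 0.5 to 0.8; a recall number is only meaningful alongside the threshold that produced it.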

H-1B Visa Approval Predictor · Gradient Boosting SMOTENC ElasticNet

2.23M+ immigration records. Gradient Boosting AUC 0.949, 87.8% accuracy. Feature-engineered STEM classifications, wage bands, and employer dependency flags from raw DOL data.

Multi-Label ECG Classification · PyTorch TransformerECG ResNet1D

Benchmarked 5 deep learning architectures (including CNN, Transformer, GNN, and wavelet-based models) on 27,765 twelve-lead ECGs across 5 cardiac conditions. Best macro-AUC: 0.885 (TransformerECG). Written up in NEJM Statistics in Data Science style.
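Macro-AUC computes a separate ROC AUC for each cardiac condition and takes the unweighted mean, so a rare condition counts as much as a common one. An illustrative pure-Python sketch of that standard definition (not the project's code); the toy labels, scores, and function names are hypothetical, and `sklearn.metrics.roc_auc_score(..., average="macro")` is the usual library route.

```python
def binary_auc(y_true, scores):
    """ROC AUC as the probability a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("each condition needs at least one positive and one negative ECG")
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, y_score):
    """Unweighted mean of per-condition AUCs.

    y_true, y_score: one row per ECG, one column per condition (multi-label).
    """
    n_conditions = len(y_true[0])
    per_condition = [
        binary_auc([row[k] for row in y_true], [row[k] for row in y_score])
        for k in range(n_conditions)
    ]
    return sum(per_condition) / n_conditions, per_condition


# Toy data (hypothetical): 4 ECGs, 2 conditions; an ECG may carry both labels.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.3, 0.8], [0.7, 0.4], [0.1, 0.5]]
macro, per_condition = macro_auc(y_true, y_score)
print(per_condition, macro)  # [1.0, 0.75] 0.875
```

Here the first condition is ranked perfectly (AUC 1.0) and the second imperfectly (0.75), so the macro-AUC is 0.875.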


Finance & Business

Finance Projects — DCF models (ATKR, TSLA), equity pitch decks (ATKR, SNCY), commercial airlines industry analysis, cross-functional strategy project, and guerrilla marketing pitch (DedCool).

Power BI Dashboard — Grocery retail analytics dashboard built in Power BI with sales, ratings, and target tracking.


Internships

Boston Scientific · Health Economics & Market Access Intern
Insights on elective procedure volume and reimbursement trends → delivered to the Urology VP → presented at board level. Built cost models from Medicare claims data for devices used in 30,000+ annual procedures.

VinaCapital ($4B AUM) · Macro & Fund Analytics Intern
Built a customer profitability model using regression in Python. Created the Chief Economic Officer's Q2 webinar deck on Vietnam's GDP and currency performance using Bloomberg + CEIC data.

Letters of Recommendation — Boston Scientific, VinaCapital, and faculty recommendations.


Stack

ML & Deep Learning · Python · TensorFlow/Keras · PyTorch · scikit-learn · PySpark · HuggingFace
Data & Cloud · SQL · GCP · Bloomberg Terminal · CEIC
Data Visualization & BI · Tableau · Power BI · Salesforce
Dev Tools · Git · Claude Code · Java


Off the clock

📸 Photographer · 🍜 Eater of everything · 🐔 Raised chickens, parrots, hamsters, dogs & cats growing up — retirement plan is a farm and I'm not kidding


Full Portfolio · LinkedIn · kaihungtran@outlook.com

Open to full-time roles in data analytics, data science, and business intelligence · Graduating Jan 2027 · OPT + STEM extension eligible

Pinned

  1. childcare-market-segmentation

    Silhouette 0.24→0.47: UMAP-based segmentation of childcare markets across 2,500+ U.S. counties

    Jupyter Notebook

  2. ecg-classification

    TransformerECG macro-AUC 0.885: benchmarking 5 deep learning architectures for multi-label ECG diagnosis on PTB-XL

    Jupyter Notebook

  3. h1b-visa-prediction

    Gradient Boosting AUC 0.949: predicting H-1B visa approval on 2.23M+ applications with SMOTENC and ElasticNet feature selection

    Jupyter Notebook

  4. hospital-readmission-prediction

    Catches 70% of patients who'd be readmitted within 30 days. ClinicalBERT + multimodal NN on 50K+ MIMIC-IV ICU admissions. AUC: 0.703.

    Jupyter Notebook

  5. employee-mental-health

    KNN beats the naive-rule baseline by 4.93 pp at predicting employee stress levels; 5 ML models benchmarked across 4 classification trials on a remote-work dataset

    Jupyter Notebook
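A percentage-point gain over a "naive rule" is easiest to read against a concrete baseline. Assuming the naive rule means always predicting the most common stress level (a majority-class baseline; the repo description doesn't say which rule), the gain is model accuracy minus baseline accuracy, times 100. The labels, predictions, and function names below are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(y_true):
    """Accuracy of always predicting the most frequent class (assumed 'naive rule')."""
    _, top_count = Counter(y_true).most_common(1)[0]
    return top_count / len(y_true)


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def gain_over_baseline_pp(y_true, y_pred):
    """Model accuracy minus majority-baseline accuracy, in percentage points."""
    return 100 * (accuracy(y_true, y_pred) - majority_baseline_accuracy(y_true))


# Toy stress-level labels and predictions (hypothetical).
y_true = ["High", "High", "Medium", "Low", "High", "Medium", "Low", "High", "Medium", "High"]
y_pred = ["High", "Medium", "Medium", "Low", "High", "Low", "High", "High", "Low", "High"]
print(round(gain_over_baseline_pp(y_true, y_pred), 2))  # 10.0
```

With 5 of the 10 toy labels being "High", the baseline scores 50%, so a model at 60% accuracy shows a 10 pp gain; on an imbalanced dataset the baseline can be high, which is why the gain is the more honest number to report.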