MS Analytics (Computational Data Track) — Georgia Tech · BS Cognitive Science (ML & Neurocomputational Focus) — UCSD
Data engineering + ML for longitudinal/biomedical data. I build reproducible pipelines, research-grade analyses, and clean model evaluation.
View repos · LinkedIn · Google Scholar
- Wearable data pipeline: Postgres ingestion → dbt transforms/tests → CI via GitHub Actions
  → https://github.com/mrkchoe/wearable-data-pipeline
- Model comparison / evaluation: reproducible baselines + metrics + reporting
  → https://github.com/mrkchoe/ad-mri-pet-model-comparison
- Sanctuary operations data platform: operational data modeling and SQL-based reporting for intake, care, adoption, and cost analytics in a sanctuary environment
  → https://github.com/mrkchoe/sanctuary-operations-data-platform
- Airflow API → PostgreSQL ELT pipeline: Dockerized Apache Airflow DAG orchestrating API ingestion, structured transformation, PostgreSQL loading, and basic data validation in a reproducible local environment
  → https://github.com/mrkchoe/airflow-api-to-postgres-demo
- Neuroimaging publications: curated bibliographic list of peer-reviewed articles with structured authorship context supporting longitudinal MRI/biomarker research
  → https://github.com/mrkchoe/publications
Languages
Python · SQL
Data Engineering & Infrastructure
PostgreSQL · Apache Airflow · dbt (models, tests, documentation)
Docker · GitHub Actions
Relational schema design · Data validation · Reproducible batch pipelines
Machine Learning & Evaluation
scikit-learn · Feature preprocessing
Cross-validation frameworks · Model evaluation diagnostics (ROC, PR, confusion matrices)
Structured model comparison & research-grade reporting
Visualization
D3.js (interactive statistical visualization)
Seaborn (statistical plots & model diagnostics)
Java / R / Bash
AWS (EC2, S3, SageMaker)
TensorFlow / PyTorch / Neural network modeling
Longitudinal biomedical dataset management