20+ years building production-grade data pipelines, lakehouse architectures, and analytics platforms across energy, finance, and enterprise domains.
Most of my work lives at the intersection of complex data sources, cloud-native AWS infrastructure, and the unglamorous but critical work of making data actually trustworthy in production.
- End-to-end ETL/ELT pipelines for structured and semi-structured data
- Lakehouse architectures on AWS (S3, Glue, Athena, Lambda, Delta Lake)
- Schema evolution and temporal data models built for production reality
- Data quality and reconciliation frameworks at scale
- Workflow automation with Python, Airflow, and Docker
ercot-plan-ranker — A transparent, runnable pipeline that simulates and ranks Texas electricity plans against realistic usage profiles and weather scenarios. Portfolio-friendly starting point for a production-style lakehouse. Roadmap includes dbt, Airflow, and Postgres-backed Bronze/Silver/Gold layers.
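The core idea can be sketched in a few lines: price each plan against a monthly usage profile and sort by total cost. This is an illustrative simplification, not the repo's actual schema — real ERCOT plans have tiered rates, bill credits, and TDU charges, and the `Plan` fields and sample profile below are assumptions made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    base_fee: float       # fixed monthly charge, USD (illustrative flat-rate model)
    rate_per_kwh: float   # energy charge, USD/kWh

def monthly_cost(plan: Plan, usage_kwh: float) -> float:
    """Cost of one month of usage under a flat-rate plan."""
    return plan.base_fee + plan.rate_per_kwh * usage_kwh

def rank_plans(plans: list[Plan], usage_profile: list[float]) -> list[Plan]:
    """Rank plans by total cost across a 12-month usage profile, cheapest first."""
    return sorted(plans, key=lambda p: sum(monthly_cost(p, kwh) for kwh in usage_profile))

# Hypothetical hot-summer Texas profile (kWh per month, Jan-Dec)
profile = [600, 550, 700, 900, 1200, 1500, 1800, 1900, 1400, 900, 650, 600]
plans = [
    Plan("FlatSaver", base_fee=9.95, rate_per_kwh=0.12),
    Plan("NoFee", base_fee=0.0, rate_per_kwh=0.135),
]
ranked = rank_plans(plans, profile)
# With this profile, FlatSaver's lower per-kWh rate outweighs its monthly fee.
```

The sort key deliberately uses the whole annual profile rather than an average month, since summer-heavy usage is exactly what changes plan rankings in Texas.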
xml-drift-lakehouse (work in progress) — A generic open-source toolkit for ingesting, parsing, and modeling XML-based data sources into lakehouse architectures. Designed to be portable across industries.
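A minimal sketch of the ingestion step such a toolkit performs — flattening nested XML into row dicts suitable for a Bronze table. The element names (`meters`, `meter`, `reading`) and attributes here are hypothetical stand-ins, not the toolkit's real schema.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<meters>
  <meter id="M1"><reading ts="2024-01-01T00:00" kwh="1.2"/></meter>
  <meter id="M2"><reading ts="2024-01-01T00:00" kwh="0.8"/></meter>
</meters>"""

def flatten(xml_text: str) -> list[dict]:
    """Flatten nested meter readings into flat rows for a Bronze landing table."""
    root = ET.fromstring(xml_text)
    rows = []
    for meter in root.iter("meter"):
        for reading in meter.iter("reading"):
            rows.append({
                "meter_id": meter.get("id"),
                "ts": reading.get("ts"),
                "kwh": float(reading.get("kwh")),
            })
    return rows

rows = flatten(SAMPLE)
```

Keeping the flattened rows close to the raw XML shape (one row per leaf record, source attributes preserved) is what makes downstream schema-drift detection tractable: new or renamed attributes surface as new columns rather than silent parse failures.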
Python, SQL, AWS (Glue, Athena, Lambda, S3, DynamoDB, Fargate), Delta Lake, dbt, Docker, Airflow, PySpark
Open to senior data engineering opportunities in the Dallas-Fort Worth area.