Data Engineer · ingestion · dbt modeling · orchestration · data platforms
I design and build scalable data pipelines and modern data architectures that enable organizations to extract strategic value from data. I have 3+ years of data engineering experience, built on a strong software-engineering foundation, and specialize in ETL/ELT, data lakes, and analytics platforms.
Data Engineer at PULSE, the IT department of Grupo Mateus (São Luís, Brazil); B.S. in Software Engineering from UNDB.
Core expertise
- Data pipeline engineering — ETL/ELT design, ingestion, and transformation at scale
- Data architecture — data lakes, data warehouses, the modern data stack
- Cloud — Azure (Data Factory, Data Lake, Blob Storage) · Databricks
- Stack — Python, PySpark, SQL, dbt, Airflow
- Data modeling — analytics engineering, dimensional modeling
Interests: data platform engineering, modern data stack, distributed data processing, data governance.
Data Engineer — PULSE (IT department @ Grupo Mateus) · Aug 2022 – present
- Design and maintain scalable data pipelines processing millions of records daily across 20+ source systems (700+ ingestion configs, ~90 dbt DAGs).
- Built a data lakehouse on Azure + Databricks — medallion architecture (staging → bronze → gold) with Delta Lake and Unity Catalog.
- Built a materialized change-tracking (CDC) system that cut a key pipeline's runtime from ~60 min to 13 min.
- Engineered batch, incremental, and CDC ingestion from MSSQL and PostgreSQL, standardized through config-driven Airflow DAG generators and a shared PySpark extraction library (fullscan, incremental, CDC, Autoloader).
- Built curated dbt layers (bronze → staging → intermediate → marts) orchestrated by Astronomer Cosmos (dbt-in-Airflow).
- Tuned Spark/PySpark workloads on Databricks (Liquid Clustering, broadcast joins).
- Wrote Protobuf-encoded messages to Confluent Kafka and ran Reverse ETL via Apache NiFi.
- Built a metadata-driven DAG generation system (Jinja2 templates + JSON/YAML configs → auto-rendered Airflow DAGs).
- Authored pylakeutils, a shared Python library for extraction/loading patterns (RDBMS, CDC, Autoloader), deployed to Databricks as wheels.
- Ran containerized workloads on Azure Container Instances (ACI) with Docker images hosted in Azure Container Registry (ACR).
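The metadata-driven DAG generation listed above (templates plus JSON/YAML configs rendered into Airflow DAG files) can be sketched roughly as follows. This is a minimal, dependency-free illustration under stated assumptions, not the production generator: `string.Template` stands in for Jinja2, a dict stands in for a parsed JSON/YAML config, and the config keys (`dag_id`, `schedule`, `mode`, `tables`), the `ingestion` module, and `extract_table` are all hypothetical. The rendered text uses Airflow 2.x-style imports and is only syntax-checked here, never executed.

```python
from string import Template

# Hypothetical DAG file template. In the system described above this role is
# played by Jinja2 templates; string.Template keeps the sketch dependency-free.
DAG_TEMPLATE = Template('''\
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from ingestion import extract_table  # hypothetical extraction callable

with DAG(
    dag_id="$dag_id",
    schedule="$schedule",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
$tasks
''')

# One extraction task per configured table (hypothetical task shape).
TASK_TEMPLATE = Template('''\
    PythonOperator(
        task_id="extract_$table",
        python_callable=extract_table,
        op_kwargs={"table": "$table", "mode": "$mode"},
    )
''')


def render_dag(config: dict) -> str:
    """Render the source of one DAG file from a single (hypothetical) config entry."""
    if not config.get("tables"):
        raise ValueError(f"config {config.get('dag_id')!r} lists no tables")
    tasks = "".join(
        TASK_TEMPLATE.substitute(table=table, mode=config.get("mode", "incremental"))
        for table in config["tables"]
    )
    source = DAG_TEMPLATE.substitute(
        dag_id=config["dag_id"],
        schedule=config["schedule"],
        tasks=tasks.rstrip("\n"),
    )
    # Syntax-check the generated file; compile() parses it but imports nothing.
    compile(source, f"{config['dag_id']}.py", "exec")
    return source


if __name__ == "__main__":
    # Stand-in for a JSON/YAML config file loaded from disk.
    example = {
        "dag_id": "sales_ingestion",
        "schedule": "@daily",
        "mode": "incremental",
        "tables": ["orders", "customers"],
    }
    print(render_dag(example))
```

Rendering plain `.py` files, instead of building DAG objects dynamically at parse time, keeps each generated DAG reviewable and diffable in version control; the compile step rejects a broken template before a file ever reaches the scheduler.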
| Project | What it shows |
|---|---|
| controladoria-api | Data/document platform: FastAPI (clean architecture), S3, Alembic migrations, Terraform, scheduled batch sync, Prometheus metrics. |
| vaccine-backend | Dimensional modeling: star schema (dimension/fact), SQL DDL, and seed generation. |
| bank-credit | Full-stack data app: FastAPI + React, graph-based routing and SLAs, migrations and tests. |
| ecommerce-scrapping | Data ingestion: Scrapy + Splash + Selenium with JavaScript rendering. |
Astronomer Cosmos — the dbt + Airflow integration library (1.6k+ stars). Two merged PRs:
| PR | What it fixed | Impact |
|---|---|---|
| #2201: template_fields in DbtConsumerWatcherSensor | Watcher-mode sensors crashed on fallback to local execution because templated fields from DbtRunLocalOperator were missing. Added the parent's template_fields to the sensor, with tests. | Unblocked users running Cosmos Watcher mode with fallback, i.e. any Cosmos user hitting #2193. |
| #2241: Restore plain-text output in ExecutionMode.WATCHER | JSON log output introduced in a prior release made Airflow task logs unreadable in Watcher mode. Refactored FullOutputSubprocessHook to delegate logging through a process_log_line callable; made JSON format conditional on SUBPROCESS invocation mode only. | Restored readable dbt logs (with CLI colors) for Watcher-mode users while preserving real-time status tracking for Subprocess mode. A clean 5-file refactor, reviewed and approved by Astronomer maintainers. |
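The refactoring idea behind #2241, where the hook hands each output line to a `process_log_line` callable instead of hard-coding one log format, can be illustrated with a standalone sketch. It does not reproduce Cosmos code or its FullOutputSubprocessHook API: `run_and_stream`, `make_json_status_tracker`, `pick_handler`, the `"subprocess"`/`"watcher"` mode strings, and the JSON keys `"node"` and `"status"` are hypothetical names chosen for illustration.

```python
import json
import subprocess
import sys
from typing import Callable

# A line handler receives one decoded output line (without the trailing newline).
LineHandler = Callable[[str], None]


def run_and_stream(command: list[str], process_log_line: LineHandler) -> int:
    """Run a command, pass each output line to the injected handler, return the exit code."""
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the handler sees every line
        text=True,
    ) as proc:
        assert proc.stdout is not None
        for raw_line in proc.stdout:
            process_log_line(raw_line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set.
    return proc.returncode


def make_json_status_tracker(statuses: dict[str, str]) -> LineHandler:
    """Handler that parses JSON lines and records each node's latest status."""

    def handle(line: str) -> None:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            return  # non-JSON lines are ignored by this handler
        if isinstance(event, dict) and "node" in event and "status" in event:
            statuses[event["node"]] = event["status"]

    return handle


def pick_handler(invocation_mode: str, lines: list[str], statuses: dict[str, str]) -> LineHandler:
    """Parse JSON only in the hypothetical "subprocess" mode; otherwise keep plain text."""
    if invocation_mode == "subprocess":
        return make_json_status_tracker(statuses)
    return lines.append  # plain text kept verbatim (ANSI color codes included)


if __name__ == "__main__":
    captured: list[str] = []
    exit_code = run_and_stream(
        [sys.executable, "-c", "print('dbt says hello')"],
        pick_handler("watcher", captured, {}),
    )
    print(exit_code, captured)
```

Because the streaming loop depends only on the handler's signature, supporting or changing a log format means swapping the callable rather than editing the loop itself.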
Certifications
- Databricks Academy Accreditation — Databricks Fundamentals · Databricks · May 2026
- Computer Vision Training · Data Inception · May 2022
- Machine Learning Training · Data Inception · Mar 2022
- Computational methodologies for detection & diagnosis of SARS-CoV-2 (COVID-19) from medical images · ERCEMAPI (Regional Computing School, CE/MA/PI) · code: PROJETO-DE-PESQUISA-COVID-19
Education & training
- B.S. Software Engineering — UNDB (Unidade de Ensino Superior Dom Bosco), São Luís, Brazil · 2021–2025
- Python and C# coursework — Udemy
(Professional certifications listed in the Certifications section above.)

