
Emanuel Luis

Data Engineer · ingestion · dbt modeling · orchestration · data platforms

LinkedIn · Stack Overflow · Email


About

I design and build scalable data pipelines and modern data architectures that enable organizations to extract strategic value from data. I have 3+ years of experience in data engineering, built on a strong software-engineering foundation, and specialize in ETL/ELT, data lakes, and analytics platforms.

Data Engineer at PULSE, the IT department of Grupo Mateus (São Luís, Brazil); B.S. in Software Engineering from UNDB.

Core expertise

  • Data pipeline engineering — ETL/ELT design, ingestion, and transformation at scale
  • Data architecture — data lakes, data warehouses, the modern data stack
  • Cloud — Azure (Data Factory, Data Lake, Blob Storage) · Databricks
  • Stack — Python, PySpark, SQL, dbt, Airflow
  • Data modeling — analytics engineering, dimensional modeling

Interests: data platform engineering, modern data stack, distributed data processing, data governance.

Experience

Data Engineer — PULSE (IT department @ Grupo Mateus) · Aug 2022 – present

  • Design and maintain scalable data pipelines processing millions of records daily across 20+ source systems (700+ ingestion configs, ~90 dbt DAGs).
  • Built a data lakehouse on Azure + Databricks — medallion architecture (staging → bronze → gold) with Delta Lake and Unity Catalog.
  • Built a materialized change-tracking (CDC) system that cut a key pipeline's runtime from ~60 min to 13 min.
  • Engineered batch / incremental / CDC ingestion (MSSQL, PostgreSQL) standardized via config-driven Airflow DAG generators and a shared PySpark extraction library (fullscan, incremental, CDC, Autoloader).
  • Built curated dbt layers (bronze → staging → intermediate → marts) orchestrated by Astronomer Cosmos (dbt-in-Airflow).
  • Tuned Spark/PySpark on Databricks (Liquid Clustering, broadcast joins), wrote Protobuf messages to Confluent Kafka, and ran Reverse ETL via Apache NiFi.
  • Built a metadata-driven DAG generation system (Jinja2 templates + JSON/YAML configs → auto-rendered Airflow DAGs).
  • Authored pylakeutils — a shared Python library for extraction/loading patterns (RDBMS, CDC, Autoloader), deployed as wheels to Databricks.
  • Ran containerized workloads on Azure Container Instances (ACI) with Docker images hosted on Azure Container Registry (ACR).
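A metadata-driven DAG generator like the one described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions and not the PULSE implementation. It uses `string.Template` in place of Jinja2 so that it runs without dependencies. The config keys (`dag_id`, `schedule`, `source_table`, `mode`), the `ingestion` module, and the `extract_table` helper referenced in the rendered DAG are all hypothetical.

```python
import json
from string import Template

# Hypothetical DAG template. The real system uses Jinja2; stdlib string.Template is swapped in here.
DAG_TEMPLATE = Template('''\
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from ingestion import extract_table  # hypothetical shared extraction helper

with DAG(
    dag_id="$dag_id",
    schedule="$schedule",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    PythonOperator(
        task_id="extract_$source_table",
        python_callable=extract_table,
        op_kwargs={"table": "$source_table", "mode": "$mode"},
    )
''')

# Hypothetical ingestion configs. In practice these would be JSON/YAML files on disk.
CONFIGS_JSON = """
[
  {"dag_id": "ingest_sales_orders", "schedule": "@hourly",
   "source_table": "sales_orders", "mode": "incremental"},
  {"dag_id": "ingest_customers", "schedule": "@daily",
   "source_table": "customers", "mode": "fullscan"}
]
"""

VALID_MODES = {"fullscan", "incremental", "cdc"}


def render_dags(configs: list[dict]) -> dict[str, str]:
    """Render the source of one DAG file per config, keyed by output filename."""
    rendered = {}
    for cfg in configs:
        if cfg["mode"] not in VALID_MODES:
            raise ValueError(f"unknown ingestion mode: {cfg['mode']!r}")
        # substitute() raises KeyError if the template needs a key the config lacks
        rendered[f"{cfg['dag_id']}.py"] = DAG_TEMPLATE.substitute(cfg)
    return rendered


if __name__ == "__main__":
    for filename, source in render_dags(json.loads(CONFIGS_JSON)).items():
        compile(source, filename, "exec")  # syntax check only; Airflow is never imported
        print(filename)
```

The generator only produces source text, so adding a new ingestion amounts to adding a config entry. The `compile` call checks each rendered file's syntax without needing Airflow installed. In a real deployment the output would typically be written into the scheduler's DAGs folder.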

Tech

  • Languages & processing: Python · SQL · PySpark · Pandas · Polars · NumPy · PyArrow
  • Data engineering & orchestration: dbt · Airflow · Cosmos · Kafka · Delta Lake · DLT · Protobuf · NiFi
  • Platforms & cloud: Databricks · Azure · GCP · AWS · Confluent · Unity Catalog
  • Data stores: PostgreSQL · SQL Server · MongoDB · Trino · Cassandra · ADLS Gen2
  • DevOps & tooling: Docker · Terraform · Git · GitHub Actions · Azure Pipelines · Linux · Bash
  • Frameworks & libraries: FastAPI · SQLAlchemy · Pydantic · Jinja2 · Selenium · pytest · OpenCV

Featured projects

  • controladoria-api (data/document platform): FastAPI with clean architecture, S3, Alembic migrations, Terraform, scheduled batch sync, and Prometheus metrics.
  • vaccine-backend (dimensional modeling): star schema (dimension/fact tables), SQL DDL, and seed generation.
  • bank-credit (full-stack data app): FastAPI + React, graph-based routing and SLAs, migrations, and tests.
  • ecommerce-scrapping (data ingestion): Scrapy + Splash + Selenium with JavaScript rendering.
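The star-schema idea behind vaccine-backend, with dimension tables arranged around a central fact table, can be illustrated with an in-memory SQLite database. This is a hypothetical sketch. The table names, columns, and seed rows below are invented for illustration and are not taken from the repository.

```python
import sqlite3

DDL = """
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,   -- surrogate key in YYYYMMDD form
    full_date TEXT NOT NULL,
    year      INTEGER NOT NULL,
    month     INTEGER NOT NULL
);
CREATE TABLE dim_vaccine (
    vaccine_key  INTEGER PRIMARY KEY,
    vaccine_name TEXT NOT NULL
);
CREATE TABLE dim_location (
    location_key INTEGER PRIMARY KEY,
    city         TEXT NOT NULL,
    state        TEXT NOT NULL
);
-- Fact table: one row per (date, vaccine, location) with an additive measure.
CREATE TABLE fact_vaccination (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    vaccine_key  INTEGER NOT NULL REFERENCES dim_vaccine(vaccine_key),
    location_key INTEGER NOT NULL REFERENCES dim_location(location_key),
    doses        INTEGER NOT NULL,
    PRIMARY KEY (date_key, vaccine_key, location_key)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create the schema and seed it with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only with this on
    conn.executescript(DDL)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
        (20240115, "2024-01-15", 2024, 1),
        (20240210, "2024-02-10", 2024, 2),
    ])
    conn.executemany("INSERT INTO dim_vaccine VALUES (?, ?)",
                     [(1, "Vaccine A"), (2, "Vaccine B")])
    conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                     [(1, "Sao Luis", "MA"), (2, "Teresina", "PI")])
    conn.executemany("INSERT INTO fact_vaccination VALUES (?, ?, ?, ?)", [
        (20240115, 1, 1, 120),
        (20240115, 2, 1, 80),
        (20240210, 1, 2, 50),
        (20240210, 1, 1, 30),
    ])
    return conn


def doses_by_month_and_vaccine(conn: sqlite3.Connection) -> list[tuple]:
    # Typical star-schema query: join the fact table to its dimensions, then aggregate.
    return conn.execute("""
        SELECT d.year, d.month, v.vaccine_name, SUM(f.doses) AS total_doses
        FROM fact_vaccination AS f
        JOIN dim_date    AS d ON d.date_key = f.date_key
        JOIN dim_vaccine AS v ON v.vaccine_key = f.vaccine_key
        GROUP BY d.year, d.month, v.vaccine_name
        ORDER BY d.year, d.month, v.vaccine_name
    """).fetchall()


if __name__ == "__main__":
    for row in doses_by_month_and_vaccine(build_demo()):
        print(row)
```

The fact table holds only foreign keys and an additive measure (`doses`). Each analytical question is therefore a join to the dimensions it needs, followed by a `GROUP BY`.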

Open-source contributions

Astronomer Cosmos — the dbt + Airflow integration library (1.6k+ stars). Two merged PRs:

  • #2201: template_fields in DbtConsumerWatcherSensor
    What it fixed: Watcher-mode sensors crashed on fallback to local execution because the templated fields from DbtRunLocalOperator were missing. Added the parent's template_fields to the sensor, with tests.
    Impact: Unblocked users running Cosmos Watcher mode with fallback, i.e. any Cosmos user hitting #2193.
  • #2241: Restore plain-text output in ExecutionMode.WATCHER
    What it fixed: JSON log output introduced in a prior release made Airflow task logs unreadable in Watcher mode. Refactored FullOutputSubprocessHook to delegate logging through a process_log_line callable, and restricted the JSON format to the SUBPROCESS invocation mode.
    Impact: Restored readable dbt logs (with CLI colors) for Watcher-mode users while preserving real-time status tracking for Subprocess mode. 5 files; a clean refactor reviewed and approved by Astronomer maintainers.
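The #2241 refactor follows a general pattern. A subprocess hook streams output and hands each line to an injected callable, so the caller decides whether lines are kept as plain text or parsed as JSON. The sketch below shows that pattern using only the standard library. `run_and_stream`, `plain_text_handler`, and `json_status_handler` are hypothetical names. Apart from the `process_log_line` parameter name, which is borrowed from the PR description, the sketch does not reproduce Cosmos's actual API.

```python
import json
import subprocess
import sys
from typing import Callable


def run_and_stream(cmd: list[str], process_log_line: Callable[[str], None]) -> int:
    """Run cmd, pass each output line (trailing newline stripped) to process_log_line, return the exit code."""
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            process_log_line(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set here.
    return proc.returncode


def plain_text_handler(sink: list[str]) -> Callable[[str], None]:
    # Keep each line exactly as emitted (e.g. human-readable CLI output).
    return sink.append


def json_status_handler(statuses: list[str]) -> Callable[[str], None]:
    # Parse JSON lines and keep only their "status" field; non-JSON lines are ignored.
    def handle(line: str) -> None:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            return
        if isinstance(event, dict) and "status" in event:
            statuses.append(event["status"])
    return handle


if __name__ == "__main__":
    script = 'import json; print("hello"); print(json.dumps({"status": "success"}))'
    lines: list[str] = []
    run_and_stream([sys.executable, "-c", script], plain_text_handler(lines))
    print(lines)  # → ['hello', '{"status": "success"}']
```

Because the hook never inspects the line format, switching between readable output and JSON status tracking only requires passing a different callable.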

Certifications

  • Databricks Academy Accreditation — Databricks Fundamentals · Databricks · May 2026
  • Computer Vision Training · Data Inception · May 2022
  • Machine Learning Training · Data Inception · Mar 2022

Publications

  • Computational methodologies for detection & diagnosis of SARS-CoV-2 (COVID-19) from medical images · ERCEMAPI (Regional Computing School, CE/MA/PI) · code: PROJETO-DE-PESQUISA-COVID-19

Education & training
  • B.S. Software Engineering — UNDB (Unidade de Ensino Superior Dom Bosco), São Luís, Brazil · 2021–2025
  • Python and C# coursework — Udemy

(Professional certifications listed in the Certifications section above.)

Popular repositories

  1. desafio-matematica · Public archive

    Pythagorean triplet finder — quadratic functions applied to number theory (Python).

    Python

  2. emanuel-luis · Public

    Config files for my GitHub profile.

  3. books-scrapping · Public

    Web-scraping pipeline (Selenium + BeautifulSoup) persisting to a relational store via SQLAlchemy.

    Python

  4. exercises · Public archive

    Coding exercise scraper (Exercism, Beecrowd) — Selenium + Python.

    Python

  5. ecommerce-scrapping · Public

    E-commerce data ingestion with Scrapy + scrapy-splash + Selenium (JavaScript rendering).

    Python

  6. capacitacao_visao-computacional · Public archive

    Computer-vision training exercises (OpenCV/Python).

    Python
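As an illustration of the kind of exercise in desafio-matematica, the sketch below finds Pythagorean triplets with a given perimeter n. It is not the repository's code and assumes the classic formulation a + b + c = n. Substituting c = n - a - b into a² + b² = c² gives 0 = n² - 2na - 2nb + 2ab, so b = (n² - 2na) / (2(n - a)). This replaces a brute-force double loop with a single loop over a.

```python
def pythagorean_triplets(n: int) -> list[tuple[int, int, int]]:
    """Return all (a, b, c) with a < b < c, a + b + c == n and a**2 + b**2 == c**2."""
    triplets = []
    # a is the smallest side, so 3a < n; a == n // 3 is still tried and then filtered out.
    for a in range(1, n // 3 + 1):
        numerator = n * n - 2 * n * a   # n^2 - 2na, positive for a <= n/3
        denominator = 2 * (n - a)       # 2(n - a), positive for a < n
        if numerator % denominator:
            continue  # b would not be an integer
        b = numerator // denominator
        c = n - a - b
        if a < b < c:
            triplets.append((a, b, c))
    return triplets


if __name__ == "__main__":
    print(pythagorean_triplets(12))    # → [(3, 4, 5)]
    print(pythagorean_triplets(1000))  # → [(200, 375, 425)]
```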