From APIs to Warehouses: AI-Assisted Data Ingestion with dlt
dlt (data load tool) is an open-source Python library for building reliable ELT pipelines. It helps you extract data from APIs or databases, normalize it into clean relational tables, and load it into destinations like DuckDB.
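To make the "normalize" step concrete, here is a minimal standard-library sketch of how a nested JSON record can be unpacked into a parent table and a child table. It is not dlt's implementation. It only mimics the double-underscore naming dlt uses for nested data, and the `_id` and `_parent_id` columns, the helper names, and the sample record are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def flatten(record, prefix=""):
    """Split a record into flat columns and nested lists (future child tables)."""
    columns, children = {}, {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects become extra columns: author.name -> author__name
            sub_columns, sub_children = flatten(value, prefix=f"{name}__")
            columns.update(sub_columns)
            children.update(sub_children)
        elif isinstance(value, list):
            # Nested lists are set aside and loaded into a child table
            children[name] = value
        else:
            columns[name] = value
    return columns, children


def normalize(table, records, tables, parent_id=None):
    """Append rows for `records` to `tables`, linking children to their parent."""
    for record in records:
        if not isinstance(record, dict):
            record = {"value": record}  # wrap scalar list items in a column
        columns, children = flatten(record)
        # Hypothetical surrogate key: a running count of all rows stored so far
        row_id = sum(len(rows) for rows in tables.values()) + 1
        row = {"_id": row_id, **columns}
        if parent_id is not None:
            row["_parent_id"] = parent_id
        tables[table].append(row)
        for field, items in children.items():
            normalize(f"{table}__{field}", items, tables, parent_id=row_id)
    return tables


# Made-up sample record, shaped loosely like a book entry from an API
book = {
    "title": "Dune",
    "author": {"name": "Frank Herbert"},
    "subjects": ["sci-fi", "desert"],
}
tables = normalize("books", [book], defaultdict(list))
for name, rows in tables.items():
    print(name, rows)
```

The single nested record ends up as one row in `books` and two rows in `books__subjects`, each pointing back to its parent row.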
This repository now covers two complementary tracks:
- workshop: Open Library quickstart and conceptual notebook
- Homework: NYC Taxi API pipeline, data checks, and question answers
| Path | Purpose |
|---|---|
| workshop.md | Full workshop instructions (Open Library) |
| workshop/dlt_Pipeline_Overview.ipynb | Intro notebook about dlt pipeline concepts |
| Homework/README.md | Homework questions and selected answers |
| Homework/scripts/taxi_pipeline.py | dlt pipeline for taxi API -> DuckDB |
| Homework/scripts/explore_api.py | Quick endpoint inspection helper |
| Homework/notebooks/homework_notebook.ipynb | Notebook analysis for homework |
| Homework/notebooks/marimo_test_notebook.py | Marimo notebook to explore the loaded taxi data |
- Python 3.13+ (project setting in pyproject.toml)
- uv (recommended) or pip
- Optional: an AI IDE (Cursor, Windsurf, VS Code + Copilot)

Check your Python version:

python --version

Install dependencies from the project root:

uv sync

Or install only homework dependencies:

pip install -r Homework/requirements.txt

The workshop flow remains available in full detail in workshop.md.
For the conceptual walkthrough, open:
workshop/dlt_Pipeline_Overview.ipynb
Core workshop commands:
pip install "dlt[workspace]"
dlt init dlthub:open_library duckdb
python open_library_pipeline.py
dlt pipeline open_library_pipeline show

Run the homework pipeline from the Homework folder:
cd Homework
python scripts/explore_api.py
python scripts/taxi_pipeline.py

Inspect metadata and loaded data:
dlt pipeline taxi_pipeline show
dlt pipeline taxi_pipeline query "SELECT COUNT(*) AS nb_rows FROM taxi_dataset.taxi_trip;"

Then use Homework/README.md and the notebook artifacts to validate answers:
- Homework/notebooks/homework_notebook.ipynb
- Homework/ressources/*.png
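The row-count check can also be scripted, which is handy when validating answers repeatedly. The sketch below is an assumption-laden illustration: `count_rows` is a hypothetical helper, the `trip_id` column is invented, and an in-memory SQLite database stands in for the DuckDB file so the example runs with the standard library alone. Only the table name `taxi_dataset.taxi_trip` comes from the query above.

```python
import re
import sqlite3

# Accept only plain `table` or `schema.table` names, since the name is
# interpolated into the SQL text and cannot be passed as a bound parameter.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def count_rows(conn, table):
    """Return the number of rows in `table` using a DB-API style connection."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Stand-in database: SQLite with an attached schema named like the dlt dataset.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS taxi_dataset")
conn.execute("CREATE TABLE taxi_dataset.taxi_trip (trip_id INTEGER)")  # hypothetical column
conn.executemany("INSERT INTO taxi_dataset.taxi_trip VALUES (?)", [(1,), (2,), (3,)])

nb_rows = count_rows(conn, "taxi_dataset.taxi_trip")
print(f"taxi_dataset.taxi_trip has {nb_rows} rows")
```

Against the real data, the same helper should work with a DuckDB connection in place of the SQLite one, for example `duckdb.connect("taxi_pipeline.duckdb")`, where the file name is an assumption about where the pipeline wrote its database.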
Open the marimo notebook for interactive SQL exploration:
cd Homework/notebooks
marimo edit marimo_test_notebook.py

Run it in app mode:
cd Homework/notebooks
marimo run marimo_test_notebook.py

| Resource | Link |
|---|---|
| dlt Documentation | dlthub.com/docs |
| Open Library Workspace Guide | dlthub.com/workspace/source/open-library |
| dlt Dashboard Docs | dlthub.com/docs/general-usage/dashboard |
| marimo + dlt Guide | dlthub.com/docs/general-usage/dataset-access/marimo |
| Open Library API | openlibrary.org/developers/api |
| Homework Instructions (repo) | Homework/README.md |
| Workshop Instructions (repo) | workshop.md |
Workshop and homework materials based on dltHub
