Hands-on SQL practice using DuckDB to explore and analyze the NYC Taxi dataset. This repository contains materials for the SQL AMA Workshop, starting with DuckDB as our in-process analytical SQL engine and the NYC Taxi Dataset as our primary practice dataset. Learn more about DuckDB from the docs: https://duckdb.org/docs/stable/ Explore the dataset host site: NYC TLC Trip Record Data
- Build a solid foundation in SQL syntax and relational query concepts.
- Understand how DuckDB operates as a lightweight but high-performance analytics engine.
- Gain practical experience querying a real-world dataset without provisioning or managing a persistent database server.
- Learn to integrate DuckDB into an analytics workflow for fast iteration and reproducible results.
To ensure a smooth learning experience, weβll use a local-first, minimal-dependency analytics stack that participants can set up in minutes:
- DuckDB β In-process analytical database, our primary SQL execution engine.
- Python 3.10+ β Host environment for running DuckDB queries programmatically.
- Jupyter Notebooks β Interactive development and explanation of query logic.
- NYC Taxi Dataset β Primary data source with millions of trip records.
- Parquet & CSV β File formats weβll query directly without importing into a DB.
DuckDB is an in-process OLAP (Online Analytical Processing) database designed for fast, interactive analytics directly within your working environment. a columnar, vectorized execution engine optimized for large-scale aggregations and joins. Read more from the official documentation: https://duckdb.org/why_duckdb