DuckLake Bootstrap (DuckDB metadata + MinIO data)

This starter helps you spin up a DuckLake catalog backed by a local DuckDB metadata DB and MinIO (S3-compatible) for Parquet data files. It also includes a CLI to generate and load a TPC-H dataset into DuckLake using DuckDB's ducklake and tpch extensions (no manual data loads).

What's inside

docker-compose.minio.yml — launches a local MinIO server + console.
config.example.yaml — minimal config for metadata path, MinIO, and dataset options.
bootstrap_ducklake.py — Python CLI to attach a DuckLake catalog and load TPC-H tables.
This README.md.

Quick start

# One-time environment setup (install dependencies, source the environment, etc.)
./setup_env.sh

# Run the main script end to end: start MinIO and DuckLake, then load the TPC-H dataset into DuckLake
./run_ducklake.sh

At this point you have a working DuckLake. Optionally, validate the TPC-H dataset by running the standard TPC-H queries:

./run_tpch_queries.py
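The validation step can be approximated in a few lines of Python. This is a minimal sketch, not the contents of run_tpch_queries.py. It assumes a DuckDB connection on which the DuckLake catalog is already attached under the hypothetical alias `lake`, and it uses the `tpch_queries()` table function from DuckDB's tpch extension to fetch the 22 query texts.

```python
import time


def run_tpch_queries(con, catalog: str = "lake") -> dict:
    """Run the 22 TPC-H queries shipped with DuckDB's tpch extension.

    `con` is assumed to be a duckdb connection with the DuckLake catalog
    already attached; any object whose execute() returns something with
    fetchall() will do. The alias `lake` is hypothetical.
    Returns {query_nr: (row_count, seconds)}.
    """
    con.execute("LOAD tpch")
    # Make unqualified table names (lineitem, orders, ...) resolve to the lake.
    con.execute(f"USE {catalog}")
    queries = con.execute("SELECT query_nr, query FROM tpch_queries()").fetchall()
    results = {}
    for nr, sql in queries:
        start = time.perf_counter()
        rows = con.execute(sql).fetchall()
        results[nr] = (len(rows), time.perf_counter() - start)
    return results
```

Because the queries run against DuckLake tables, the timings include reading Parquet files back from MinIO, not just DuckDB compute.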

Step-by-step setup

Run MinIO (Docker required)

docker compose -f docker-compose.minio.yml up -d
# Console: http://localhost:9001  |  S3 endpoint: http://localhost:9000
# Default creds in compose file: minioadmin / minioadmin
# Create a bucket, e.g. 'ducklake-data' (via the console)
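Before creating the bucket, you can confirm the server is up. This is a minimal sketch that assumes the default S3 endpoint from the compose file (`http://localhost:9000`) and MinIO's liveness probe at `/minio/health/live`. The helper name `minio_is_live` is hypothetical and not part of this repo.

```python
import http.client
import urllib.request


def minio_is_live(endpoint: str = "http://localhost:9000", timeout: float = 2.0) -> bool:
    """Return True if MinIO's liveness probe answers with HTTP 200.

    Hypothetical helper; not part of bootstrap_ducklake.py.
    """
    url = endpoint.rstrip("/") + "/minio/health/live"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers connection refused, timeouts, DNS failures, and non-2xx
        # responses (urllib's HTTPError is an OSError subclass).
        return False


if __name__ == "__main__":
    print("MinIO live:", minio_is_live())
```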

Copy config & edit

cp config.example.yaml config.yaml
# Set bucket, prefix, and region; optionally provide MinIO credentials via environment variables or the config file
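The credentials and endpoint end up in a DuckDB `CREATE SECRET` statement. The sketch below shows one way to build it, with environment variables taking precedence over the config. The config keys (`endpoint`, `region`, `access_key`, `secret_key`, `use_ssl`) and the `MINIO_ACCESS_KEY` / `MINIO_SECRET_KEY` variable names are assumptions; check config.example.yaml and bootstrap_ducklake.py for the real ones. `URL_STYLE 'path'` and `USE_SSL false` are the usual settings for a local, plain-HTTP MinIO.

```python
import os
from typing import Mapping


def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal (doubling single quotes)."""
    return "'" + value.replace("'", "''") + "'"


def build_minio_secret_sql(cfg: dict, env: Mapping[str, str] = os.environ) -> str:
    """Build a DuckDB CREATE SECRET statement for a local MinIO server.

    Config keys and env var names here are hypothetical; env values win.
    """
    key_id = env.get("MINIO_ACCESS_KEY", cfg.get("access_key", "minioadmin"))
    secret = env.get("MINIO_SECRET_KEY", cfg.get("secret_key", "minioadmin"))
    # DuckDB expects host:port for ENDPOINT, so strip any http:// scheme.
    endpoint = cfg.get("endpoint", "localhost:9000").split("://", 1)[-1]
    use_ssl = "true" if cfg.get("use_ssl", False) else "false"
    return (
        "CREATE OR REPLACE SECRET minio_secret (\n"
        "    TYPE S3,\n"
        f"    KEY_ID {sql_literal(key_id)},\n"
        f"    SECRET {sql_literal(secret)},\n"
        f"    REGION {sql_literal(cfg.get('region', 'us-east-1'))},\n"
        f"    ENDPOINT {sql_literal(endpoint)},\n"
        "    URL_STYLE 'path',\n"
        f"    USE_SSL {use_ssl}\n"
        ")"
    )
```

The resulting string would be passed to `con.execute(...)` on a DuckDB connection with the httpfs extension loaded, before attaching the DuckLake catalog.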

Use the CLI

python3 bootstrap_ducklake.py attach --config config.yaml
python3 bootstrap_ducklake.py load-tpch --config config.yaml --scale 1
# Optionally re-run load-tpch with a different --scale; tables are written into DuckLake with CREATE TABLE ... AS SELECT (CTAS)
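For orientation, the two subcommands map naturally onto argparse subparsers. This is a hypothetical layout, not the actual parser in bootstrap_ducklake.py. Whether `--config` is required and what `--scale` defaults to are assumptions. `--scale` is parsed as a float because TPC-H scale factors such as 0.01 are common for quick tests.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse layout mirroring the two documented subcommands."""
    parser = argparse.ArgumentParser(prog="bootstrap_ducklake.py")
    sub = parser.add_subparsers(dest="command", required=True)

    attach = sub.add_parser("attach", help="attach the DuckLake catalog")
    attach.add_argument("--config", required=True, help="path to config.yaml")

    load = sub.add_parser("load-tpch", help="generate TPC-H and CTAS it into DuckLake")
    load.add_argument("--config", required=True, help="path to config.yaml")
    load.add_argument("--scale", type=float, default=1.0, help="TPC-H scale factor (sf)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Adding a backend later would then mean adding options (or a new subcommand) here, while the SQL generation stays in one place.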

Design notes

Extensible backends: The CLI is structured so you can add new metadata backends (e.g., Postgres) and storage backends (e.g., AWS S3) later.
Pure DuckDB/DuckLake: We use only DuckDB SQL: CREATE SECRET ..., ATTACH 'ducklake:...' (DATA_PATH ...), and CALL dbgen(...) followed by CREATE TABLE ... AS SELECT ... into DuckLake.
No manual file writing: Parquet files are created by DuckLake within your object store path.
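The statement sequence described in these notes can be sketched as a plain list of SQL strings. Several details are assumptions rather than what the CLI necessarily does: the catalog alias `lake`, generating TPC-H into DuckDB's default in-memory catalog (named `memory` for `duckdb.connect()`), and dropping each table before the CTAS so that a re-run at a different scale replaces it. The object-store secret is assumed to have been created already.

```python
TPCH_TABLES = ("customer", "lineitem", "nation", "orders",
               "part", "partsupp", "region", "supplier")


def ducklake_load_statements(metadata_path: str, data_path: str, scale: float,
                             lake: str = "lake") -> list:
    """Return, in order, the SQL a DuckLake + TPC-H load could run.

    Sketch only: the alias, the `memory` source catalog, and DROP-then-CTAS
    are assumptions. Run CREATE SECRET for the object store first.
    """
    meta = metadata_path.replace("'", "''")
    data = data_path.replace("'", "''")
    stmts = [
        "INSTALL ducklake", "LOAD ducklake",
        "INSTALL httpfs", "LOAD httpfs",
        "INSTALL tpch", "LOAD tpch",
        f"ATTACH 'ducklake:{meta}' AS {lake} (DATA_PATH '{data}')",
        # dbgen writes the eight TPC-H tables into the current catalog (memory).
        f"CALL dbgen(sf = {scale})",
    ]
    for table in TPCH_TABLES:
        stmts.append(f"DROP TABLE IF EXISTS {lake}.main.{table}")
        # CTAS: DuckLake itself writes the Parquet files under DATA_PATH.
        stmts.append(
            f"CREATE TABLE {lake}.main.{table} AS SELECT * FROM memory.main.{table}"
        )
    return stmts
```

Executing the load is then a loop of `con.execute(stmt)` over the list on a `duckdb.connect()` connection, which keeps every step in plain DuckDB SQL.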

Requirements

Python 3.9+
pip install duckdb pyyaml
Docker (for running MinIO).
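A quick preflight check for these requirements can save a failed run. This is a hypothetical helper, not shipped with the repo. Note that pyyaml is imported as `yaml`, so that is the module name checked.

```python
import importlib.util
import shutil
import sys


def missing_requirements(modules=("duckdb", "yaml"), min_python=(3, 9)) -> list:
    """List unmet requirements: Python version, importable modules, Docker CLI."""
    missing = []
    if sys.version_info < min_python:
        missing.append("python>=" + ".".join(map(str, min_python)))
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    if shutil.which("docker") is None:
        missing.append("docker")
    return missing


if __name__ == "__main__":
    print(missing_requirements() or "all requirements found")
```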
