DuckDB - The SQLite for OLAP

DuckDB is an in-process SQL OLAP database management system

When to use DuckDB

Processing and storing tabular datasets, e.g. from CSV or Parquet files
Interactive data analysis, e.g. Joining & aggregate multiple large tables
Concurrent large changes, to multiple large tables, e.g. appending rows, adding/removing/updating columns
Large result set transfer to client

When to not use DuckDB

Online transaction processing (OLTP) vs. Online analytic processing (OLAP)

Property	Transaction processing systems (OLTP)	Analytic systems (OLAP)
Main read pattern	Small number of records per query, fetched by key	Aggregate over large number of records
Main write pattern	Random-access, low-latency writes from user input	Bulk import (ETL) or event stream
Primarily used by	End user/customer, via web application	Internal analyst, for decision support
What data represents	Latest state of data (current point in time)	History of events that happened over time
Dataset size	Gigabytes to terabytes	Terabytes to petabytes

Source: Kleppmann (2017), Designing Data-Intensive Applications, O’Reilly

Generate Kaggle API Token (Login to Kaggle, click on your user symbol in the upper right corner, choose "Account", scroll down to "API" and click "Create New API Token")
Place the content of the file into the file ".kaggle/kaggle.json"
Run: "./download.sh" (you need "jq") See also: https://www.kaggle.com/docs/api

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.kaggle		.kaggle
csv-data		csv-data
parquet-data		parquet-data
.gitignore		.gitignore
README.md		README.md
download.sh		download.sh
duckdb-test.ipynb		duckdb-test.ipynb
requirements.txt		requirements.txt
spark-test.ipynb		spark-test.ipynb