gjbex · gjbex · Nov 25, 2025 · Jul 16, 2025 · Aug 14, 2025 · Oct 15, 2025
diff --git a/docs/README.md b/docs/README.md
@@ -54,6 +54,14 @@ from scratch.  Familiarity with numpy is not required, but would be beneficial.
 If you plan to do Python programming in a Linux or HPC environment you should
 be familiar with these as well.
 
+For following along hands-on, you need
+* a laptop or desktop with internet access;
+* a system set up so you can connect to an HPC system, an account on an HPC
+  system (e.g., VSC, CECI, ...) and compute credits if required to run
+  jobs on the HPC system;
+* a Python environment that can run Jupyter Lab if you want to use your own system;
+* access to Google Colaboratory if you prefer not to install software.
+
 
 ## Level
 

diff --git a/source-code/README.md b/source-code/README.md
@@ -15,13 +15,16 @@ to create it.  There is some material not covered in the presentation as well.
   representation and algorithms.
 * [`pandas`](pandas): illustrations of using pandas and seaborn.
 * [`polars`](polars): illustrations of using polars.
+* [`duckdb`](duckdb): illustrations of using DuckDB for SQL queries.
 * [`regexes`](regexes): illustrations of using regular expressions for
   validation and information extraction from textual data.
 * [`seaborn`](seaborn): illustrations of using Seaborn to create plots.
 * [`web-scraping`](web-scraping): illustration of web scraping using beautiful
   soup and graph representation using networkx.
 * [`xarray`](xarray): illustrates the xarray library for pandas-like operations
   on multi-dimensional arrays.
+* [`duckdb`](duckdb): illustrates the DuckDB library for SQL-like operations
+  on dataframes, including integration with pandas and polars.
 
 **Note:** material on dashboards has been moved to a [dedicated
 repository](https://github.com/gjbex/Python-dashboards).
diff --git a/source-code/duckdb/README.md b/source-code/duckdb/README.md
@@ -0,0 +1,14 @@
+# DuckDB
+
+DuckDB is an in-process SQL OLAP database management system. It is designed to
+support analytical query workloads and is optimized for fast query performance
+on large datasets. DuckDB can be embedded directly into applications, making it
+a popular choice for data analysis tasks in various programming environments.
+
+
+## What is it?
+
+1. `patients.ipynb`: A Jupyter notebook that demonstrates how to use DuckDB for
+   analyzing patient data. It includes examples of loading data and executing
+   SQL queries.
+1. `data/`: CSV files to use with the notebook.
diff --git a/source-code/duckdb/data/patient_experiment.csv b/source-code/duckdb/data/patient_experiment.csv
@@ -0,0 +1,63 @@
+,patient,dose,date,temperature
+0,1,0.0,2012-10-02 10:00:00,38.3
+1,1,2.0,2012-10-02 11:00:00,38.5
+2,1,2.0,2012-10-02 12:00:00,38.1
+3,1,2.0,2012-10-02 13:00:00,37.3
+4,1,0.0,2012-10-02 14:00:00,37.5
+5,1,0.0,2012-10-02 15:00:00,37.1
+6,1,0.0,2012-10-02 16:00:00,36.8
+7,2,0.0,2012-10-02 10:00:00,39.3
+8,2,5.0,2012-10-02 11:00:00,39.4
+9,2,5.0,2012-10-02 12:00:00,38.1
+10,2,5.0,2012-10-02 13:00:00,37.3
+11,2,0.0,2012-10-02 14:00:00,36.8
+12,2,0.0,2012-10-02 15:00:00,36.8
+13,2,0.0,2012-10-02 16:00:00,36.8
+14,3,0.0,2012-10-02 10:00:00,37.9
+15,3,2.0,2012-10-02 11:00:00,39.5
+16,3,5.0,2012-10-02 12:00:00,38.3
+17,3,2.0,2012-10-02 13:00:00,
+18,3,2.0,2012-10-02 14:00:00,37.7
+19,3,2.0,2012-10-02 15:00:00,37.1
+20,3,0.0,2012-10-02 16:00:00,36.7
+21,4,0.0,2012-10-02 10:00:00,38.1
+22,4,5.0,2012-10-02 11:00:00,37.2
+23,4,5.0,2012-10-02 12:00:00,36.1
+24,4,0.0,2012-10-02 13:00:00,35.9
+25,4,,2012-10-02 14:00:00,36.3
+26,4,0.0,2012-10-02 15:00:00,36.6
+27,4,0.0,2012-10-02 16:00:00,36.7
+28,5,0.0,2012-10-02 10:00:00,37.9
+29,5,3.0,2012-10-02 11:00:00,39.5
+30,5,7.0,2012-10-02 12:00:00,38.3
+31,5,5.0,2012-10-02 13:00:00,38.5
+32,5,9.0,2012-10-02 14:00:00,39.4
+33,5,3.0,2012-10-02 15:00:00,37.9
+34,5,0.0,2012-10-02 16:00:00,37.2
+35,6,0.0,2012-10-02 10:00:00,37.5
+36,6,2.0,2012-10-02 11:00:00,38.1
+37,6,3.0,2012-10-02 12:00:00,37.9
+38,6,2.0,2012-10-02 13:00:00,37.7
+39,6,1.0,2012-10-02 14:00:00,37.2
+40,6,0.0,2012-10-02 15:00:00,36.8
+41,7,0.0,2012-10-02 10:00:00,39.5
+42,7,10.0,2012-10-02 11:00:00,40.7
+43,7,5.0,2012-10-02 12:00:00,39.8
+44,7,8.0,2012-10-02 13:00:00,40.2
+45,7,3.0,2012-10-02 14:00:00,38.3
+46,7,3.0,2012-10-02 15:00:00,37.6
+47,7,1.0,2012-10-02 16:00:00,37.3
+48,8,0.0,2012-10-02 10:00:00,37.8
+49,8,0.0,2012-10-02 11:00:00,37.9
+50,8,0.0,2012-10-02 12:00:00,37.4
+51,8,0.0,2012-10-02 13:00:00,37.6
+52,8,0.0,2012-10-02 14:00:00,37.3
+53,8,0.0,2012-10-02 15:00:00,37.1
+54,8,0.0,2012-10-02 16:00:00,36.8
+55,9,0.0,2012-10-02 10:00:00,38.3
+56,9,10.0,2012-10-02 11:00:00,39.5
+57,9,12.0,2012-10-02 12:00:00,40.2
+58,9,4.0,2012-10-02 13:00:00,39.1
+59,9,4.0,2012-10-02 14:00:00,37.9
+60,9,0.0,2012-10-02 15:00:00,37.1
+61,9,0.0,2012-10-02 16:00:00,37.3
diff --git a/source-code/duckdb/data/patient_metadata.csv b/source-code/duckdb/data/patient_metadata.csv
@@ -0,0 +1,11 @@
+,patient,gender,condition
+0,1,M,A
+1,2,F,A
+2,3,M,A
+3,5,M,A
+4,6,F,B
+5,7,M,B
+6,8,F,B
+7,9,M,B
+8,10,F,B
+9,11,M,B
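Review note: judging by the rows in this diff, the two CSV files do not cover the same set of patients, which affects what an inner or outer join between them returns. The sketch below checks this with the standard library only. The experiment excerpt holds just the first measurement of each patient, the metadata is copied in full, and the helper name `patient_ids` is hypothetical.

```python
import csv
import io

# First measurement of each patient, copied from patient_experiment.csv.
EXPERIMENT_EXCERPT = """,patient,dose,date,temperature
0,1,0.0,2012-10-02 10:00:00,38.3
7,2,0.0,2012-10-02 10:00:00,39.3
14,3,0.0,2012-10-02 10:00:00,37.9
21,4,0.0,2012-10-02 10:00:00,38.1
28,5,0.0,2012-10-02 10:00:00,37.9
35,6,0.0,2012-10-02 10:00:00,37.5
41,7,0.0,2012-10-02 10:00:00,39.5
48,8,0.0,2012-10-02 10:00:00,37.8
55,9,0.0,2012-10-02 10:00:00,38.3
"""

# Full contents of patient_metadata.csv as added in this diff.
METADATA = """,patient,gender,condition
0,1,M,A
1,2,F,A
2,3,M,A
3,5,M,A
4,6,F,B
5,7,M,B
6,8,F,B
7,9,M,B
8,10,F,B
9,11,M,B
"""


def patient_ids(csv_text):
    """Return the set of patient IDs found in the 'patient' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["patient"]) for row in reader}


measured = patient_ids(EXPERIMENT_EXCERPT)
described = patient_ids(METADATA)

# Patients with measurements but no metadata, and the other way around.
missing_metadata = sorted(measured - described)
missing_measurements = sorted(described - measured)

print("measured but no metadata:", missing_metadata)
print("metadata but no measurements:", missing_measurements)
```

An inner join on `patient` would therefore drop the rows for any ID reported in either list, while a left or full outer join would keep them with missing values.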