Merged
8 changes: 8 additions & 0 deletions docs/README.md
@@ -54,6 +54,14 @@ from scratch. Familiarity with numpy is not required, but would be beneficial.
If you plan to do Python programming in a Linux or HPC environment you should
be familiar with these as well.

For following along hands-on, you need
* laptop or desktop with internet access.
* a system set up so you can connect to an HPC system, an account on an HPC
Comment on lines +57 to +59

nitpick (typo): Add an article to make the bullet grammatically complete.

For example, update this bullet to "* a laptop or desktop with internet access." so it matches the others that start with "a".

Suggested change
- For following along hands-on, you need
- * laptop or desktop with internet access.
- * a system set up so you can connect to an HPC system, an account on an HPC
+ For following along hands-on, you need
+ * a laptop or desktop with internet access.
+ * a system set up so you can connect to an HPC system, an account on an HPC

system (e.g., VSC, CECI, ...), compute credits if that is required to run
jobs on the HPC system if you want to use an HPC system;
Comment on lines +59 to +61

suggestion (typo): Improve readability of the long HPC-related bullet point.

The list in this bullet reads a bit awkwardly, especially around "system (e.g., VSC, CECI, ...), compute credits". Consider adding an "and" before "compute credits" or otherwise restructuring to make the list of requirements clearer.

Suggested change
- * a system set up so you can connect to an HPC system, an account on an HPC
- system (e.g., VSC, CECI, ...), compute credits if that is required to run
- jobs on the HPC system if you want to use an HPC system;
+ * a system set up so you can connect to an HPC system, an account on an HPC
+ system (e.g., VSC, CECI, ...) and compute credits if required to run
+ jobs on the HPC system;

* a Python environment that can run Jupyter Lab if you want to use your own system;
* access to Google Colaboratory if you prefer not to install software.


## Level

3 changes: 3 additions & 0 deletions source-code/README.md
@@ -15,13 +15,16 @@ to create it. There is some material not covered in the presentation as well.
representation and algorithms.
* [`pandas`](pandas): illustrations of using pandas and seaborn.
* [`polars`](polars): illustrations of using polars.
* [`duckdb`](duckdb): illustrations of using DuckDB for SQL queries.
* [`regexes`](regexes): illustrations of using regular expressions for
validation and information extraction from textual data.
* [`seaborn`](seaborn): illustrations of using Seaborn to create plots.
* [`web-scraping`](web-scraping): illustration of web scraping using beautiful
soup and graph representation using networkx.
* [`xarray`](xarray): illustrates the xarray library for pandas-like operations
on multi-dimensional arrays.
* [`duckdb`](duckdb): illustrates the DuckDB library for SQL-like operations
on dataframes, including integration with pandas and polars.

**Note:** material on dashboards has been moved to a [dedicated
repository](https://github.com/gjbex/Python-dashboards).
14 changes: 14 additions & 0 deletions source-code/duckdb/README.md
@@ -0,0 +1,14 @@
# DuckDB

DuckDB is an in-process SQL OLAP database management system. It is designed to
support analytical query workloads and is optimized for fast query performance
on large datasets. DuckDB can be embedded directly into applications, making it
a popular choice for data analysis tasks in various programming environments.


## What is it?

1. `patients.ipynb`: A Jupyter notebook that demonstrates how to use DuckDB for
analyzing patient data. It includes examples of loading data and executing
SQL queries.
1. `data/`: CSV files to use with the notebook.
63 changes: 63 additions & 0 deletions source-code/duckdb/data/patient_experiment.csv
@@ -0,0 +1,63 @@
,patient,dose,date,temperature
0,1,0.0,2012-10-02 10:00:00,38.3
1,1,2.0,2012-10-02 11:00:00,38.5
2,1,2.0,2012-10-02 12:00:00,38.1
3,1,2.0,2012-10-02 13:00:00,37.3
4,1,0.0,2012-10-02 14:00:00,37.5
5,1,0.0,2012-10-02 15:00:00,37.1
6,1,0.0,2012-10-02 16:00:00,36.8
7,2,0.0,2012-10-02 10:00:00,39.3
8,2,5.0,2012-10-02 11:00:00,39.4
9,2,5.0,2012-10-02 12:00:00,38.1
10,2,5.0,2012-10-02 13:00:00,37.3
11,2,0.0,2012-10-02 14:00:00,36.8
12,2,0.0,2012-10-02 15:00:00,36.8
13,2,0.0,2012-10-02 16:00:00,36.8
14,3,0.0,2012-10-02 10:00:00,37.9
15,3,2.0,2012-10-02 11:00:00,39.5
16,3,5.0,2012-10-02 12:00:00,38.3
17,3,2.0,2012-10-02 13:00:00,
18,3,2.0,2012-10-02 14:00:00,37.7
19,3,2.0,2012-10-02 15:00:00,37.1
20,3,0.0,2012-10-02 16:00:00,36.7
21,4,0.0,2012-10-02 10:00:00,38.1
22,4,5.0,2012-10-02 11:00:00,37.2
23,4,5.0,2012-10-02 12:00:00,36.1
24,4,0.0,2012-10-02 13:00:00,35.9
25,4,,2012-10-02 14:00:00,36.3
26,4,0.0,2012-10-02 15:00:00,36.6
27,4,0.0,2012-10-02 16:00:00,36.7
28,5,0.0,2012-10-02 10:00:00,37.9
29,5,3.0,2012-10-02 11:00:00,39.5
30,5,7.0,2012-10-02 12:00:00,38.3
31,5,5.0,2012-10-02 13:00:00,38.5
32,5,9.0,2012-10-02 14:00:00,39.4
33,5,3.0,2012-10-02 15:00:00,37.9
34,5,0.0,2012-10-02 16:00:00,37.2
35,6,0.0,2012-10-02 10:00:00,37.5
36,6,2.0,2012-10-02 11:00:00,38.1
37,6,3.0,2012-10-02 12:00:00,37.9
38,6,2.0,2012-10-02 13:00:00,37.7
39,6,1.0,2012-10-02 14:00:00,37.2
40,6,0.0,2012-10-02 15:00:00,36.8
41,7,0.0,2012-10-02 10:00:00,39.5
42,7,10.0,2012-10-02 11:00:00,40.7
43,7,5.0,2012-10-02 12:00:00,39.8
44,7,8.0,2012-10-02 13:00:00,40.2
45,7,3.0,2012-10-02 14:00:00,38.3
46,7,3.0,2012-10-02 15:00:00,37.6
47,7,1.0,2012-10-02 16:00:00,37.3
48,8,0.0,2012-10-02 10:00:00,37.8
49,8,0.0,2012-10-02 11:00:00,37.9
50,8,0.0,2012-10-02 12:00:00,37.4
51,8,0.0,2012-10-02 13:00:00,37.6
52,8,0.0,2012-10-02 14:00:00,37.3
53,8,0.0,2012-10-02 15:00:00,37.1
54,8,0.0,2012-10-02 16:00:00,36.8
55,9,0.0,2012-10-02 10:00:00,38.3
56,9,10.0,2012-10-02 11:00:00,39.5
57,9,12.0,2012-10-02 12:00:00,40.2
58,9,4.0,2012-10-02 13:00:00,39.1
59,9,4.0,2012-10-02 14:00:00,37.9
60,9,0.0,2012-10-02 15:00:00,37.1
61,9,0.0,2012-10-02 16:00:00,37.3
11 changes: 11 additions & 0 deletions source-code/duckdb/data/patient_metadata.csv
@@ -0,0 +1,11 @@
,patient,gender,condition
0,1,M,A
1,2,F,A
2,3,M,A
3,5,M,A
4,6,F,B
5,7,M,B
6,8,F,B
7,9,M,B
8,10,F,B
9,11,M,B