In [1]:
import duckdb

# Load SQL extension
%load_ext sql

# Initialize 🦆 DuckDB connection
conn = duckdb.connect()

# Import database
%sql conn --alias duckdb
%sql IMPORT DATABASE '../../data/nps';

Config,value
feedback,True
autopandas,True
displaylimit,10
displaycon,False


Unnamed: 0,Count
0,224


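Many SQL databases provide functions for generating data on the fly. Generated rows make a convenient index to aggregate against, whether that index is numeric or date-based. In DuckDB, `range(start, stop)` returns every value from `start` up to, but not including, `stop`.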

In [2]:
%%sql
SELECT
    r.range
FROM range(0,100) r
LIMIT 5

Unnamed: 0,range
0,0
1,1
2,2
3,3
4,4


We can also pass a third argument to step by a different increment:

In [3]:
%%sql
SELECT
    r.range
FROM range(0,100,2) r
LIMIT 5

Unnamed: 0,range
0,0
1,2
2,4
3,6
4,8
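
As a quick cross-check of the semantics, Python's built-in `range` follows the same convention as the two queries above: the start is included, the stop is excluded, and an optional third argument sets the step. This is only an analogy in plain Python, not DuckDB code.

```python
# Python's built-in range mirrors the half-open (start, stop, step) convention
# used by the DuckDB queries above; this is an analogy, not DuckDB itself.
first_five = list(range(0, 100))[:5]            # like range(0,100) ... LIMIT 5
first_five_by_two = list(range(0, 100, 2))[:5]  # like range(0,100,2) ... LIMIT 5
last_value = range(0, 100, 2)[-1]               # the stop value 100 is never produced

print(first_five)         # [0, 1, 2, 3, 4]
print(first_five_by_two)  # [0, 2, 4, 6, 8]
print(last_value)         # 98
```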


Or generate dates by passing a start date, an end date, and an interval:

In [4]:
%%sql
SELECT
    r.range
FROM range(DATE '2019-01-01', DATE '2025-01-01', INTERVAL '1 day') r
LIMIT 5;

Unnamed: 0,range
0,2019-01-01
1,2019-01-02
2,2019-01-03
3,2019-01-04
4,2019-01-05
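
The end of a date range is excluded as well, which is easy to miss. The plain-Python sketch below reproduces the same half-open behavior for the dates used above. The helper `date_range` is hypothetical and written only for illustration; it is not part of DuckDB or the notebook's setup.

```python
from datetime import date, timedelta


def date_range(start, stop, step=timedelta(days=1)):
    """Yield dates from start up to, but not including, stop (hypothetical helper)."""
    current = start
    while current < stop:
        yield current
        current += step


days = list(date_range(date(2019, 1, 1), date(2025, 1, 1)))

print(days[0])    # 2019-01-01
print(days[-1])   # 2024-12-31 (2025-01-01 itself is not produced)
print(len(days))  # 2192
```

If the last day matters, move the end date one step later rather than assuming it is included.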


Why is this useful? Imagine you'd like to combine data from multiple sources or compute a running aggregation. We can't always be sure that the source data has a row for every date or number. Joining against a generated range allows us to _be sure_ every date is covered!

In [5]:
%%sql
WITH date_range AS (
    SELECT
        r.range
    -- range() excludes its end value, so stop at March 1 to include Feb 29
    FROM range(DATE '2024-02-01', DATE '2024-03-01', INTERVAL '1 day') r
)
SELECT
    dr.range as dt,
    COUNT(DISTINCT a.title) as num_alerts
FROM date_range dr
LEFT JOIN nps_public_data.alerts a
    ON dr.range::DATE = a.lastindexeddate::DATE
GROUP BY 1
ORDER BY 1
LIMIT 12

Unnamed: 0,dt,num_alerts
0,2024-02-01,2
1,2024-02-02,0
2,2024-02-03,1
3,2024-02-04,0
4,2024-02-05,2
5,2024-02-06,5
6,2024-02-07,0
7,2024-02-08,5
8,2024-02-09,5
9,2024-02-10,0


Note the days with zero alerts: those would have been skipped without our generated range! DuckDB offers a few variants of its [range functions](https://duckdb.org/docs/sql/functions/nested.html#range-functions) (for example `generate_series`, which includes the end value), and the equivalents look different in every variant of SQL. Some lack them entirely!
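
Where no range function is available, a recursive CTE is a common substitute. The sketch below uses SQLite through Python's standard `sqlite3` module, chosen only because it ships with Python. The `alerts` table and its three rows are made-up toy data, not the NPS dataset. It contrasts a plain aggregation, which drops empty days, with a LEFT JOIN against a generated date spine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy stand-in for the alerts table (hypothetical rows, not the NPS data).
conn.execute("CREATE TABLE alerts (title TEXT, lastindexeddate TEXT)")
conn.executemany(
    "INSERT INTO alerts VALUES (?, ?)",
    [
        ("Road closed", "2024-02-01"),
        ("Trail flooded", "2024-02-01"),
        ("Bear activity", "2024-02-03"),
    ],
)

# Plain aggregation: days without alerts simply do not appear.
sparse = conn.execute(
    """
    SELECT lastindexeddate AS dt, COUNT(DISTINCT title) AS num_alerts
    FROM alerts
    GROUP BY lastindexeddate
    ORDER BY lastindexeddate
    """
).fetchall()

# Recursive CTE as a date spine, then LEFT JOIN so empty days count as 0.
# Note: the end date is inclusive here, unlike DuckDB's range().
dense = conn.execute(
    """
    WITH RECURSIVE date_range(dt) AS (
        SELECT '2024-02-01'
        UNION ALL
        SELECT date(dt, '+1 day') FROM date_range WHERE dt < '2024-02-04'
    )
    SELECT dr.dt, COUNT(DISTINCT a.title) AS num_alerts
    FROM date_range dr
    LEFT JOIN alerts a ON a.lastindexeddate = dr.dt
    GROUP BY dr.dt
    ORDER BY dr.dt
    """
).fetchall()

print(sparse)  # [('2024-02-01', 2), ('2024-02-03', 1)]
print(dense)   # [('2024-02-01', 2), ('2024-02-02', 0), ('2024-02-03', 1), ('2024-02-04', 0)]
```

The join condition and `COUNT(DISTINCT ...)` follow the DuckDB query above; only the way the date spine is produced differs.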