In [63]:
%load_ext sql
%sql postgresql://postgres:postgres@localhost:5433/chinook
%config SqlMagic.displaylimit = 5



## 1. Joins and Aggregations

### 1.1 Find the top 5 customers by total amount spent. Show their full name (first and last), country, and total amount spent. Order by total spent descending.

In [2]:
%%sql
SELECT
    ANY_VALUE(c.first_name) AS first_name,
    ANY_VALUE(c.last_name) AS last_name,
    ANY_VALUE(c.country) AS country,
    ROUND(SUM(i.total), 2) AS total_sales
FROM customer AS c
JOIN invoice AS i
    ON c.customer_id = i.customer_id
GROUP BY c.customer_id
ORDER BY total_sales DESC
LIMIT 5

first_name,last_name,country,total_sales
Helena,Holý,Czech Republic,49.62
Richard,Cunningham,USA,47.62
Luis,Rojas,Chile,46.62
Hugh,O'Reilly,Ireland,45.62
Ladislav,Kovács,Hungary,45.62


### 1.2 Find all tracks that are longer than the average track length in their genre. Show the track name, genre name, track length (in minutes), and the average length for that genre (in minutes). Order by genre and then by track length descending.

In [3]:
%%sql
WITH average_track_lengths_per_genre AS (
    SELECT
        genre_id,
        AVG(milliseconds) / 60000 AS average_track_length
    FROM track
    GROUP BY genre_id
), track_min AS (
    SELECT
        t.*,
        g.name AS genre_name,
        (t.milliseconds / 60000) AS track_length_min
    FROM track AS t
    JOIN genre AS g
        ON t.genre_id = g.genre_id
)
SELECT
    t.name AS track_name,
    t.genre_name AS genre_name,
    t.track_length_min AS track_length_min,
    atl.average_track_length AS genre_average_track_length_min
FROM track_min AS t
JOIN average_track_lengths_per_genre AS atl
    ON t.genre_id = atl.genre_id
WHERE t.track_length_min > atl.average_track_length
ORDER BY t.track_length_min DESC, t.genre_name DESC

track_name,genre_name,track_length_min,genre_average_track_length_min
Occupation / Precipice,TV Shows,88,35.75068369175627
Through a Looking Glass,Drama,84,42.92139635416667
"Greetings from Earth, Pt. 1",Sci Fi & Fantasy,49,48.52971730769231
"Battlestar Galactica, Pt. 2",Sci Fi & Fantasy,49,48.52971730769231
"Battlestar Galactica, Pt. 1",Sci Fi & Fantasy,49,48.52971730769231

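A note on the query above: in PostgreSQL, `t.milliseconds / 60000` on an integer column is integer division, so each track length is truncated to whole minutes before it is compared with the fractional genre average, and the final sort puts length ahead of genre. A minimal sketch of the same idea, comparing raw milliseconds and sorting by genre first, is below. It runs on a few made-up rows in an in-memory SQLite database through Python's `sqlite3` module (an assumption chosen for portability; the rows are hypothetical), so its output is illustrative and not the Chinook result.

```python
import sqlite3

# Toy rows (hypothetical), loaded into in-memory SQLite instead of PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE track (track_id INTEGER PRIMARY KEY, name TEXT,
                    genre_id INTEGER, milliseconds INTEGER);
INSERT INTO genre VALUES (1, 'Rock'), (2, 'Jazz');
INSERT INTO track VALUES
    (1, 'A', 1, 240000), (2, 'B', 1, 270000), (3, 'C', 1, 180000),
    (4, 'D', 2, 300000), (5, 'E', 2, 330000);
""")

rows = conn.execute("""
WITH genre_avg AS (
    SELECT genre_id, AVG(milliseconds) AS avg_ms
    FROM track
    GROUP BY genre_id
)
SELECT
    t.name AS track_name,
    g.name AS genre_name,
    t.milliseconds / 60000.0 AS track_length_min,      -- 60000.0 keeps the fraction
    ROUND(ga.avg_ms / 60000.0, 2) AS genre_avg_min
FROM track AS t
JOIN genre AS g ON t.genre_id = g.genre_id
JOIN genre_avg AS ga ON t.genre_id = ga.genre_id
WHERE t.milliseconds > ga.avg_ms                       -- compare before any rounding
ORDER BY g.name, t.milliseconds DESC
""").fetchall()

for row in rows:
    print(row)
```

Filtering on milliseconds rather than on the derived minutes column means no track is dropped or kept because of truncation.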

## 2. Window Functions

### 2.1 For each track, show its name, genre name, length in minutes, and how it ranks by length within its genre (longest = rank 1). Also include the length of the previous track in the same genre when ordered by length descending.

The result should have columns:

- track_name
- genre_name
- track_length_min
- rank_in_genre (1 = longest track in that genre)
- previous_track_length_min (length of the track ranked immediately above it in the same genre, i.e. the next-longer track, or NULL if it's the longest)

Order by genre name, then by rank.

In [4]:
%%sql
SELECT
    t.name AS track_name,
    g.name AS genre_name,
    RANK() OVER (PARTITION BY t.genre_id ORDER BY t.milliseconds DESC) AS rank_in_genre
FROM track AS t
JOIN genre AS g
    ON t.genre_id = g.genre_id
LIMIT 1

track_name,genre_name,rank_in_genre
Dazed And Confused,Rock,1
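The cell above stops at the rank and a `LIMIT 1`. A minimal sketch of the remaining pieces (the length in minutes, the `LAG` lookup for the previous track, and the requested ordering) might look like the following. It uses a few hypothetical rows in an in-memory SQLite database via Python's `sqlite3`, as a stand-in for the PostgreSQL connection; the named `WINDOW` clause is standard SQL that PostgreSQL also accepts.

```python
import sqlite3

# Toy rows (hypothetical), not the Chinook data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE track (track_id INTEGER PRIMARY KEY, name TEXT,
                    genre_id INTEGER, milliseconds INTEGER);
INSERT INTO genre VALUES (1, 'Rock'), (2, 'Jazz');
INSERT INTO track VALUES
    (1, 'A', 1, 300000), (2, 'B', 1, 240000), (3, 'C', 1, 180000),
    (4, 'D', 2, 120000);
""")

rows = conn.execute("""
SELECT
    t.name AS track_name,
    g.name AS genre_name,
    t.milliseconds / 60000.0 AS track_length_min,
    RANK() OVER w AS rank_in_genre,
    -- LAG looks one row back in the same window: the next-longer track.
    (LAG(t.milliseconds) OVER w) / 60000.0 AS previous_track_length_min
FROM track AS t
JOIN genre AS g ON t.genre_id = g.genre_id
WINDOW w AS (PARTITION BY t.genre_id ORDER BY t.milliseconds DESC)
ORDER BY g.name, rank_in_genre
""").fetchall()

for row in rows:
    print(row)
```

The longest track in each genre has no preceding row in its partition, so `LAG` returns NULL there (shown as `None` in Python).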


### 2.2 For each genre, rank tracks by their length (milliseconds) in descending order (longest = rank 1). Show genre name, track name, milliseconds, and rank within genre.

The result should have columns:

- track_name
- genre_name
- milliseconds
- rank_in_genre

Order by genre_name, then rank ascending.
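No answer cell follows this exercise, so here is one possible sketch. It ranks tracks by `milliseconds` within each genre and, as an extra column that the task does not require, shows `DENSE_RANK` next to `RANK` to make the handling of ties visible. The rows are hypothetical and run in an in-memory SQLite database via Python's `sqlite3` rather than against the PostgreSQL instance.

```python
import sqlite3

# Toy rows (hypothetical); tracks A and B tie on length on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE track (track_id INTEGER PRIMARY KEY, name TEXT,
                    genre_id INTEGER, milliseconds INTEGER);
INSERT INTO genre VALUES (1, 'Rock'), (2, 'Jazz');
INSERT INTO track VALUES
    (1, 'A', 1, 300000), (2, 'B', 1, 300000), (3, 'C', 1, 200000),
    (4, 'D', 2, 100000);
""")

rows = conn.execute("""
SELECT
    t.name AS track_name,
    g.name AS genre_name,
    t.milliseconds,
    RANK() OVER w AS rank_in_genre,              -- leaves a gap after ties
    DENSE_RANK() OVER w AS dense_rank_in_genre   -- no gap after ties
FROM track AS t
JOIN genre AS g ON t.genre_id = g.genre_id
WINDOW w AS (PARTITION BY t.genre_id ORDER BY t.milliseconds DESC)
ORDER BY g.name, rank_in_genre, t.name
""").fetchall()

for row in rows:
    print(row)
```

With `RANK`, the track after a two-way tie for first place gets rank 3; with `DENSE_RANK` it gets rank 2. Which one fits depends on how "rank" is meant to be read.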

### 2.3 For each employee, show their full name, title, hire date, and a running total of how many employees had been hired up to and including that date (i.e. a cumulative headcount ordered by hire date).

The result should have columns:

- full_name
- title
- hire_date
- cumulative_hires (number of employees hired up to and including this row's hire date)

Order by hire_date ascending.

In [9]:
%%sql
SELECT
    e.first_name || ' ' || e.last_name AS full_name,
    e.title,
    e.hire_date,
    COUNT(*) OVER (ORDER BY e.hire_date) AS cumulative_hires
FROM employee AS e
ORDER BY e.hire_date ASC

full_name,title,hire_date,cumulative_hires
Jane Peacock,Sales Support Agent,2002-04-01 00:00:00,1
Nancy Edwards,Sales Manager,2002-05-01 00:00:00,2
Andrew Adams,General Manager,2002-08-14 00:00:00,3
Margaret Park,Sales Support Agent,2003-05-03 00:00:00,4
Steve Johnson,Sales Support Agent,2003-10-17 00:00:00,6
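The jump from 4 to 6 in the last row shown is a consequence of the default window frame. When a window has an `ORDER BY` but no explicit frame, it extends to the last peer of the current row, so employees who share a hire date all receive the same count. That matches "hired up to and including that date". The sketch below contrasts it with an explicit `ROWS` frame, which counts row by row instead. The rows are hypothetical and run in in-memory SQLite via Python's `sqlite3`; the framing rules shown are the standard ones that PostgreSQL follows as well.

```python
import sqlite3

# Toy rows (hypothetical); Cy and Dee share a hire date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, hire_date TEXT);
INSERT INTO employee VALUES
    ('Ann', '2002-04-01'), ('Bob', '2003-05-03'),
    ('Cy',  '2003-10-17'), ('Dee', '2003-10-17');
""")

rows = conn.execute("""
SELECT
    name,
    hire_date,
    -- Default frame: everything up to and including the current row's peers.
    COUNT(*) OVER (ORDER BY hire_date) AS cumulative_hires,
    -- Explicit ROWS frame: stops at the current physical row.
    COUNT(*) OVER (
        ORDER BY hire_date, name
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS row_position
FROM employee
ORDER BY hire_date, name
""").fetchall()

for row in rows:
    print(row)
```

Both employees hired on the shared date get `cumulative_hires = 4`, while `row_position` runs 3, 4.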


### 2.4 For each invoice, show the invoice id, customer's full name, invoice date, invoice total, and the customer's running total spent across all their invoices (ordered by invoice date).

The result should have columns:

- invoice_id
- full_name
- invoice_date
- total
- running_total (cumulative amount spent by that customer up to and including this invoice)

Order by full_name, then invoice_date ascending.

In [11]:
%%sql
SELECT
    i.invoice_id,
    c.first_name || ' ' || c.last_name AS full_name,
    i.invoice_date,
    i.total,
    SUM(i.total) OVER (PARTITION BY i.customer_id ORDER BY invoice_date) AS running_total
FROM customer AS c
JOIN invoice AS i
    ON c.customer_id = i.customer_id
ORDER BY full_name, i.invoice_date ASC

invoice_id,full_name,invoice_date,total,running_total
50,Aaron Mitchell,2021-08-06 00:00:00,1.98,1.98
61,Aaron Mitchell,2021-09-16 00:00:00,13.86,15.84
116,Aaron Mitchell,2022-05-17 00:00:00,8.91,24.75
245,Aaron Mitchell,2023-12-22 00:00:00,1.98,26.73
268,Aaron Mitchell,2024-03-25 00:00:00,3.96,30.69


### 2.5 For each track sale (invoice line), show the track name, genre name, the sale amount, and the difference in sale amount compared to the previous sale of a track in the same genre (ordered by invoice date).

The result should have columns:

- track_name
- genre_name
- invoice_date
- sale_amount
- diff_from_previous (sale_amount minus the previous sale amount in that genre, or NULL if it's the first)

Order by genre_name, invoice_date ascending.

In [18]:
%%sql
WITH temp AS (
    SELECT
        t.name AS track_name,
        g.name AS genre_name,
        i.invoice_date AS invoice_date,
        il.unit_price * il.quantity AS sale_amount
    FROM invoice_line AS il
    JOIN invoice AS i
        ON il.invoice_id = i.invoice_id
    JOIN track AS t
        ON il.track_id = t.track_id
    JOIN genre AS g
        ON t.genre_id = g.genre_id
) SELECT
    track_name,
    genre_name,
    invoice_date,
    sale_amount,
    sale_amount - LAG(sale_amount) OVER (PARTITION BY genre_name ORDER BY invoice_date) AS diff_from_previous
FROM temp
ORDER BY genre_name, invoice_date ASC

track_name,genre_name,invoice_date,sale_amount,diff_from_previous
Show Me How to Live (Live at the Quart Festival),Alternative,2022-03-21 00:00:00,0.99,
Say Hello 2 Heaven,Alternative,2022-03-21 00:00:00,0.99,0.0
All Night Thing,Alternative,2022-03-21 00:00:00,0.99,0.0
Scar On the Sky,Alternative,2022-03-21 00:00:00,0.99,0.0
Until We Fall,Alternative,2022-03-21 00:00:00,0.99,0.0


### 2.6 For each customer, show their full name, country, total amount spent, and how they rank within their country by total spent (1 = highest spender in that country).

The result should have columns:

- full_name
- country
- total_spent
- rank_in_country

Order by country, then rank_in_country ascending.

In [31]:
%%sql
WITH temp AS(
    SELECT
        c.customer_id,
        c.first_name || ' ' || c.last_name AS full_name,
        c.country,
        SUM(i.total) AS total_spent
    FROM customer AS c
    JOIN invoice AS i
        ON c.customer_id = i.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name, c.country
) SELECT
    full_name,
    country,
    total_spent,
    RANK() OVER (PARTITION BY country ORDER BY total_spent DESC) AS rank_in_country
FROM temp
ORDER BY customer_id, country, rank_in_country ASC

full_name,country,total_spent,rank_in_country
Luís Gonçalves,Brazil,39.62,1
Leonie Köhler,Germany,37.62,2
François Tremblay,Canada,39.62,1
Bjørn Hansen,Norway,39.62,1
František Wichterlová,Czech Republic,40.62,2
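The rows above come back in `customer_id` order because that is the first sort key, which is why the countries appear interleaved. A sketch of the same query sorted the way the task describes (country, then rank) is below, using hypothetical customers and whole-number invoice totals in an in-memory SQLite database via Python's `sqlite3`.

```python
import sqlite3

# Toy rows (hypothetical); integer totals keep the sums exact.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT,
                       last_name TEXT, country TEXT);
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER,
                      total INTEGER);
INSERT INTO customer VALUES
    (1, 'Ana', 'Silva', 'Brazil'), (2, 'Bo', 'Lee', 'Canada'),
    (3, 'Cy', 'Roy', 'Canada'),    (4, 'Di', 'Costa', 'Brazil');
INSERT INTO invoice VALUES
    (1, 1, 10), (2, 1, 5), (3, 2, 7), (4, 3, 20), (5, 4, 30);
""")

rows = conn.execute("""
WITH spend AS (
    SELECT
        c.first_name || ' ' || c.last_name AS full_name,
        c.country,
        SUM(i.total) AS total_spent
    FROM customer AS c
    JOIN invoice AS i ON c.customer_id = i.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name, c.country
)
SELECT
    full_name,
    country,
    total_spent,
    RANK() OVER (PARTITION BY country ORDER BY total_spent DESC) AS rank_in_country
FROM spend
ORDER BY country, rank_in_country      -- no customer_id in the sort
""").fetchall()

for row in rows:
    print(row)
```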


### 2.7 For each track, show the track name, genre name, its unit price, the average unit price in its genre, and the difference between the track's price and the genre average (track price - genre average). Round the average and difference to 2 decimal places.

The result should have columns:

- track_name
- genre_name
- unit_price
- avg_genre_price
- price_diff

Order by price_diff descending.

In [83]:
%%sql
SELECT
    t.name AS track_name,
    g.name AS genre_name,
    t.unit_price,
    ROUND(AVG(t.unit_price) OVER (PARTITION BY g.genre_id), 2) AS avg_genre_price,
    ROUND(t.unit_price - AVG(t.unit_price) OVER (PARTITION BY g.genre_id), 2) AS price_diff
FROM track AS t
JOIN genre AS g
    ON t.genre_id = g.genre_id
ORDER BY price_diff DESC

track_name,genre_name,unit_price,avg_genre_price,price_diff
For Those About To Rock (We Salute You),Rock,0.99,0.99,0.0
Balls to the Wall,Rock,0.99,0.99,0.0
Fast As a Shark,Rock,0.99,0.99,0.0
Restless and Wild,Rock,0.99,0.99,0.0
Princess of the Dawn,Rock,0.99,0.99,0.0


### 2.8 For each customer, show their invoices with the previous and next invoice amounts. Display customer first name, last name, invoice date, invoice total, previous invoice total, and next invoice total for that same customer. Sort by customer and date.

The result should have columns:

- first_name
- last_name
- invoice_date
- current_total
- previous_total (NULL if first invoice)
- next_total (NULL if last invoice)

Order by last_name, first_name, invoice_date ascending.

In [94]:
%%sql
SELECT
    c.first_name,
    c.last_name,
    i.invoice_date,
    i.total AS current_total,
    LAG(i.total) OVER (PARTITION BY c.customer_id ORDER BY i.invoice_date ASC) AS previous_total,
    LEAD(i.total) OVER (PARTITION BY c.customer_id ORDER BY i.invoice_date ASC) AS next_total
FROM customer AS c
JOIN invoice AS i
    ON c.customer_id = i.customer_id
ORDER BY c.last_name, c.first_name, i.invoice_date ASC

first_name,last_name,invoice_date,current_total,previous_total,next_total
Roberto,Almeida,2021-05-23 00:00:00,0.99,,1.98
Roberto,Almeida,2022-11-14 00:00:00,1.98,0.99,13.86
Roberto,Almeida,2022-12-25 00:00:00,13.86,1.98,8.91
Roberto,Almeida,2023-08-25 00:00:00,8.91,13.86,1.98
Roberto,Almeida,2025-03-31 00:00:00,1.98,8.91,3.96


### 2.9 Calculate a 3-invoice moving average of invoice totals for each customer. Show customer first name, last name, invoice date, invoice total, and the 3-invoice moving average (rounded to 2 decimals). Sort by customer and date.

The result should have columns:

- first_name
- last_name
- invoice_date
- invoice_total
- moving_avg

Order by last_name, first_name, invoice_date ascending.

In [99]:
%%sql
SELECT
    c.first_name,
    c.last_name,
    i.invoice_date,
    i.total AS invoice_total,
    ROUND(
        AVG(i.total) OVER (
            PARTITION BY c.customer_id
            ORDER BY i.invoice_date
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ),
        2
    ) AS moving_avg
FROM customer AS c
JOIN invoice AS i
    ON c.customer_id = i.customer_id
ORDER BY c.last_name, c.first_name, i.invoice_date ASC

first_name,last_name,invoice_date,invoice_total,moving_avg
Roberto,Almeida,2021-05-23 00:00:00,0.99,0.99
Roberto,Almeida,2022-11-14 00:00:00,1.98,1.49
Roberto,Almeida,2022-12-25 00:00:00,13.86,5.61
Roberto,Almeida,2023-08-25 00:00:00,8.91,8.25
Roberto,Almeida,2025-03-31 00:00:00,1.98,8.25


### 2.10 For each customer, calculate the percent change in invoice totals from one invoice to the next. Show customer first name, last name, invoice date, invoice total, and the percent change from the previous invoice (as a percentage, rounded to 2 decimals). Sort by customer and date.

The result should have columns:

- first_name
- last_name
- invoice_date
- invoice_total
- pct_change (NULL for first invoice per customer)

Order by last_name, first_name, invoice_date ascending.

In [103]:
%%sql
WITH temp AS (
    SELECT
        c.first_name AS first_name,
        c.last_name AS last_name,
        i.invoice_date AS invoice_date,
        i.total AS invoice_total,
        LAG(i.total) OVER (PARTITION BY c.customer_id ORDER BY i.invoice_date) AS prev_invoice_total
    FROM customer AS c
    JOIN invoice AS i
        ON c.customer_id = i.customer_id
) SELECT
    first_name,
    last_name,
    invoice_date,
    invoice_total,
    ROUND(100 * (invoice_total - prev_invoice_total) / prev_invoice_total, 2) AS pct_change
FROM temp
ORDER BY last_name, first_name, invoice_date ASC

first_name,last_name,invoice_date,invoice_total,pct_change
Roberto,Almeida,2021-05-23 00:00:00,0.99,
Roberto,Almeida,2022-11-14 00:00:00,1.98,100.0
Roberto,Almeida,2022-12-25 00:00:00,13.86,600.0
Roberto,Almeida,2023-08-25 00:00:00,8.91,-35.71
Roberto,Almeida,2025-03-31 00:00:00,1.98,-77.78


## 3. CTEs

### 3.1 Find the top selling artist (by total revenue) for each genre. Show the genre name, artist name, and their total revenue in that genre.

The result should have columns:

- genre_name
- artist_name
- total_revenue

Order by genre_name ascending.

In [42]:
%%sql
WITH temp AS (
    SELECT
        g.name AS genre_name,
        art.name AS artist_name,
        il.unit_price * il.quantity AS total_price
    FROM track AS t
        JOIN album AS a
            ON t.album_id = a.album_id
        JOIN genre AS g
            ON t.genre_id = g.genre_id
        JOIN artist AS art
            ON a.artist_id = art.artist_id
        JOIN invoice_line AS il
            ON t.track_id = il.track_id
), temp_2 AS (
    SELECT
        genre_name,
        artist_name,
        SUM(total_price) AS per_genre_total_sales,
        RANK() OVER (PARTITION BY genre_name ORDER BY SUM(total_price) DESC) AS rank
    FROM temp
    GROUP BY artist_name, genre_name
) SELECT
    genre_name,
    artist_name,
    per_genre_total_sales AS total_revenue
FROM temp_2
WHERE rank = 1
ORDER BY genre_name ASC

genre_name,artist_name,total_revenue
Alternative,Audioslave,4.95
Alternative & Punk,Titãs,33.66
Blues,Eric Clapton,26.73
Bossa Nova,Toquinho & Vinícius,14.85
Classical,Michael Tilson Thomas & San Francisco Symphony,2.97



### 3.2 Find customers who have spent more than the average customer spend overall. Show their full name, country, total spent, and the overall average (same value on every row).

The result should have columns:

- full_name
- country
- total_spent
- avg_spend

Order by total_spent descending.

In [46]:
%%sql
WITH total_spent_per_customer AS (
    SELECT
        c.customer_id,
        SUM(i.total) AS total
    FROM customer AS c
    JOIN invoice AS i
        ON c.customer_id = i.customer_id
    GROUP BY c.customer_id
), avg_spend AS (
    SELECT AVG(total) AS avg
    FROM total_spent_per_customer
) SELECT
    c.first_name || ' ' || c.last_name AS full_name,
    c.country,
    t.total AS total_spent,
    avg_spend.avg AS avg_spend
FROM total_spent_per_customer AS t
JOIN customer AS c
    ON t.customer_id = c.customer_id
JOIN avg_spend
    ON t.total > avg_spend.avg
ORDER BY t.total DESC

full_name,country,total_spent,avg_spend
Helena Holý,Czech Republic,49.62,39.467796610169486
Richard Cunningham,USA,47.62,39.467796610169486
Luis Rojas,Chile,46.62,39.467796610169486
Hugh O'Reilly,Ireland,45.62,39.467796610169486
Ladislav Kovács,Hungary,45.62,39.467796610169486


### 3.3 For each month, find the total revenue and the revenue from the previous month. Show the month, total revenue, and previous month's revenue.

The result should have columns:

- month (formatted as YYYY-MM)
- total_revenue
- prev_month_revenue (NULL if no previous month)

Order by month ascending.

In [64]:
%%sql
WITH year_month_table AS (
    SELECT
        total,
        TO_CHAR(invoice_date, 'YYYY-MM') AS year_month
    FROM invoice
), monthly_total AS (
    SELECT
        year_month,
        SUM(total) AS month_total
    FROM year_month_table
    GROUP BY year_month
) SELECT
    year_month AS month,
    month_total AS total_revenue,
    LAG(month_total) OVER (ORDER BY year_month ASC) AS prev_month_revenue
FROM monthly_total
ORDER BY year_month ASC

month,total_revenue,prev_month_revenue
2021-01,35.64,
2021-02,37.62,35.64
2021-03,37.62,37.62
2021-04,37.62,37.62
2021-05,37.62,37.62


### 3.4 Find all playlists along with the number of tracks they contain, the total duration in minutes, and the number of distinct genres represented. Only include playlists that have more than 10 tracks.

The result should have columns:

- playlist_name
- track_count
- total_duration_min (rounded to 2 decimal places)
- genre_count

Order by track_count descending.

In [75]:
%%sql
WITH temp AS (
    SELECT
        p.name AS playlist_name,
        p.playlist_id AS playlist_id,
        pt.track_id AS track_id,
        t.milliseconds AS duration_ms,
        g.genre_id AS genre_id
    FROM playlist_track AS pt
        JOIN track AS t
            ON pt.track_id = t.track_id
        JOIN genre AS g
            ON t.genre_id = g.genre_id
        JOIN playlist AS p
            ON pt.playlist_id = p.playlist_id
) SELECT
    playlist_name,
    playlist_id,
    COUNT(track_id) AS track_count,
    ROUND(SUM(duration_ms) / (1000 * 60), 2) AS total_minutes,
    COUNT(DISTINCT genre_id) AS distinct_genre_count
FROM temp
GROUP BY playlist_id, playlist_name
HAVING COUNT(track_id) > 10

playlist_name,playlist_id,track_count,total_minutes,distinct_genre_count
Music,1,3290,14628.0,20
TV Shows,3,213,8351.0,5
90’s Music,5,1477,6645.0,16
Music,8,3290,14628.0,20
TV Shows,10,213,8351.0,5
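The whole-number `total_minutes` values above suggest that `SUM(duration_ms) / (1000 * 60)` is evaluated as integer division, so the fraction is lost before `ROUND` sees it. The result is also not sorted by track count. The sketch below shows both points on hypothetical playlists in an in-memory SQLite database via Python's `sqlite3`: dividing by `60000.0` keeps the fraction, and `ORDER BY track_count DESC` gives the requested order. The playlist names and durations are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE playlist (playlist_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE track (track_id INTEGER PRIMARY KEY, genre_id INTEGER,
                    milliseconds INTEGER);
CREATE TABLE playlist_track (playlist_id INTEGER, track_id INTEGER);
INSERT INTO playlist VALUES (1, 'Long mix'), (2, 'Short mix'), (3, 'Tiny mix');
""")

# Hypothetical data: (playlist_id, number of tracks, length of each track in ms).
tracks, links = [], []
track_id = 0
for playlist_id, n_tracks, ms in [(1, 12, 90500), (2, 11, 60000), (3, 3, 60000)]:
    for i in range(n_tracks):
        track_id += 1
        genre_id = 1 + i % 2 if playlist_id == 1 else 1   # playlist 1 mixes two genres
        tracks.append((track_id, genre_id, ms))
        links.append((playlist_id, track_id))
conn.executemany("INSERT INTO track VALUES (?, ?, ?)", tracks)
conn.executemany("INSERT INTO playlist_track VALUES (?, ?)", links)

rows = conn.execute("""
SELECT
    p.name AS playlist_name,
    COUNT(*) AS track_count,
    SUM(t.milliseconds) / 60000 AS truncated_min,                 -- integer division
    ROUND(SUM(t.milliseconds) / 60000.0, 2) AS total_duration_min,
    COUNT(DISTINCT t.genre_id) AS genre_count
FROM playlist_track AS pt
JOIN track AS t ON pt.track_id = t.track_id
JOIN playlist AS p ON pt.playlist_id = p.playlist_id
GROUP BY p.playlist_id, p.name
HAVING COUNT(*) > 10
ORDER BY track_count DESC
""").fetchall()

for row in rows:
    print(row)
```

The `truncated_min` column is there only for contrast; the three-track playlist is removed by the `HAVING` clause.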


### 3.5 Calculate the year-over-year percentage growth in total sales. Show the year, total sales for that year, and the percentage growth from the previous year, rounded to 2 decimal places.

The result should have columns:

- year
- total_sales
- pct_growth (NULL for the first year)

Order by year ascending.
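No answer cell follows this exercise. One possible shape, sketched on hypothetical invoices in an in-memory SQLite database via Python's `sqlite3`, is to aggregate per year in one CTE, attach the previous year's total with `LAG` in a second, and compute the percentage in the final select. SQLite's `strftime('%Y', ...)` stands in for the year extraction here; in PostgreSQL something like `EXTRACT(YEAR FROM invoice_date)` or `TO_CHAR(invoice_date, 'YYYY')` would play that role.

```python
import sqlite3

# Toy rows (hypothetical); integer totals keep the arithmetic exact.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, invoice_date TEXT,
                      total INTEGER);
INSERT INTO invoice VALUES
    (1, '2021-02-01', 100), (2, '2021-09-15', 100),
    (3, '2022-03-10', 250), (4, '2023-07-04', 200);
""")

rows = conn.execute("""
WITH yearly AS (
    SELECT strftime('%Y', invoice_date) AS year, SUM(total) AS total_sales
    FROM invoice
    GROUP BY strftime('%Y', invoice_date)
), with_prev AS (
    SELECT
        year,
        total_sales,
        LAG(total_sales) OVER (ORDER BY year) AS prev_sales
    FROM yearly
)
SELECT
    year,
    total_sales,
    -- 100.0 forces non-integer arithmetic; a NULL prev_sales yields NULL.
    ROUND(100.0 * (total_sales - prev_sales) / prev_sales, 2) AS pct_growth
FROM with_prev
ORDER BY year
""").fetchall()

for row in rows:
    print(row)
```

The first year has no predecessor, so `pct_growth` is NULL there without any special casing.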

### 3.6 Calculate the 3-month moving average of total sales. Show the year-month, the total sales for that month, and the 3-month moving average (rounded to 2 decimals). Order by date.

The result should have columns:

- year_month (formatted as YYYY-MM)
- month_total
- moving_avg_3m (NULL for first 2 months)

Order by year_month ascending.
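No answer cell follows this exercise either. A minimal sketch, assuming hypothetical invoices in an in-memory SQLite database via Python's `sqlite3`: total the invoices per month, then average over a frame of the current row and the two before it, returning NULL until the frame actually holds three months. SQLite's `strftime('%Y-%m', ...)` takes the place of PostgreSQL's `TO_CHAR(invoice_date, 'YYYY-MM')`.

```python
import sqlite3

# Toy rows (hypothetical); integer totals keep the averages exact.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, invoice_date TEXT,
                      total INTEGER);
INSERT INTO invoice VALUES
    (1, '2021-01-05', 10), (2, '2021-01-20', 20),
    (3, '2021-02-10', 60), (4, '2021-03-15', 90), (5, '2021-04-01', 30);
""")

rows = conn.execute("""
WITH monthly AS (
    SELECT strftime('%Y-%m', invoice_date) AS year_month,
           SUM(total) AS month_total
    FROM invoice
    GROUP BY strftime('%Y-%m', invoice_date)
)
SELECT
    year_month,
    month_total,
    -- Only report an average once the frame contains three rows.
    CASE WHEN COUNT(*) OVER w = 3
         THEN ROUND(AVG(month_total) OVER w, 2)
    END AS moving_avg_3m
FROM monthly
WINDOW w AS (ORDER BY year_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
ORDER BY year_month
""").fetchall()

for row in rows:
    print(row)
```

The frame counts rows, not calendar months, so a month with no invoices would simply be absent rather than counted as zero.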

## 4. Self Joins

### 4.1 The `employee` table has a `reports_to` column that references itself. Show each employee's full name, their title, and their manager's full name. If they have no manager, show NULL.

The result should have columns:

- employee_name
- title
- manager_name

Order by manager_name NULLS FIRST, then employee_name ascending.

In [80]:
%%sql
SELECT
    el.first_name || ' ' || el.last_name AS employee_name,
    el.title,
    er.first_name || ' ' || er.last_name AS manager_name
FROM employee AS el
    LEFT JOIN employee AS er
        ON el.reports_to = er.employee_id

employee_name,title,manager_name
Andrew Adams,General Manager,
Nancy Edwards,Sales Manager,Andrew Adams
Jane Peacock,Sales Support Agent,Nancy Edwards
Margaret Park,Sales Support Agent,Nancy Edwards
Steve Johnson,Sales Support Agent,Nancy Edwards
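The query above has no `ORDER BY`, so the row order is whatever the database happens to return. A sketch of the same self join with the requested ordering follows, run on four of the employees shown above (with arbitrary ids, inserted out of order) in an in-memory SQLite database via Python's `sqlite3`. In PostgreSQL the sort can be written directly as `ORDER BY manager_name NULLS FIRST, employee_name`. The sketch uses a portable equivalent, sorting first on whether a manager row was matched.

```python
import sqlite3

# Names taken from the output above; the ids and insert order are arbitrary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, first_name TEXT,
                       last_name TEXT, title TEXT, reports_to INTEGER);
INSERT INTO employee VALUES
    (4, 'Margaret', 'Park',    'Sales Support Agent', 2),
    (3, 'Jane',     'Peacock', 'Sales Support Agent', 2),
    (2, 'Nancy',    'Edwards', 'Sales Manager',       1),
    (1, 'Andrew',   'Adams',   'General Manager',     NULL);
""")

rows = conn.execute("""
SELECT
    e.first_name || ' ' || e.last_name AS employee_name,
    e.title,
    m.first_name || ' ' || m.last_name AS manager_name   -- NULL when no manager
FROM employee AS e
LEFT JOIN employee AS m ON e.reports_to = m.employee_id
-- "IS NOT NULL" is false (sorts first) for employees without a manager.
ORDER BY m.employee_id IS NOT NULL, manager_name, employee_name
""").fetchall()

for row in rows:
    print(row)
```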


## 5. Pivot / Crosstab

### 5.1 Create a report that shows the total sales for each genre in each year.

The result should have columns:

- genre_name
- One column per year (e.g. 2021, 2022, 2023, ...)

Order by genre_name ascending.

Hint: look into PostgreSQL's `crosstab` function or use conditional aggregation with `CASE WHEN`.
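Following the second half of the hint, a conditional-aggregation pivot could be sketched as below: one `SUM(CASE WHEN ...)` per year, grouped by genre. The example runs on hypothetical rows in an in-memory SQLite database via Python's `sqlite3`, with `strftime('%Y', ...)` standing in for the year extraction that PostgreSQL would do with `EXTRACT(YEAR FROM ...)`. The year columns are hardcoded here, which is the main limitation of this approach: the set of years has to be known when the query is written.

```python
import sqlite3

# Toy rows (hypothetical); prices are chosen to be exactly representable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE track (track_id INTEGER PRIMARY KEY, genre_id INTEGER);
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, invoice_date TEXT);
CREATE TABLE invoice_line (invoice_id INTEGER, track_id INTEGER,
                           unit_price REAL, quantity INTEGER);
INSERT INTO genre VALUES (1, 'Rock'), (2, 'Jazz');
INSERT INTO track VALUES (1, 1), (2, 2);
INSERT INTO invoice VALUES (1, '2021-03-01'), (2, '2022-06-01');
INSERT INTO invoice_line VALUES (1, 1, 1.5, 2), (1, 2, 2.0, 1), (2, 1, 1.5, 1);
""")

rows = conn.execute("""
SELECT
    g.name AS genre_name,
    -- Each CASE keeps a line's amount only if it falls in that column's year.
    SUM(CASE WHEN strftime('%Y', i.invoice_date) = '2021'
             THEN il.unit_price * il.quantity ELSE 0 END) AS "2021",
    SUM(CASE WHEN strftime('%Y', i.invoice_date) = '2022'
             THEN il.unit_price * il.quantity ELSE 0 END) AS "2022"
FROM invoice_line AS il
JOIN invoice AS i ON il.invoice_id = i.invoice_id
JOIN track AS t ON il.track_id = t.track_id
JOIN genre AS g ON t.genre_id = g.genre_id
GROUP BY g.name
ORDER BY g.name
""").fetchall()

for row in rows:
    print(row)
```

A genre with no sales in a given year shows 0 in that column because of the `ELSE 0`; dropping the `ELSE` would give NULL instead.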