<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/3_Windows_Functions/5_Frame_Clause.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Frame Clause

## Overview

### 🥅 Analysis Goals

- Use the frame clause of a window function to control exactly which rows feed each calculation:
    - `N PRECEDING` and `N FOLLOWING`
    - `UNBOUNDED PRECEDING` and `UNBOUNDED FOLLOWING`
    - `CURRENT ROW`
- The end goal is to calculate a running total and a running average of net revenue within each `cohort_year` in the Contoso sales data.

### 📘 Concepts Covered

General concepts we’re going to cover

- `N PRECEDING` 
- `N FOLLOWING` 
- `UNBOUNDED PRECEDING` 
- `UNBOUNDED FOLLOWING` 
- `CURRENT ROW`

---

In [10]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the SQL magic extension (jupysql or ipython-sql)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas number to two decimal places
pd.options.display.float_format = '{:.2f}'.format



---

## PRECEDING / FOLLOWING

### 📝 Notes

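- `N PRECEDING`: The frame covers the N rows before the current row plus the current row. This is shorthand for `ROWS BETWEEN N PRECEDING AND CURRENT ROW`.
    - `SUM(column) OVER (ORDER BY order_expression ROWS 3 PRECEDING)`
- `N FOLLOWING`: The frame covers the current row plus the N rows after it. Write it with `BETWEEN`: when only a start is given, the frame end defaults to `CURRENT ROW`, so `ROWS 3 FOLLOWING` would end before it starts and PostgreSQL rejects it.
    - `SUM(column) OVER (ORDER BY order_expression ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING)`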

```sql
SELECT 
    column,
    SUM(column) OVER (ORDER BY order_expression ROWS 3 PRECEDING) AS sum_3_preceding,
    SUM(column) OVER (ORDER BY order_expression ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING) AS sum_3_following
FROM table_name
```
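A minimal, runnable sketch of both frames, assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL (SQLite 3.25 and later support the same `ROWS` frame syntax). The `daily_sales` table and its values are hypothetical toy data, not part of the Contoso dataset.

```python
import sqlite3

# Hypothetical toy data (not from the Contoso dataset): one revenue value per day
TOY_SALES = [
    ("2024-01-01", 10),
    ("2024-01-02", 20),
    ("2024-01-03", 30),
    ("2024-01-04", 40),
    ("2024-01-05", 50),
]


def preceding_following_sums(rows):
    """Return (day, revenue, sum_2_preceding, sum_2_following) for each row, ordered by day."""
    conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for PostgreSQL
    conn.execute("CREATE TABLE daily_sales (day TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?)", rows)
    query = """
        SELECT
            day,
            revenue,
            -- Frame: the 2 rows before the current row plus the current row
            SUM(revenue) OVER (ORDER BY day ROWS 2 PRECEDING) AS sum_2_preceding,
            -- Frame: the current row plus the 2 rows after it
            SUM(revenue) OVER (ORDER BY day ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS sum_2_following
        FROM daily_sales
        ORDER BY day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in preceding_following_sums(TOY_SALES):
    print(row)
```

Near the edges the frame simply holds fewer rows: the first row's 2-preceding sum is just its own 10, and the last row's 2-following sum is just its own 50.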


## UNBOUNDED

### 📝 Notes

- `UNBOUNDED PRECEDING`: The frame starts at the first row of the partition and ends at the current row (shorthand for `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`).
    - `SUM(column) OVER (ORDER BY order_expression ROWS UNBOUNDED PRECEDING)`
- `UNBOUNDED FOLLOWING`: The frame runs from the current row to the last row of the partition. It can only be used as the frame end, so pair it with `BETWEEN`.
    - `SUM(column) OVER (ORDER BY order_expression ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)`

```sql
SELECT 
    column,
    SUM(column) OVER (ORDER BY order_expression ROWS UNBOUNDED PRECEDING) AS sum_unbounded_preceding,
    SUM(column) OVER (ORDER BY order_expression ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS sum_unbounded_following
FROM table_name
```
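A runnable sketch of the two unbounded frames, again assuming Python's `sqlite3` module in place of PostgreSQL and a hypothetical `daily_sales` table of toy values. `UNBOUNDED PRECEDING` produces a running total; `CURRENT ROW AND UNBOUNDED FOLLOWING` produces the total still remaining from each row onward.

```python
import sqlite3

# Hypothetical toy data (not from the Contoso dataset)
TOY_SALES = [
    ("2024-01-01", 10),
    ("2024-01-02", 20),
    ("2024-01-03", 30),
    ("2024-01-04", 40),
    ("2024-01-05", 50),
]


def unbounded_sums(rows):
    """Return (day, revenue, running_total, remaining_total) for each row, ordered by day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_sales (day TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?)", rows)
    query = """
        SELECT
            day,
            revenue,
            -- First row of the partition up to the current row
            SUM(revenue) OVER (ORDER BY day ROWS UNBOUNDED PRECEDING) AS sum_unbounded_preceding,
            -- Current row through the last row of the partition
            SUM(revenue) OVER (ORDER BY day ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS sum_unbounded_following
        FROM daily_sales
        ORDER BY day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in unbounded_sums(TOY_SALES):
    print(row)
```

A quick sanity check: on every row the two columns add up to the grand total plus that row's own value, because both frames include the current row (for example, 30 + 140 = 150 + 20 on the second row).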



## CURRENT ROW

### 📝 Notes

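- `CURRENT ROW`: The row currently being evaluated. Used alone (`ROWS CURRENT ROW`), the frame contains only that row; more commonly it serves as one bound of a `BETWEEN ... AND ...` frame.
    - `SUM(column) OVER (ORDER BY order_expression ROWS CURRENT ROW)`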


```sql
SELECT 
    column,
    SUM(column) OVER (ORDER BY order_expression ROWS CURRENT ROW) AS sum_current_row,
    SUM(column) OVER (ORDER BY order_expression ROWS 3 PRECEDING) AS sum_3_preceding,
    SUM(column) OVER (ORDER BY order_expression ROWS UNBOUNDED PRECEDING) AS sum_unbounded_preceding
FROM table_name

```
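The three frames in the query above can be compared side by side on hypothetical toy data. This sketch assumes Python's `sqlite3` module instead of PostgreSQL and writes every frame with explicit `BETWEEN` bounds, which are equivalent to the shorthand forms in the notes.

```python
import sqlite3

# Hypothetical toy data (not from the Contoso dataset)
TOY_SALES = [
    ("2024-01-01", 10),
    ("2024-01-02", 20),
    ("2024-01-03", 30),
    ("2024-01-04", 40),
    ("2024-01-05", 50),
]


def compare_frames(rows):
    """Return (day, sum_current_row, sum_3_preceding, sum_unbounded_preceding) per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_sales (day TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?)", rows)
    query = """
        SELECT
            day,
            -- Frame holds only the current row, so the sum equals the row's own value
            SUM(revenue) OVER (ORDER BY day ROWS BETWEEN CURRENT ROW AND CURRENT ROW) AS sum_current_row,
            -- Sliding frame: up to 3 earlier rows plus the current row
            SUM(revenue) OVER (ORDER BY day ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS sum_3_preceding,
            -- Growing frame: every row from the start of the partition to the current row
            SUM(revenue) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_unbounded_preceding
        FROM daily_sales
        ORDER BY day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in compare_frames(TOY_SALES):
    print(row)
```

The 3-preceding and unbounded columns agree until the fifth row, where the sliding frame drops the first day's 10 while the unbounded frame keeps it.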

### 💻 Final Result

- Return the running (cumulative) net revenue and the running average revenue for each `cohort_year`, ordered by `orderdate`.

#### Problem Description

**`ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` / Running Totals**

1. Get the `cohort_year` and the total revenue for each user.  
   - Use `EXTRACT(YEAR FROM MIN(orderdate))` to calculate the cohort year for each customer.  
   - Group by `customerkey` to ensure the revenue and cohort year are calculated per user.  
   - Calculate the total revenue for each customer using `SUM(quantity * netprice * exchangerate)`.  
   - Select `cohort_year`, `customerkey`, and the total revenue (`total_customer_net_revenue`) to display the results.  

In [17]:
%%sql

SELECT 
    EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
    customerkey,
    SUM(quantity * netprice * exchangerate) AS total_customer_net_revenue
FROM sales
GROUP BY 
    customerkey

| | cohort_year | customerkey | total_customer_net_revenue |
|---|---|---|---|
| 0 | 2018 | 2044589 | 2470.73 |
| 1 | 2021 | 1603477 | 136.62 |
| 2 | 2017 | 876049 | 2601.13 |
| 3 | 2024 | 1469222 | 5278.54 |
| 4 | 2018 | 2089398 | 98.39 |
| ... | ... | ... | ... |
| 49482 | 2019 | 853617 | 903.31 |
| 49483 | 2016 | 1573639 | 6973.42 |
| 49484 | 2022 | 1355936 | 149.99 |
| 49485 | 2024 | 967453 | 5.40 |


2. Create a CTE that calculates total net revenue per order date and return all results in the main query.  
   - Define a CTE `cohort_analysis` that aggregates the `sales` table by order date.  
      - Group by `orderdate` so each row holds one order date.  
      - Use `EXTRACT(YEAR FROM MIN(orderdate))` to get `cohort_year`; because the grouping is by `orderdate`, this is the year of that order date rather than a customer's first-purchase year.  
      - Calculate the total revenue for each order date using `SUM(quantity * netprice * exchangerate)`.  
   - In the main query, use `SELECT * FROM cohort_analysis` to return all the results from the CTE.  

In [16]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        orderdate
)

SELECT *
FROM cohort_analysis

| | cohort_year | orderdate | total_net_revenue |
|---|---|---|---|
| 0 | 2017 | 2017-10-23 | 74893.02 |
| 1 | 2023 | 2023-04-24 | 43321.33 |
| 2 | 2015 | 2015-01-19 | 12002.09 |
| 3 | 2017 | 2017-12-28 | 101464.75 |
| 4 | 2019 | 2019-02-12 | 156723.97 |
| ... | ... | ... | ... |
| 3289 | 2023 | 2023-10-19 | 114969.37 |
| 3290 | 2017 | 2017-10-29 | 649.59 |
| 3291 | 2023 | 2023-02-08 | 158675.41 |
| 3292 | 2021 | 2021-08-03 | 47364.43 |
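The grouping column decides what `cohort_year` means. This sketch runs the step 1 grouping (`customerkey`) and the step 2 grouping (`orderdate`) over the same hypothetical orders. It assumes Python's `sqlite3` module, which has no `EXTRACT`, so `strftime('%Y', ...)` stands in, and a simplified `revenue` column replaces `quantity * netprice * exchangerate`.

```python
import sqlite3

# Hypothetical orders (not from the Contoso dataset):
# customer 1 first buys in 2015 and returns in 2016; customer 2 first buys in 2016
TOY_ORDERS = [
    (1, "2015-03-01", 100),
    (1, "2016-05-01", 50),
    (2, "2016-02-01", 80),
]


def cohort_years(rows):
    """Return (per_customer, per_orderdate) results computed from the same orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # Step 1 style: one row per customer; cohort_year = year of the first purchase
    per_customer = conn.execute("""
        SELECT CAST(strftime('%Y', MIN(orderdate)) AS INTEGER) AS cohort_year,
               customerkey,
               SUM(revenue) AS total_revenue
        FROM sales
        GROUP BY customerkey
        ORDER BY customerkey
    """).fetchall()
    # Step 2 style: one row per order date; cohort_year = year of that date
    per_orderdate = conn.execute("""
        SELECT CAST(strftime('%Y', MIN(orderdate)) AS INTEGER) AS cohort_year,
               orderdate,
               SUM(revenue) AS total_revenue
        FROM sales
        GROUP BY orderdate
        ORDER BY orderdate
    """).fetchall()
    conn.close()
    return per_customer, per_orderdate


per_customer, per_orderdate = cohort_years(TOY_ORDERS)
print(per_customer)
print(per_orderdate)
```

Under the per-customer grouping, customer 1's 2016 order counts toward the 2015 cohort; under the per-date grouping it simply becomes part of 2016.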


In [15]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        orderdate
),

rolling_ltv AS (
    SELECT
        cohort_year,
        orderdate
    FROM cohort_analysis
)

SELECT *
FROM rolling_ltv

| | cohort_year | orderdate |
|---|---|---|
| 0 | 2017 | 2017-10-23 |
| 1 | 2023 | 2023-04-24 |
| 2 | 2019 | 2019-02-12 |
| 3 | 2017 | 2017-12-28 |
| 4 | 2015 | 2015-01-19 |
| ... | ... | ... |
| 3289 | 2023 | 2023-10-19 |
| 3290 | 2023 | 2023-02-08 |
| 3291 | 2017 | 2017-10-29 |
| 3292 | 2021 | 2021-08-03 |


In [13]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        orderdate
),

rolling_ltv AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_net_revenue) OVER (PARTITION BY cohort_year ORDER BY orderdate) AS cumulative_revenue
    FROM cohort_analysis
)

SELECT *
FROM rolling_ltv

| | cohort_year | orderdate | cumulative_revenue |
|---|---|---|---|
| 0 | 2015 | 2015-01-01 | 11640.80 |
| 1 | 2015 | 2015-01-02 | 17531.20 |
| 2 | 2015 | 2015-01-03 | 37327.87 |
| 3 | 2015 | 2015-01-05 | 49734.14 |
| 4 | 2015 | 2015-01-06 | 60084.01 |
| ... | ... | ... | ... |
| 3289 | 2024 | 2024-04-16 | 8189913.64 |
| 3290 | 2024 | 2024-04-17 | 8222852.31 |
| 3291 | 2024 | 2024-04-18 | 8251261.07 |
| 3292 | 2024 | 2024-04-19 | 8299647.95 |
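`PARTITION BY cohort_year` restarts the running total for every year, so each year's cumulative revenue begins again at its first order date. A small sketch with hypothetical per-day totals, assuming Python's `sqlite3` module in place of PostgreSQL and a plain table named `cohort_analysis` that mimics the CTE's columns:

```python
import sqlite3

# Hypothetical per-day totals for two cohort years (not from the Contoso dataset)
TOY_DAILY = [
    (2015, "2015-01-01", 10),
    (2015, "2015-01-02", 20),
    (2016, "2016-01-01", 5),
    (2016, "2016-01-02", 15),
]


def cumulative_by_cohort(rows):
    """Return (cohort_year, orderdate, cumulative_revenue) with a total that resets per year."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cohort_analysis (cohort_year INTEGER, orderdate TEXT, total_net_revenue INTEGER)"
    )
    conn.executemany("INSERT INTO cohort_analysis VALUES (?, ?, ?)", rows)
    query = """
        SELECT
            cohort_year,
            orderdate,
            -- PARTITION BY restarts the running total for each cohort_year
            SUM(total_net_revenue) OVER (PARTITION BY cohort_year ORDER BY orderdate) AS cumulative_revenue
        FROM cohort_analysis
        ORDER BY cohort_year, orderdate
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in cumulative_by_cohort(TOY_DAILY):
    print(row)
```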


In [18]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        orderdate
),

rolling_ltv AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_net_revenue) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_revenue -- Updated
    FROM cohort_analysis
)

SELECT *
FROM rolling_ltv

| | cohort_year | orderdate | cumulative_revenue |
|---|---|---|---|
| 0 | 2015 | 2015-01-01 | 11640.80 |
| 1 | 2015 | 2015-01-02 | 17531.20 |
| 2 | 2015 | 2015-01-03 | 37327.87 |
| 3 | 2015 | 2015-01-05 | 49734.14 |
| 4 | 2015 | 2015-01-06 | 60084.01 |
| ... | ... | ... | ... |
| 3289 | 2024 | 2024-04-16 | 8189913.64 |
| 3290 | 2024 | 2024-04-17 | 8222852.31 |
| 3291 | 2024 | 2024-04-18 | 8251261.07 |
| 3292 | 2024 | 2024-04-19 | 8299647.95 |
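Adding the explicit frame leaves the numbers unchanged. With `ORDER BY` and no frame clause, PostgreSQL's default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, and because `cohort_analysis` groups by `orderdate`, each date appears only once per `cohort_year`, so `RANGE` and `ROWS` cover the same rows. They diverge only when the ordering column has ties, since `RANGE` treats tied rows (peers) as a single step. This sketch assumes Python's `sqlite3` module (SQLite uses the same default frame) and hypothetical data with one tied date; `revenue` is added as a tie-breaker in the `ROWS` window only to make its row order deterministic.

```python
import sqlite3

# Hypothetical toy data (not from the Contoso dataset); two rows are tied on 2024-01-02
TOY_SALES_WITH_TIES = [
    ("2024-01-01", 10),
    ("2024-01-02", 20),
    ("2024-01-02", 30),
    ("2024-01-03", 40),
]


def default_vs_rows_frame(rows):
    """Return (day, revenue, default_frame_sum, rows_frame_sum) for each row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_sales (day TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?)", rows)
    query = """
        SELECT
            day,
            revenue,
            -- No frame clause: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
            -- so both rows dated 2024-01-02 share one cumulative value
            SUM(revenue) OVER (ORDER BY day) AS default_frame,
            -- Explicit ROWS frame: accumulates strictly one row at a time
            SUM(revenue) OVER (ORDER BY day, revenue ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_frame
        FROM daily_sales
        ORDER BY day, revenue
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in default_vs_rows_frame(TOY_SALES_WITH_TIES):
    print(row)
```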


In [9]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        orderdate
),

rolling_ltv AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_net_revenue) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_revenue,
        COUNT(*) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS user_count
    FROM cohort_analysis
)

SELECT
    cohort_year,
    orderdate,
    cumulative_revenue,
    user_count,
    cumulative_revenue / user_count AS rolling_avg_ltv
FROM rolling_ltv;

| | cohort_year | orderdate | cumulative_revenue | user_count | rolling_avg_ltv |
|---|---|---|---|---|---|
| 0 | 2015 | 2015-01-01 | 11640.80 | 1 | 11640.80 |
| 1 | 2015 | 2015-01-02 | 17531.20 | 2 | 8765.60 |
| 2 | 2015 | 2015-01-03 | 37327.87 | 3 | 12442.62 |
| 3 | 2015 | 2015-01-05 | 49734.14 | 4 | 12433.53 |
| 4 | 2015 | 2015-01-06 | 60084.01 | 5 | 12016.80 |
| ... | ... | ... | ... | ... | ... |
| 3289 | 2024 | 2024-04-16 | 8189913.64 | 105 | 77999.18 |
| 3290 | 2024 | 2024-04-17 | 8222852.31 | 106 | 77574.08 |
| 3291 | 2024 | 2024-04-18 | 8251261.07 | 107 | 77114.59 |
| 3292 | 2024 | 2024-04-19 | 8299647.95 | 108 | 76848.59 |
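Here `user_count` is `COUNT(*)` over the same frame, so it counts rows of `cohort_analysis` (order dates so far in the year), not customers, and `rolling_avg_ltv` is the average daily net revenue to date. Dividing the running sum by the running count gives the same value as `AVG()` over the identical frame. A sketch on hypothetical daily totals, assuming Python's `sqlite3` module in place of PostgreSQL; revenue is stored as `REAL` so the division is not integer division.

```python
import sqlite3

# Hypothetical daily totals (not from the Contoso dataset)
TOY_DAILY = [
    ("2024-01-01", 10.0),
    ("2024-01-02", 20.0),
    ("2024-01-03", 30.0),
    ("2024-01-04", 40.0),
]


def running_average(rows):
    """Return (day, cumulative_revenue, row_count, sum_div_count, avg_over_frame) per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_sales (day TEXT, revenue REAL)")
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?)", rows)
    query = """
        SELECT
            day,
            SUM(revenue) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_revenue,
            -- COUNT(*) counts rows (days) in the frame, not distinct customers
            COUNT(*) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS row_count,
            -- Running sum divided by running count, as in the notebook query
            SUM(revenue) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                / COUNT(*) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_div_count,
            -- The same result in one step with AVG over the identical frame
            AVG(revenue) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS avg_over_frame
        FROM daily_sales
        ORDER BY day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in running_average(TOY_DAILY):
    print(row)
```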


