<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/3_Windows_Functions/5_Frame_Clause.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Frame Clause

## Overview

### 🥅 Analysis Goals

Analyze cohort revenue and lifetime value (LTV) to uncover monthly trends, short-term fluctuations, future potential, and long-term customer value patterns.

- **Monthly Revenue Trends:** Calculate monthly net revenue for each cohort to track each month's performance without cumulative or rolling effects.  
- **Mid-Term Revenue Patterns:** Compute rolling 3-month revenue sums to smooth monthly fluctuations and reveal purchasing behaviors.  
- **Projected Short-Term Revenue:** Summarize net revenue for the next 3 months by cohort to identify patterns and assist in mid-term forecasting.  

### 📘 Concepts Covered

- `CURRENT ROW`
- `N PRECEDING` 
- `N FOLLOWING` 
- `UNBOUNDED`
    - `UNBOUNDED PRECEDING`
    - `UNBOUNDED FOLLOWING` 

---

In [1]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas floats with two decimal places
pd.options.display.float_format = '{:.2f}'.format

---
## ROWS, RANGE, GROUPS

[Postgres Source Documentation on Window Function Calls](https://www.postgresql.org/docs/current/sql-expressions.html#:~:text=%5B%20frame_clause%20%5D)

#### `ROWS`

- Defines window frame based on physical row position
- Counts actual rows before/after current row
- Provides precise control over row inclusion

```sql
<window function> OVER (
    PARTITION BY column
    ORDER BY column
    ROWS start_frame
)
```

```sql
<window function> OVER (
    PARTITION BY column
    ORDER BY column
    ROWS BETWEEN start_frame AND end_frame
)
```

#### `start_frame` & `end_frame`

**Frame bounds, from earliest to latest:**
- `UNBOUNDED PRECEDING`: The first row of the partition (start bound only)
- `N PRECEDING`: N rows before the current row
- `CURRENT ROW`: The current row itself
- `N FOLLOWING`: N rows after the current row
- `UNBOUNDED FOLLOWING`: The last row of the partition (end bound only)

With the short form `ROWS start_frame`, the frame ends at `CURRENT ROW`.

```sql
UNBOUNDED PRECEDING
N PRECEDING
CURRENT ROW
N FOLLOWING
UNBOUNDED FOLLOWING
```
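
To see how each bound changes what an aggregate covers, here is a minimal sketch that runs the same `SUM` over five different frames. It assumes an in-memory SQLite database (3.25 or newer, for window functions) rather than the course's PostgreSQL instance, and the `monthly` table and its values are hypothetical toy data.

```python
import sqlite3

# Hypothetical toy data (not from contoso_100k): five months of revenue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (month INTEGER, revenue INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)],
)

# Same aggregate, five different frames.
query = """
SELECT
    month,
    SUM(revenue) OVER (ORDER BY month ROWS BETWEEN CURRENT ROW AND CURRENT ROW)         AS current_only,
    SUM(revenue) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
    SUM(revenue) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)         AS prev_plus_current,
    SUM(revenue) OVER (ORDER BY month ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)         AS current_plus_next,
    SUM(revenue) OVER (ORDER BY month ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS remaining_total
FROM monthly
ORDER BY month
"""
frame_rows = conn.execute(query).fetchall()
for row in frame_rows:
    print(row)
```

For month 1 the row reads `(1, 10, 10, 10, 30, 150)`: there is nothing before it, so the backward-looking frames hold only the row itself, while `remaining_total` covers all five months.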

#### `RANGE` & `GROUPS`

**RANGE**
- Defines the window frame by `ORDER BY` value rather than by physical row position
- Offsets are distances in the ordering value (e.g. the previous 7 days), which suits time-series data with gaps or duplicate dates
- Rows with equal `ORDER BY` values (peers) always enter or leave the frame together

**GROUPS**
- Defines the window frame by counting peer groups, i.e. sets of rows that share the same `ORDER BY` value
- Useful when you want to treat tied values as a single unit


```sql
<window function> OVER (
    PARTITION BY column
    ORDER BY column
    { RANGE | GROUPS } BETWEEN start_frame AND end_frame
)
```
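
The difference between the three modes only shows up when the `ORDER BY` column has ties or gaps. The sketch below is an illustration under assumptions: it uses in-memory SQLite (3.28 or newer, for `GROUPS`) instead of PostgreSQL, and the `payments` table is hypothetical. Day 1 and day 4 each have two rows, and day 3 is missing.

```python
import sqlite3

# Hypothetical toy data: ties on day 1 and day 4, and a gap at day 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 30), (4, 40), (4, 50)],
)

query = """
SELECT
    day,
    amount,
    -- ROWS: one physical row back (amount added as a tie-breaker so the order is deterministic)
    SUM(amount) OVER (ORDER BY day, amount ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS rows_sum,
    -- RANGE: every row whose day lies in [day - 1, day]
    SUM(amount) OVER (ORDER BY day RANGE BETWEEN 1 PRECEDING AND CURRENT ROW)        AS range_sum,
    -- GROUPS: the previous distinct day plus the current day
    SUM(amount) OVER (ORDER BY day GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW)       AS groups_sum
FROM payments
ORDER BY day, amount
"""
mode_rows = conn.execute(query).fetchall()
for row in mode_rows:
    print(row)
```

On day 4, `RANGE` looks back to day 3 (which has no rows) and returns 90, while `GROUPS` reaches the previous distinct day (day 2) and returns 120. `ROWS` ignores the values entirely and just takes the neighbouring row.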

**Why aren't we covering `RANGE` & `GROUPS`?**
- 📊 `RANGE` and `GROUPS` are used less often in practice than `ROWS`
- 🔍 `ROWS` is more intuitive and sufficient for most window function use cases
- ⚠️ Not every database supports all three modes (e.g. MySQL 8.0 supports `ROWS` and `RANGE` but not `GROUPS`)
- ⚡ `ROWS` frames can be cheaper to evaluate, since the database only counts rows instead of comparing `ORDER BY` values  

| SQL Feature | PostgreSQL | MySQL | SQL Server | Oracle | Snowflake | BigQuery |
|------------|------------|--------|------------|--------|------------|----------|
| **ROWS**  | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| **RANGE** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| **GROUPS** | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |  

---

## CURRENT ROW

### 📝 Notes

`CURRENT ROW`

- **CURRENT ROW**: Refers to the current row in a window frame during a query execution.
- Syntax:
  ```sql
  SELECT
    column_name,
    SUM(column_name) OVER(
        PARTITION BY partition_expression
        ORDER BY order_expression
        ROWS BETWEEN CURRENT ROW AND CURRENT ROW
    ) AS window_column_alias
  FROM table_name;
  ```
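> **NOTE:** `ROWS BETWEEN CURRENT ROW AND CURRENT ROW` limits the frame to the current row, so `SUM(column_name)` over it simply returns that row's own `column_name`; it is used here only to introduce the syntax. It is not the same as omitting the frame clause: with an `ORDER BY`, the default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` (a running total), and without an `ORDER BY` the frame is the whole partition.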
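
A quick way to check what a `CURRENT ROW` frame does is to compare it with the same window without a frame clause. This is a minimal sketch, assuming in-memory SQLite (3.25 or newer), whose default frame matches PostgreSQL's; the `monthly` table is hypothetical toy data.

```python
import sqlite3

# Hypothetical toy data: three months of revenue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (month INTEGER, revenue INTEGER)")
conn.executemany("INSERT INTO monthly VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

query = """
SELECT
    month,
    revenue,
    -- Frame is only the current row: returns the row's own value
    SUM(revenue) OVER (ORDER BY month ROWS BETWEEN CURRENT ROW AND CURRENT ROW) AS current_row_sum,
    -- No frame clause, with ORDER BY: default frame gives a running total
    SUM(revenue) OVER (ORDER BY month)                                          AS default_with_order,
    -- No frame clause, no ORDER BY: the frame is the whole partition
    SUM(revenue) OVER ()                                                        AS default_no_order
FROM monthly
ORDER BY month
"""
default_rows = conn.execute(query).fetchall()
for row in default_rows:
    print(row)
```

Only `current_row_sum` equals `revenue` on every row; the two frame-less variants return a running total and the grand total respectively.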

### 🔑 Key Concepts
- **📊 Business Terms**: 
  - Individual Order Value: Revenue from a single transaction
  - Point-in-Time Analysis: Examining metrics at a specific moment
  - Transaction-Level Detail: Granular view of each sale
- **💡 Why It Matters**: Enables precise analysis of individual order performance
    - Identifies specific high-value or low-value transactions
    - Helps spot anomalies in order patterns
    - Helps isolate monthly performance trends for each cohort without being influenced by revenue in other months.
- **🎯 Common Use Cases**: Transaction monitoring, order analysis
- **📈 Related KPIs**: Order value, transaction frequency

### 📈 Analysis

- Calculates the exact revenue for each individual order, providing a precise view of transaction-level performance without any aggregation.
- Calculates the exact net revenue for each cohort on a monthly basis without rolling or cumulative sums.

### Current Row Revenue

**`CURRENT ROW`**

1. Calculate the exact revenue for each order line using `GROUP BY` and `SUM`.
   - Select `orderdate`, `orderkey`, and `linenumber` to identify each order line
   - Calculate revenue using `SUM(quantity * netprice * exchangerate)` 
   - Group by order identifiers to get revenue per line item
   - Limit results to see sample output

In [31]:
%%sql

SELECT 
    orderdate,
    orderkey,
    linenumber,
    SUM(quantity * netprice * exchangerate) AS net_revenue
FROM sales
GROUP BY
    orderdate,
    orderkey,
    linenumber
LIMIT 10

Unnamed: 0,orderdate,orderkey,linenumber,net_revenue
0,2015-01-01,1000,0,63.49
1,2015-01-01,1000,1,423.28
2,2015-01-01,1001,0,108.75
3,2015-01-01,1002,0,1146.75
4,2015-01-01,1002,1,950.25
5,2015-01-01,1002,2,1302.91
6,2015-01-01,1002,3,58.73
7,2015-01-01,1003,0,224.98
8,2015-01-01,1004,0,263.11
9,2015-01-01,1004,1,578.52


2. Calculate the exact revenue for each order line using `CURRENT ROW`.
   - Select `orderdate`, `orderkey`, and `linenumber` to identify each order line
   - Calculate revenue using `quantity * netprice * exchangerate`
   - Use a `CURRENT ROW` frame so the `SUM` covers only the row itself
   - Remove the frame clause and rerun to compare: without it, the `SUM` covers every row in `sales`

In [66]:
%%sql

SELECT 
    orderdate,
    orderkey,
    linenumber,
    SUM(quantity * netprice * exchangerate) OVER (
        ROWS CURRENT ROW        -- run without this as well
    ) as net_revenue
FROM sales
LIMIT 10;

Unnamed: 0,orderdate,orderkey,linenumber,net_revenue
0,2015-01-01,1000,0,63.49
1,2015-01-01,1000,1,423.28
2,2015-01-01,1001,0,108.75
3,2015-01-01,1002,0,1146.75
4,2015-01-01,1002,1,950.25
5,2015-01-01,1002,2,1302.91
6,2015-01-01,1002,3,58.73
7,2015-01-01,1003,0,224.98
8,2015-01-01,1004,0,263.11
9,2015-01-01,1004,1,578.52


In [32]:
%%sql

SELECT 
    orderdate,
    orderkey,
    linenumber,
    SUM(quantity * netprice * exchangerate) OVER (
        ROWS BETWEEN CURRENT ROW AND CURRENT ROW        -- run without this as well
    ) as net_revenue
FROM sales
LIMIT 10;

Unnamed: 0,orderdate,orderkey,linenumber,net_revenue
0,2015-01-01,1000,0,63.49
1,2015-01-01,1000,1,423.28
2,2015-01-01,1001,0,108.75
3,2015-01-01,1002,0,1146.75
4,2015-01-01,1002,1,950.25
5,2015-01-01,1002,2,1302.91
6,2015-01-01,1002,3,58.73
7,2015-01-01,1003,0,224.98
8,2015-01-01,1004,0,263.11
9,2015-01-01,1004,1,578.52


### Monthly Sales Analysis

**`CURRENT ROW`**

1. Calculate monthly revenue for 2023
   - Extract month from orderdate using `TO_CHAR()`
   - Calculate revenue using `quantity * netprice * exchangerate`
   - Filter for year 2023
   - Group and order by month

In [34]:
%%sql

SELECT 
    TO_CHAR(orderdate, 'YYYY-MM') as month,
    SUM(quantity * netprice * exchangerate) as net_revenue
FROM sales
WHERE EXTRACT(YEAR FROM orderdate) = 2023
GROUP BY month
ORDER BY month
    

Unnamed: 0,month,net_revenue
0,2023-01,3664431.34
1,2023-02,4465204.57
2,2023-03,2244316.52
3,2023-04,1162796.16
4,2023-05,2943005.99
5,2023-06,2864500.03
6,2023-07,2337639.34
7,2023-08,2623919.79
8,2023-09,2622774.85
9,2023-10,2551322.61


2. Calculate monthly revenue with `CURRENT ROW` frame
   - Use CTE to store monthly revenue calculation
   - Calculate revenue using `AVG()` window function
   - Use `ROWS BETWEEN CURRENT ROW AND CURRENT ROW` frame
   - Compare original revenue with windowed calculation

In [67]:
%%sql

WITH monthly_sales AS (
    SELECT 
        TO_CHAR(orderdate, 'YYYY-MM') as month,
        SUM(quantity * netprice * exchangerate) as net_revenue
    FROM sales
    WHERE EXTRACT(YEAR FROM orderdate) = 2023
    GROUP BY month
    ORDER BY month
    
)
SELECT 
    month,
    net_revenue,
    AVG(net_revenue) OVER (
        ORDER BY month
        ROWS CURRENT ROW -- ROWS BETWEEN CURRENT ROW AND CURRENT ROW
        ) as net_revenue_current
FROM monthly_sales;

Unnamed: 0,month,net_revenue,net_revenue_current
0,2023-01,3664431.34,3664431.34
1,2023-02,4465204.57,4465204.57
2,2023-03,2244316.52,2244316.52
3,2023-04,1162796.16,1162796.16
4,2023-05,2943005.99,2943005.99
5,2023-06,2864500.03,2864500.03
6,2023-07,2337639.34,2337639.34
7,2023-08,2623919.79,2623919.79
8,2023-09,2622774.85,2622774.85
9,2023-10,2551322.61,2551322.61


#### Monthly Revenue by Cohort

**`CURRENT ROW`**

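1. Get the `cohort_year` and the total revenue for each customer and day.  
   - Use `EXTRACT(YEAR FROM MIN(orderdate))` to derive `cohort_year`. Because `orderdate` is also in the `GROUP BY`, `MIN(orderdate)` is evaluated per customer and day, so in this query `cohort_year` equals the year of each row's `orderdate` (as the output shows). Assigning every order to the year of the customer's first purchase would require taking the minimum across all of that customer's orders, for example with a window function partitioned by `customerkey`.  
   - Include `customerkey` in the `GROUP BY` clause so revenue is calculated for each customer individually.  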
   - Include `orderdate` in the `GROUP BY` clause to calculate daily revenue for each customer.  
   - Use `SUM(quantity * netprice * exchangerate)` to calculate the total net revenue for each day.  
   - Select `customerkey`, `cohort_year`, `orderdate`, and `total_net_revenue` for the final output.  

In [3]:
%%sql

SELECT 
    customerkey,
    EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
    orderdate,
    SUM(quantity * netprice * exchangerate) AS total_net_revenue
FROM sales
GROUP BY 
    customerkey, 
    orderdate


Unnamed: 0,customerkey,cohort_year,orderdate,total_net_revenue
0,1506769,2022,2022-03-04,996.79
1,909157,2021,2021-11-16,1565.80
2,2047462,2020,2020-12-09,34.06
3,1933480,2021,2021-11-10,45.90
4,1701958,2017,2017-10-05,5144.64
...,...,...,...,...
83094,1273185,2023,2023-09-16,3452.05
83095,420797,2022,2022-12-29,278.90
83096,642485,2023,2023-03-08,574.31
83097,863441,2023,2023-03-23,475.84
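
One detail in the output above is worth verifying: every `cohort_year` matches the year of its own `orderdate`. The sketch below reproduces that behaviour on toy data and contrasts it with a minimum taken across all of a customer's orders. It is an illustration under assumptions, not the course's query: it uses in-memory SQLite (3.25 or newer), `strftime('%Y', ...)` stands in for PostgreSQL's `EXTRACT(YEAR FROM ...)`, and the table contents and the `cohort_first_order` column are hypothetical.

```python
import sqlite3

# Hypothetical toy data: customer 1 first buys in 2021 and returns in 2023.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2021-05-01", 50.0),
        (1, "2021-05-01", 25.0),
        (1, "2023-02-10", 10.0),
        (2, "2023-03-01", 40.0),
    ],
)

query = """
WITH daily AS (
    SELECT
        customerkey,
        orderdate,
        SUM(revenue)   AS total_net_revenue,
        MIN(orderdate) AS grouped_min       -- minimum within one customer-day group
    FROM sales
    GROUP BY customerkey, orderdate
)
SELECT
    customerkey,
    orderdate,
    total_net_revenue,
    CAST(strftime('%Y', grouped_min) AS INTEGER) AS cohort_grouped,
    -- minimum across all of the customer's days
    CAST(strftime('%Y', MIN(orderdate) OVER (PARTITION BY customerkey)) AS INTEGER) AS cohort_first_order
FROM daily
ORDER BY customerkey, orderdate
"""
cohort_rows = conn.execute(query).fetchall()
for row in cohort_rows:
    print(row)
```

Customer 1's 2023 order gets `cohort_grouped = 2023` but `cohort_first_order = 2021`. Which definition is appropriate depends on whether "cohort" should mean the order year or the year of first purchase.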


2. Move the query into a CTE and return all results in the main query.  
   - 🔔 Define a CTE `cohort_analysis` to calculate the cohort year and total daily revenue for each customer.  
      - Use `EXTRACT(YEAR FROM MIN(orderdate))` to derive `cohort_year` for each customer and day.  
      - Group the CTE by `customerkey` and `orderdate` to calculate daily net revenue per customer.  
      - Use `SUM(quantity * netprice * exchangerate)` to compute the total net revenue for each day.  
   - 🔔 In the main query, use `SELECT * FROM cohort_analysis` to return all calculated results, including `customerkey`, `cohort_year`, `orderdate`, and `total_net_revenue`.  

In [4]:
%%sql

-- Put query into a CTE
WITH cohort_analysis AS (
    SELECT 
        customerkey, 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey, 
        orderdate
)

-- Added a SELECT statement to query the CTE
SELECT *
FROM cohort_analysis;

Unnamed: 0,customerkey,cohort_year,orderdate,total_net_revenue
0,1506769,2022,2022-03-04,996.79
1,909157,2021,2021-11-16,1565.80
2,2047462,2020,2020-12-09,34.06
3,1933480,2021,2021-11-10,45.90
4,1701958,2017,2017-10-05,5144.64
...,...,...,...,...
83094,1273185,2023,2023-09-16,3452.05
83095,420797,2022,2022-12-29,278.90
83096,642485,2023,2023-03-08,574.31
83097,863441,2023,2023-03-23,475.84


3. Add a new CTE to get the total monthly revenue for each cohort and return all results in the main query.  

   - Define a CTE `cohort_analysis` to calculate the cohort year and total daily revenue for each customer.  
        - Use `EXTRACT(YEAR FROM MIN(orderdate))` to derive `cohort_year` for each customer and day.  
        - Group `cohort_analysis` by `customerkey` and `orderdate` to compute the total net revenue for each customer per day using `SUM(quantity * netprice * exchangerate)`.  
   - 🔔 Add a second CTE `cohort_summary` to aggregate monthly revenue for each cohort.  
        - Use `SUM(total_net_revenue)` in `cohort_summary` to calculate the monthly total revenue for all customers within each cohort. 
        - Truncate `orderdate` to the first day of its month using `DATE_TRUNC('month', orderdate)` and cast the result to a date with `::date`.
        - Group `cohort_summary` by `cohort_year` and `year_month` to summarize the data at the cohort and month level.  
   - 🔔 In the main query, select all columns from `cohort_summary` to display `cohort_year`, `year_month`, and `total_revenue`.  
        - Use `ORDER BY cohort_year, year_month` to sort the results by cohort year and the order year-month in ascending order.  

In [5]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey, 
        orderdate
), 

-- Added 
cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate)::date AS year_month,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY 
        cohort_year, 
        year_month
)

-- Updated 
SELECT
    *
FROM cohort_summary
ORDER BY 
    cohort_year, 
    year_month;

Unnamed: 0,cohort_year,year_month,total_revenue
0,2015,2015-01-01,384092.66
1,2015,2015-02-01,706374.12
2,2015,2015-03-01,332961.59
3,2015,2015-04-01,160767.00
4,2015,2015-05-01,548632.63
...,...,...,...
107,2023,2023-12-01,2928550.93
108,2024,2024-01-01,2677498.55
109,2024,2024-02-01,3542322.55
110,2024,2024-03-01,1692854.89


4. In the main query, use `CURRENT ROW` to get the monthly net revenue for each cohort.  

   - Define the CTE `cohort_analysis` to calculate the cohort year and daily net revenue for each customer.  
        - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer.  
        - Aggregate `total_net_revenue` in `cohort_analysis` by grouping on `customerkey` and `orderdate`.  
   - Define a second CTE `cohort_summary` to calculate the monthly total revenue for each cohort.  
        - Use `SUM(total_net_revenue)` in `cohort_summary` to calculate the monthly total revenue for all customers within each cohort. 
        - Truncate `orderdate` to the first day of its month using `DATE_TRUNC('month', orderdate)` and cast the result to a date with `::date`.
        - Group `cohort_summary` by `cohort_year` and `year_month` to summarize the data at the cohort and month level.  
   - In the main query, use a window function to calculate the monthly net revenue for each cohort.  
        - 🔔 Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN CURRENT ROW AND CURRENT ROW)` to get the revenue for the current row.  
        - Select `cohort_year`, `year_month`, and `monthly_net_revenue` for the output.  
        - Order the results by `cohort_year` and `year_month` to display them in chronological order for each cohort.  

In [6]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey, 
        orderdate
),
cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate)::date AS year_month,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY 
        cohort_year, 
        year_month
)

SELECT
    cohort_year,
    year_month,
    -- Added
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year -- Separate aggregation for each cohort
        ORDER BY year_month
        ROWS BETWEEN CURRENT ROW AND CURRENT ROW
    ) AS monthly_net_revenue
FROM cohort_summary
ORDER BY 
    cohort_year, 
    year_month;

Unnamed: 0,cohort_year,year_month,monthly_net_revenue
0,2015,2015-01-01,384092.66
1,2015,2015-02-01,706374.12
2,2015,2015-03-01,332961.59
3,2015,2015-04-01,160767.00
4,2015,2015-05-01,548632.63
...,...,...,...
107,2023,2023-12-01,2928550.93
108,2024,2024-01-01,2677498.55
109,2024,2024-02-01,3542322.55
110,2024,2024-03-01,1692854.89


<img src="../Resources/images/3.5_cohort_monthly_rev.png" alt="Monthly Cohort Revenue" width="50%">

> ⚠️ **Chart Note**: This chart plots only the 2015 cohort.

---

## N PRECEDING

### 📝 Notes

`N PRECEDING`

- **N PRECEDING**: Refers to `N` rows before the current row in a window frame.
- Syntax:
  ```sql
  SELECT
    column_name,
    SUM(column_name) OVER(
        PARTITION BY partition_expression
        ORDER BY order_expression
        ROWS BETWEEN N PRECEDING AND CURRENT ROW
    ) AS window_column_alias
  FROM table_name;
  ```
- Enables calculations involving the current row and up to `N` preceding rows, such as moving averages or rolling sums over a fixed number of rows.
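
The phrase "up to `N` preceding rows" matters at the start of a partition, where fewer than `N` earlier rows exist. The following sketch counts the rows in each frame to make that visible. It assumes in-memory SQLite (3.25 or newer) in place of PostgreSQL, and the `monthly` table is hypothetical toy data.

```python
import sqlite3

# Hypothetical toy data: five months of revenue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (month INTEGER, revenue INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)],
)

query = """
SELECT
    month,
    SUM(revenue) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_3_sum,
    -- Short form: the frame end defaults to CURRENT ROW
    SUM(revenue) OVER (ORDER BY month ROWS 2 PRECEDING)                         AS rolling_3_sum_short,
    -- How many rows the frame actually contains
    COUNT(*)     OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rows_in_frame
FROM monthly
ORDER BY month
"""
rolling_rows = conn.execute(query).fetchall()
for row in rolling_rows:
    print(row)
```

The first two rows have frames of one and two rows, so their "3-month" sums are partial. If partial windows are unwanted, `rows_in_frame` can be used to filter them out.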

### 🔑 Key Concepts
- **📊 Business Terms**: 
  - Rolling Window: Moving period of analysis
  - Sequential Analysis: Study of consecutive transactions
  - Moving Average: Average over a sliding timeframe
- **💡 Why It Matters**: Reveals trends by smoothing out daily fluctuations
    - Shows short-term patterns in customer behavior
    - Helps identify seasonal or cyclical trends
    - Identifies mid-term trends in net revenue, such as monthly purchasing patterns.
- **🎯 Common Use Cases**: Trend analysis, pattern detection
- **📈 Related KPIs**: Rolling revenue, moving averages

### 📈 Analysis

- Calculates a 10-order rolling sum of revenue that includes the current order and previous 9 orders, smoothing out daily fluctuations to reveal short-term trends.
- Computes the rolling 3-month sum of net revenue for each cohort, smoothing fluctuations in monthly revenue.

### Rolling Order Revenue

**`N PRECEDING`**

1. Calculate a rolling sum of revenue that includes the current order line and the previous 9 using `N PRECEDING`.
   - Select `orderdate` to identify each transaction
   - Calculate revenue using `quantity * netprice * exchangerate`
   - Use a `9 PRECEDING` frame so each sum covers up to 10 rows
   - Order by `orderdate` so the frame follows chronological order (rows that share an `orderdate` have no guaranteed order among themselves)

In [7]:
%%sql

SELECT 
    orderdate,
    (quantity * netprice * exchangerate) AS net_revenue,
    SUM(quantity * netprice * exchangerate) OVER (
        ORDER BY orderdate
        ROWS BETWEEN 9 PRECEDING AND CURRENT ROW
    ) as rolling_10_order_revenue
FROM sales
ORDER BY orderdate;

Unnamed: 0,orderdate,net_revenue,rolling_10_order_revenue
0,2015-01-01,63.49,63.49
1,2015-01-01,423.28,486.77
2,2015-01-01,108.75,595.53
3,2015-01-01,1146.75,1742.27
4,2015-01-01,950.25,2692.52
...,...,...,...
199868,2024-04-20,914.61,8607.71
199869,2024-04-20,150.18,8241.71
199870,2024-04-20,147.78,8159.59
199871,2024-04-20,2019.62,8307.84


### Monthly Sales Analysis - Preceding Monthly Average

**`N PRECEDING`**

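1. Calculate monthly revenue for 2023 and average each month with the previous month
   - Extract month from orderdate using `TO_CHAR()`
   - Calculate revenue using `quantity * netprice * exchangerate`
   - Filter for year 2023
   - Group and order by month
   - Apply `AVG(net_revenue)` with `ROWS BETWEEN 1 PRECEDING AND CURRENT ROW`; January has no previous row, so its average is its own revenue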

In [48]:
%%sql

WITH monthly_sales AS (
    SELECT 
        TO_CHAR(orderdate, 'YYYY-MM') as month,
        SUM(quantity * netprice * exchangerate) as net_revenue
    FROM sales
    WHERE EXTRACT(YEAR FROM orderdate) = 2023
    GROUP BY month
    ORDER BY month
    
)
SELECT 
    month,
    net_revenue,
    AVG(net_revenue) OVER (
        ORDER BY month
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW -- sub in 0 PRECEDING
        ) as net_revenue_preceding
FROM monthly_sales;

Unnamed: 0,month,net_revenue,net_revenue_preceding
0,2023-01,3664431.34,3664431.34
1,2023-02,4465204.57,4064817.96
2,2023-03,2244316.52,3354760.54
3,2023-04,1162796.16,1703556.34
4,2023-05,2943005.99,2052901.08
5,2023-06,2864500.03,2903753.01
6,2023-07,2337639.34,2601069.68
7,2023-08,2623919.79,2480779.57
8,2023-09,2622774.85,2623347.32
9,2023-10,2551322.61,2587048.73


Let's see how changing the PRECEDING value affects the average:

> With `1 PRECEDING`: `AVG = (current_month + prev_month) / 2`  
> With `2 PRECEDING`: `AVG = (current_month + prev_month_1 + prev_month_2) / 3`  
> With `3 PRECEDING`: `AVG = (current_month + prev_month_1 + prev_month_2 + prev_month_3) / 4`  

In [49]:
%%sql

WITH monthly_sales AS (
    SELECT 
        TO_CHAR(orderdate, 'YYYY-MM') as month,
        SUM(quantity * netprice * exchangerate) as net_revenue
    FROM sales
    WHERE EXTRACT(YEAR FROM orderdate) = 2023
    GROUP BY month
    ORDER BY month
    
)
SELECT 
    month,
    net_revenue,
    AVG(net_revenue) OVER (
        ORDER BY month
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
        ) as net_revenue_preceding_1,
    AVG(net_revenue) OVER (
        ORDER BY month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) as net_revenue_preceding_2,
    AVG(net_revenue) OVER (
        ORDER BY month
        ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
        ) as net_revenue_preceding_3
FROM monthly_sales;

Unnamed: 0,month,net_revenue,net_revenue_preceding_1,net_revenue_preceding_2,net_revenue_preceding_3
0,2023-01,3664431.34,3664431.34,3664431.34,3664431.34
1,2023-02,4465204.57,4064817.96,4064817.96,4064817.96
2,2023-03,2244316.52,3354760.54,3457984.14,3457984.14
3,2023-04,1162796.16,1703556.34,2624105.75,2884187.15
4,2023-05,2943005.99,2052901.08,2116706.22,2703830.81
5,2023-06,2864500.03,2903753.01,2323434.06,2303654.68
6,2023-07,2337639.34,2601069.68,2715048.45,2326985.38
7,2023-08,2623919.79,2480779.57,2608686.39,2692266.29
8,2023-09,2622774.85,2623347.32,2528111.33,2612208.5
9,2023-10,2551322.61,2587048.73,2599339.08,2533914.15
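
The early rows in the output above can be checked by hand: when fewer than `N` earlier months exist, `AVG` divides by the number of rows actually in the frame, which is why March shows the same value for `2 PRECEDING` and `3 PRECEDING`. A minimal sketch of that arithmetic in plain Python, using the January to April figures copied from the output (the helper `trailing_avg` is hypothetical, written only for this check):

```python
# Monthly net revenue for Jan-Apr 2023, copied from the query output.
net_revenue = [3664431.34, 4465204.57, 2244316.52, 1162796.16]


def trailing_avg(values, index, n_preceding):
    """Average of values[index] and up to n_preceding earlier values."""
    start = max(0, index - n_preceding)  # the frame is clipped at the first row
    frame = values[start:index + 1]
    return sum(frame) / len(frame)


march_2_preceding = trailing_avg(net_revenue, 2, 2)  # Jan, Feb, Mar
march_3_preceding = trailing_avg(net_revenue, 2, 3)  # still only Jan, Feb, Mar
april_3_preceding = trailing_avg(net_revenue, 3, 3)  # Jan, Feb, Mar, Apr

print(round(march_2_preceding, 2), round(march_3_preceding, 2), round(april_3_preceding, 2))
```

Because the inputs are already rounded to two decimals, the results agree with the table to within a cent rather than bit for bit.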


<img src="../Resources/images/3.5_precede_rev.png" alt="Cohort Rolling 3 Month Revenue" width="50%">
<img src="../Resources/images/3.5_preced_rev_3.png" alt="Cohort Rolling 3 Month Revenue" width="50%">

#### 3-Month Rolling Revenue by Cohort

**`N PRECEDING`**

1. Use the previous query and add a new window function to calculate the rolling 3-month net revenue using `PRECEDING`.  
   - Define the CTE `cohort_analysis` to calculate the cohort year and daily net revenue for each customer.  
        - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer.  
        - Aggregate `total_net_revenue` in `cohort_analysis` by grouping on `customerkey` and `orderdate`.  
   - Define the second CTE `cohort_summary` to calculate the monthly total revenue for each cohort.  
        - Use `SUM(total_net_revenue)` in `cohort_summary` to aggregate the monthly total revenue for each cohort, grouped by `cohort_year` and `year_month`. 
        - Truncate `orderdate` to the first day of its month using `DATE_TRUNC('month', orderdate)` and cast the result to a date with `::date`. 
   - In the main query, calculate the rolling 3-month net revenue for each cohort using a window function.  
        - Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN CURRENT ROW AND CURRENT ROW)` to get the revenue for the current row.  
        - 🔔 Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)` to include revenue from the current month and the previous 2 months.  
        - Select `cohort_year`, `year_month`, `monthly_net_revenue`, and `rolling_3_month_net_revenue` for the output.  
        - Use `ORDER BY cohort_year, year_month` to ensure results are sorted chronologically by cohort and month.  

In [8]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey, 
        orderdate
),

cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate)::date AS year_month,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY cohort_year, year_month
)

SELECT
    cohort_year,
    year_month,
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month
        ROWS BETWEEN CURRENT ROW AND CURRENT ROW
    ) AS monthly_net_revenue,
    -- Added
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS rolling_3_month_net_revenue
FROM cohort_summary
ORDER BY 
    cohort_year, 
    year_month;

Unnamed: 0,cohort_year,year_month,monthly_net_revenue,rolling_3_month_net_revenue
0,2015,2015-01-01,384092.66,384092.66
1,2015,2015-02-01,706374.12,1090466.78
2,2015,2015-03-01,332961.59,1423428.37
3,2015,2015-04-01,160767.00,1200102.71
4,2015,2015-05-01,548632.63,1042361.22
...,...,...,...,...
107,2023,2023-12-01,2928550.93,8179976.91
108,2024,2024-01-01,2677498.55,2677498.55
109,2024,2024-02-01,3542322.55,6219821.10
110,2024,2024-03-01,1692854.89,7912675.99
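
In the output above, the rolling sum for 2024-01 equals that month's own revenue: `PARTITION BY cohort_year` starts a fresh frame for each cohort, so the window never reaches back into the previous cohort's rows. The sketch below shows the same effect on toy data. It assumes in-memory SQLite (3.25 or newer) instead of PostgreSQL, and the table contents and the `rolling_ignoring_cohort` column are hypothetical.

```python
import sqlite3

# Hypothetical toy data shaped like cohort_summary: two cohorts, two months each.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cohort_summary (cohort_year INTEGER, year_month TEXT, total_revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO cohort_summary VALUES (?, ?, ?)",
    [
        (2015, "2015-11-01", 100),
        (2015, "2015-12-01", 200),
        (2016, "2016-01-01", 400),
        (2016, "2016-02-01", 800),
    ],
)

query = """
SELECT
    cohort_year,
    year_month,
    -- Frame restarts at the first row of each cohort
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS rolling_by_cohort,
    -- No partition: the frame crosses cohort boundaries
    SUM(total_revenue) OVER (
        ORDER BY year_month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS rolling_ignoring_cohort
FROM cohort_summary
ORDER BY cohort_year, year_month
"""
partition_rows = conn.execute(query).fetchall()
for row in partition_rows:
    print(row)
```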


<img src="../Resources/images/3.5_cohort_3_month_rev.png" alt="Cohort Rolling 3 Month Revenue" width="50%">

> ⚠️ **Chart Note**: This chart plots only the 2015 cohort.

---
## N FOLLOWING

### 📝 Notes

`N FOLLOWING`

- **N FOLLOWING**: Refers to `N` rows after the current row in a window frame.
- Syntax:
  ```sql
  SELECT
    column_name,
    SUM(column_name) OVER(
        PARTITION BY partition_expression
        ORDER BY order_expression
        ROWS BETWEEN CURRENT ROW AND N FOLLOWING
    ) AS window_column_alias
  FROM table_name;
  ```
- Useful for calculating aggregations involving the current row and a specified number of subsequent rows, such as projecting future totals or averages.
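
A frame of `CURRENT ROW AND N FOLLOWING` holds `N + 1` rows where they exist and shrinks toward the end of the partition. The sketch below makes that visible by counting the rows in each frame. It assumes in-memory SQLite (3.25 or newer) rather than PostgreSQL, and the `monthly` table is hypothetical toy data.

```python
import sqlite3

# Hypothetical toy data: five months of revenue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (month INTEGER, revenue INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)],
)

query = """
SELECT
    month,
    SUM(revenue) OVER (ORDER BY month ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS next_3_sum,
    -- How many rows the frame actually contains
    COUNT(*)     OVER (ORDER BY month ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS rows_in_frame
FROM monthly
ORDER BY month
"""
following_rows = conn.execute(query).fetchall()
for row in following_rows:
    print(row)
```

The last two rows have frames of two and one rows, so forward-looking sums near the end of the data understate a full window and should be read with that in mind.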

### 🔑 Key Concepts
- **📊 Business Terms**: 
  - Forward Analysis: Looking at upcoming transactions
  - Future Value: Revenue from subsequent orders
  - Predictive Window: Period of future analysis
- **💡 Why It Matters**: Projects short-term future performance
    - Anticipates upcoming revenue patterns
    - Identifies potential changes in customer behavior
    - Useful for forecasting and identifying patterns in purchasing activity after a given month.
- **🎯 Common Use Cases**: Short-term forecasting, trend prediction
- **📈 Related KPIs**: Forward revenue, future order value

### 📈 Analysis

- Calculates the total revenue for the current order plus the next 9 orders, providing a forward-looking view of short-term revenue patterns.
- Summarizes the next 3 months of net revenue for each cohort and month, projecting mid-term revenue performance.


### Forward-Looking Revenue

**`N FOLLOWING`**

1. Calculate the sum of revenue for the current order line and the next 9 using `9 FOLLOWING`.
   - Select `orderdate` to identify each transaction
   - Calculate revenue using `quantity * netprice * exchangerate`
   - Use a `9 FOLLOWING` frame so each sum covers up to 10 rows
   - Order by `orderdate` so the frame follows chronological order (rows that share an `orderdate` have no guaranteed order among themselves)

In [9]:
%%sql

SELECT 
    orderdate,
    (quantity * netprice * exchangerate) AS net_revenue,
    SUM(quantity * netprice * exchangerate) OVER (
        ORDER BY orderdate
        ROWS BETWEEN CURRENT ROW AND 9 FOLLOWING
    ) as next_10_order_revenue
FROM sales
ORDER BY orderdate;

Unnamed: 0,orderdate,net_revenue,next_10_order_revenue
0,2015-01-01,63.49,5120.77
1,2015-01-01,423.28,5066.94
2,2015-01-01,108.75,7038.75
3,2015-01-01,1146.75,7905.15
4,2015-01-01,950.25,6764.59
...,...,...,...
199868,2024-04-20,914.61,3289.02
199869,2024-04-20,150.18,2374.41
199870,2024-04-20,147.78,2224.23
199871,2024-04-20,2019.62,2076.45


### Monthly Sales Analysis - 3 Month Rolling Average

**`N PRECEDING` & `N FOLLOWING`**

1. Calculate two 2-month averages of net revenue: the previous month with the current month, and the current month with the next month
   - Use `AVG(net_revenue)` to calculate the average revenue 
   - Apply `ROWS BETWEEN 1 PRECEDING AND CURRENT ROW` to include the previous month and the current month
   - Apply `ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING` to include the current month and the next month
   - Formula: `AVG(net_revenue) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)`

In [58]:
%%sql

WITH monthly_sales AS (
    SELECT 
        TO_CHAR(orderdate, 'YYYY-MM') as month,
        SUM(quantity * netprice * exchangerate) as net_revenue
    FROM sales
    WHERE EXTRACT(YEAR FROM orderdate) = 2023
    GROUP BY month
    ORDER BY month
    
)
SELECT 
    month,
    net_revenue,
    AVG(net_revenue) OVER (
        ORDER BY month
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
        ) as net_revenue_preceding,
    AVG(net_revenue) OVER (
        ORDER BY month
        ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING -- sub in 0 FOLLOWING
        ) as net_revenue_following
FROM monthly_sales;

Unnamed: 0,month,net_revenue,net_revenue_preceding,net_revenue_following
0,2023-01,3664431.34,3664431.34,4064817.96
1,2023-02,4465204.57,4064817.96,3354760.54
2,2023-03,2244316.52,3354760.54,1703556.34
3,2023-04,1162796.16,1703556.34,2052901.08
4,2023-05,2943005.99,2052901.08,2903753.01
5,2023-06,2864500.03,2903753.01,2601069.68
6,2023-07,2337639.34,2601069.68,2480779.57
7,2023-08,2623919.79,2480779.57,2623347.32
8,2023-09,2622774.85,2623347.32,2587048.73
9,2023-10,2551322.61,2587048.73,2625712.99


2. Calculate the rolling average net revenue for each month using a 3-month window centered on the current month:
    - Use `AVG(net_revenue)` to calculate the average revenue
    - Apply `ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING` to include:
        - Previous month (`1 PRECEDING`)
        - Current month (`CURRENT ROW`)
        - Next month (`1 FOLLOWING`)
    - Formula: `AVG(net_revenue) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)`

In [57]:
%%sql

WITH monthly_sales AS (
    SELECT 
        TO_CHAR(orderdate, 'YYYY-MM') as month,
        SUM(quantity * netprice * exchangerate) as net_revenue
    FROM sales
    WHERE EXTRACT(YEAR FROM orderdate) = 2023
    GROUP BY month
    ORDER BY month
    
)
SELECT 
    month,
    net_revenue,
    AVG(net_revenue) OVER (
        ORDER BY month
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
        ) as net_revenue_rolling
FROM monthly_sales;

Unnamed: 0,month,net_revenue,net_revenue_rolling
0,2023-01,3664431.34,4064817.96
1,2023-02,4465204.57,3457984.14
2,2023-03,2244316.52,2624105.75
3,2023-04,1162796.16,2116706.22
4,2023-05,2943005.99,2323434.06
5,2023-06,2864500.03,2715048.45
6,2023-07,2337639.34,2608686.39
7,2023-08,2623919.79,2528111.33
8,2023-09,2622774.85,2599339.08
9,2023-10,2551322.61,2624733.61
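
The January value in the output above equals the plain average of January and February, because a centered frame shrinks at the edges of the data. The sketch below reproduces this with the first four 2023 months copied from that output. It assumes an in-memory SQLite database (version 3.25 or later, for window function support) as a stand-in for the course's PostgreSQL instance; the `rows_in_frame` column is added here purely for illustration.

```python
import sqlite3

# First four months of 2023, copied from the query output above.
monthly_sales = [
    ("2023-01", 3664431.34),
    ("2023-02", 4465204.57),
    ("2023-03", 2244316.52),
    ("2023-04", 1162796.16),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month TEXT, net_revenue REAL)")
conn.executemany("INSERT INTO monthly_sales VALUES (?, ?)", monthly_sales)

rows = conn.execute(
    """
    SELECT
        month,
        -- how many rows the frame actually contains for this row
        COUNT(*) OVER (
            ORDER BY month
            ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
        ) AS rows_in_frame,
        AVG(net_revenue) OVER (
            ORDER BY month
            ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
        ) AS net_revenue_rolling
    FROM monthly_sales
    ORDER BY month
    """
).fetchall()

for month, rows_in_frame, rolling in rows:
    print(month, rows_in_frame, round(rolling, 2))
```

In this four-row sample, the first and last rows each have only two rows in their frame, so their "3-month" average is really a 2-month average. In the full 2023 data the same effect applies to January and December.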


<img src="../Resources/images/3.5_rolling_rev.png" alt="3 Month Rolling Revenue" width="50%">

#### Future 3-Month Revenue by Cohort

**`FOLLOWING`**

1. Use the previous query and update `rolling_3_month_net_revenue` to calculate the next 3-month net revenue using `FOLLOWING`.  
   - Define the CTE `cohort_analysis` to calculate the cohort year and daily net revenue for each customer.  
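        - Use `EXTRACT(YEAR FROM MIN(orderdate))` to derive the cohort year. Because the CTE also groups by `orderdate`, `MIN(orderdate)` is evaluated per customer per day, so `cohort_year` here is the year of each order rather than the year of the customer's first-ever order.  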
        - Group `cohort_analysis` by `customerkey` and `orderdate` to calculate the daily `total_net_revenue` for each customer.  
   - Define the second CTE `cohort_summary` to calculate the total revenue for each cohort by month.  
        - Use `SUM(total_net_revenue)` in `cohort_summary` to aggregate the monthly total revenue for each cohort, grouped by `cohort_year` and `year_month`.  
        - Get the order year month from `orderdate` using `DATE_TRUNC` and cast to a date with `::date`.
   - In the main query, calculate the next 3-month net revenue for each cohort using a window function.  
        - Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN CURRENT ROW AND CURRENT ROW)` to get the revenue for the current row.  
        - 🔔 Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING)` to include revenue from the current month and the next 3 months (up to four rows in total).  
        - Select `cohort_year`, `year_month`, `monthly_net_revenue`, and `rolling_prev_3_month_net_revenue` for the output. Despite the `prev` in its name, this column looks forward.  
        - Use `ORDER BY cohort_year, year_month` to ensure results are sorted chronologically by cohort and month.  

In [10]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey, 
        orderdate
),

cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate)::date AS year_month,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY cohort_year, year_month
)

SELECT
    cohort_year,
    year_month,
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month
        ROWS BETWEEN CURRENT ROW AND CURRENT ROW
    ) AS monthly_net_revenue,
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month 
        ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING
    ) AS rolling_prev_3_month_net_revenue
FROM cohort_summary
ORDER BY 
    cohort_year, 
    year_month
;

Unnamed: 0,cohort_year,year_month,monthly_net_revenue,rolling_prev_3_month_net_revenue
0,2015,2015-01-01,384092.66,1584195.37
1,2015,2015-02-01,706374.12,1748735.34
2,2015,2015-03-01,332961.59,1790925.19
3,2015,2015-04-01,160767.00,2093339.73
4,2015,2015-05-01,548632.63,2651111.35
...,...,...,...,...
107,2023,2023-12-01,2928550.93,2928550.93
108,2024,2024-01-01,2677498.55,8396527.38
109,2024,2024-02-01,3542322.55,5719028.83
110,2024,2024-03-01,1692854.89,2176706.28


<img src="../Resources/images/3.5_cohort_prev_3_month_rev.png" alt="Cohort Previous 3 Month Revenue" width="50%">

> ⚠️ **Chart Note**: This chart plots only the 2015 cohort.
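
`ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING` spans four rows: the current month plus the next three. If the goal is strictly the *next* three months, one option is to start the frame at `1 FOLLOWING`. The sketch below compares both frames using the first five 2015 rows from the output above. It assumes an in-memory SQLite database (3.25+) in place of PostgreSQL, and the column aliases `current_plus_next_3` and `next_3_only` are hypothetical names chosen for illustration.

```python
import sqlite3

# First five months of the 2015 cohort, copied from the output above.
cohort_2015 = [
    ("2015-01-01", 384092.66),
    ("2015-02-01", 706374.12),
    ("2015-03-01", 332961.59),
    ("2015-04-01", 160767.00),
    ("2015-05-01", 548632.63),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cohort_summary (year_month TEXT, total_revenue REAL)")
conn.executemany("INSERT INTO cohort_summary VALUES (?, ?)", cohort_2015)

rows = conn.execute(
    """
    SELECT
        year_month,
        -- frame used in the notebook: current month + next 3 months
        SUM(total_revenue) OVER (
            ORDER BY year_month
            ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING
        ) AS current_plus_next_3,
        -- alternative frame: only the 3 months after the current one
        SUM(total_revenue) OVER (
            ORDER BY year_month
            ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING
        ) AS next_3_only
    FROM cohort_summary
    ORDER BY year_month
    """
).fetchall()

for year_month, current_plus_next_3, next_3_only in rows:
    print(year_month, current_plus_next_3, next_3_only)
```

For January and February, `current_plus_next_3` agrees with the notebook output, since all four months of each frame are present in the sample. The last row's `next_3_only` is `None` (SQL `NULL`) because no rows follow it, so its frame is empty.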

---
## UNBOUNDED

### 📝 Notes

`UNBOUNDED PRECEDING`

- **UNBOUNDED PRECEDING**: Sets the frame start to the first row of the partition (or of the entire result set when no `PARTITION BY` is used).
- Syntax:
  ```sql
  SELECT
    column_name,
    SUM(column_name) OVER(
        PARTITION BY partition_expression
        ORDER BY order_expression
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS window_column_alias
  FROM table_name;
  ```
- Commonly used for cumulative calculations starting from the beginning of a partition, such as running totals or cumulative averages.

`UNBOUNDED FOLLOWING`

- **UNBOUNDED FOLLOWING**: Sets the frame end to the last row of the partition (or of the entire result set when no `PARTITION BY` is used).
- Syntax:
  ```sql
  SELECT
    column_name,
    SUM(column_name) OVER(
        PARTITION BY partition_expression
        ORDER BY order_expression
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) AS window_column_alias
  FROM table_name;
  ```
- Often used to aggregate values from the current row to the end of the partition, such as totals or counts of future data.
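
A minimal sketch of both frames side by side, assuming an in-memory SQLite database (3.25+) and a small made-up `cohort_summary` table with integer revenue so the arithmetic is easy to follow. The table contents and the `cohort_total` column are illustrative assumptions, not course data.

```python
import sqlite3

# Hypothetical toy data: (cohort_year, year_month, total_revenue)
toy_rows = [
    (2015, "2015-01", 100),
    (2015, "2015-02", 200),
    (2015, "2015-03", 300),
    (2016, "2016-01", 50),
    (2016, "2016-02", 70),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cohort_summary "
    "(cohort_year INTEGER, year_month TEXT, total_revenue INTEGER)"
)
conn.executemany("INSERT INTO cohort_summary VALUES (?, ?, ?)", toy_rows)

rows = conn.execute(
    """
    SELECT
        cohort_year,
        year_month,
        total_revenue,
        -- running total: partition start up to the current row
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year ORDER BY year_month
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative,
        -- remaining total: current row to the partition end
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year ORDER BY year_month
            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
        ) AS remaining,
        -- whole partition, identical for every row in the cohort
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year ORDER BY year_month
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS cohort_total
    FROM cohort_summary
    ORDER BY cohort_year, year_month
    """
).fetchall()

for row in rows:
    print(row)
```

Both frames include the current row, so for every row `cumulative + remaining - total_revenue` equals `cohort_total`. The running total also restarts at each new `cohort_year`, because the frame never crosses a partition boundary.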

### 🔑 Key Concepts
- **📊 Business Terms**: 
  - Cumulative Total: Running sum of all values
  - Historical Performance: Complete transaction history
  - Aggregate Growth: Total accumulation over time
- **💡 Why It Matters**: Shows complete historical performance
    - Tracks long-term cohort growth and overall net revenue contributions over time.
    - Assists in understanding future net revenue potential from a given point in time.
- **🎯 Common Use Cases**: Growth analysis, performance tracking
- **📈 Related KPIs**: Total revenue, growth rate

### 📈 Analysis
- Calculates the running total of all revenue from the very first order up to each current order, showing the complete accumulation of revenue over time.
- Calculates the total accumulated net revenue up to each month for every cohort.
- Measures remaining cumulative net revenue from each order month to the end of the cohort’s activity.


### Cumulative Revenue

**`UNBOUNDED PRECEDING`**

1. Calculate the running total of revenue from the first order up to each current order using `UNBOUNDED PRECEDING`.
   - Select `orderdate` to identify each transaction
   - Calculate revenue using `quantity * netprice * exchangerate`
   - Use `UNBOUNDED PRECEDING` frame to include all previous orders
   - Order by `orderdate` to ensure proper cumulative calculation

In [11]:
%%sql

SELECT 
    orderdate,
    (quantity * netprice * exchangerate) AS net_revenue,
    SUM(quantity * netprice * exchangerate) OVER (
        ORDER BY orderdate
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as cumulative_revenue
FROM sales
ORDER BY orderdate;

Unnamed: 0,orderdate,net_revenue,cumulative_revenue
0,2015-01-01,63.49,63.49
1,2015-01-01,423.28,486.77
2,2015-01-01,108.75,595.53
3,2015-01-01,1146.75,1742.27
4,2015-01-01,950.25,2692.52
...,...,...,...
199868,2024-04-20,914.61,206405164.17
199869,2024-04-20,150.18,206405314.35
199870,2024-04-20,147.78,206405462.13
199871,2024-04-20,2019.62,206407481.75


### Monthly Sales Analysis

**`UNBOUNDED PRECEDING` & `UNBOUNDED FOLLOWING`**

In [60]:
%%sql

WITH monthly_sales AS (
    SELECT 
        TO_CHAR(orderdate, 'YYYY-MM') as month,
        SUM(quantity * netprice * exchangerate) as net_revenue
    FROM sales
    WHERE EXTRACT(YEAR FROM orderdate) = 2023
    GROUP BY month
    ORDER BY month
    
)
SELECT 
    month,
    net_revenue,
    AVG(net_revenue) OVER (
        ORDER BY month
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) as net_revenue_rolling
FROM monthly_sales;

Unnamed: 0,month,net_revenue,net_revenue_rolling
0,2023-01,3664431.34,2759047.13
1,2023-02,4465204.57,2759047.13
2,2023-03,2244316.52,2759047.13
3,2023-04,1162796.16,2759047.13
4,2023-05,2943005.99,2759047.13
5,2023-06,2864500.03,2759047.13
6,2023-07,2337639.34,2759047.13
7,2023-08,2623919.79,2759047.13
8,2023-09,2622774.85,2759047.13
9,2023-10,2551322.61,2759047.13


In [61]:
%%sql

WITH monthly_sales AS (
    SELECT 
        TO_CHAR(orderdate, 'YYYY-MM') as month,
        SUM(quantity * netprice * exchangerate) as net_revenue
    FROM sales
    WHERE EXTRACT(YEAR FROM orderdate) = 2023
    GROUP BY month
    ORDER BY month
    
)
SELECT 
    month,
    net_revenue,
    AVG(net_revenue) OVER (
        ORDER BY month
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  -- reversing the bounds (CURRENT ROW AND UNBOUNDED PRECEDING) is invalid: the frame end cannot be UNBOUNDED PRECEDING
        ) as net_revenue_rolling
FROM monthly_sales;

Unnamed: 0,month,net_revenue,net_revenue_rolling
0,2023-01,3664431.34,3664431.34
1,2023-02,4465204.57,4064817.96
2,2023-03,2244316.52,3457984.14
3,2023-04,1162796.16,2884187.15
4,2023-05,2943005.99,2895950.92
5,2023-06,2864500.03,2890709.1
6,2023-07,2337639.34,2811699.14
7,2023-08,2623919.79,2788226.72
8,2023-09,2622774.85,2769843.18
9,2023-10,2551322.61,2747991.12
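
One way to reason about the two queries above is to treat a `ROWS` frame as a list slice around the current row. The helper below, `rows_frame_avg`, is a hypothetical pure-Python illustration of that mental model and not how a database implements it. `None` stands for `UNBOUNDED`, and the input is the first four 2023 months from the output above.

```python
def rows_frame_avg(values, preceding, following):
    """Emulate AVG(x) OVER (ORDER BY ... ROWS BETWEEN <preceding> PRECEDING
    AND <following> FOLLOWING) on an already-sorted list.

    Pass None for UNBOUNDED and 0 for CURRENT ROW.
    """
    n = len(values)
    result = []
    for i in range(n):
        # Clamp the frame to the edges of the data, as SQL does.
        start = 0 if preceding is None else max(0, i - preceding)
        end = n if following is None else min(n, i + following + 1)
        frame = values[start:end]
        result.append(sum(frame) / len(frame))
    return result


# First four months of 2023 net revenue, copied from the output above.
net_revenue = [3664431.34, 4465204.57, 2244316.52, 1162796.16]

# UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: one overall average, repeated
overall = rows_frame_avg(net_revenue, None, None)

# UNBOUNDED PRECEDING AND CURRENT ROW: cumulative (expanding) average
cumulative = rows_frame_avg(net_revenue, None, 0)

# 1 PRECEDING AND CURRENT ROW: trailing two-month average
trailing = rows_frame_avg(net_revenue, 1, 0)

print([round(v, 2) for v in cumulative])
```

The `cumulative` list lines up with the first four rows of the `UNBOUNDED PRECEDING AND CURRENT ROW` output above. The `overall` value differs from the notebook's because this sample contains only four months rather than the full year.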


In [64]:
%%sql

WITH monthly_sales AS (
    SELECT 
        TO_CHAR(orderdate, 'YYYY-MM') as month,
        SUM(quantity * netprice * exchangerate) as net_revenue
    FROM sales
    WHERE EXTRACT(YEAR FROM orderdate) = 2023
    GROUP BY month
    ORDER BY month
)
SELECT 
    month,
    net_revenue,
    LAST_VALUE(net_revenue) OVER (ORDER BY month) as last_month_revenue,
    LAST_VALUE(net_revenue) OVER (
        ORDER BY month 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) as last_month_revenue_unbound,
    NTH_VALUE(net_revenue, 3) OVER (ORDER BY month) as third_month_revenue,
    NTH_VALUE(net_revenue, 3) OVER (
        ORDER BY month 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) as third_month_revenue_unbound
FROM monthly_sales;

Unnamed: 0,month,net_revenue,last_month_revenue,last_month_revenue_unbound,third_month_revenue,third_month_revenue_unbound
0,2023-01,3664431.34,3664431.34,2928550.93,,2244316.52
1,2023-02,4465204.57,4465204.57,2928550.93,,2244316.52
2,2023-03,2244316.52,2244316.52,2928550.93,2244316.52,2244316.52
3,2023-04,1162796.16,1162796.16,2928550.93,2244316.52,2244316.52
4,2023-05,2943005.99,2943005.99,2928550.93,2244316.52,2244316.52
5,2023-06,2864500.03,2864500.03,2928550.93,2244316.52,2244316.52
6,2023-07,2337639.34,2337639.34,2928550.93,2244316.52,2244316.52
7,2023-08,2623919.79,2623919.79,2928550.93,2244316.52,2244316.52
8,2023-09,2622774.85,2622774.85,2928550.93,2244316.52,2244316.52
9,2023-10,2551322.61,2551322.61,2928550.93,2244316.52,2244316.52
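
The output above shows why the frame matters for `LAST_VALUE` and `NTH_VALUE`. When `ORDER BY` is present and no frame is given, the default frame ends at the current row, so `LAST_VALUE` simply returns the current row's value and `NTH_VALUE(..., 3)` is `NULL` until the third row is reached. The sketch below reproduces the pattern on made-up integer data, assuming an in-memory SQLite database (3.25+), which uses the same default frame as PostgreSQL. All column aliases are illustrative.

```python
import sqlite3

# Hypothetical toy data: (month, net_revenue)
toy_months = [("2023-01", 10), ("2023-02", 20), ("2023-03", 30), ("2023-04", 40)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month TEXT, net_revenue INTEGER)")
conn.executemany("INSERT INTO monthly_sales VALUES (?, ?)", toy_months)

rows = conn.execute(
    """
    SELECT
        month,
        -- default frame: partition start up to the current row
        LAST_VALUE(net_revenue) OVER (ORDER BY month) AS last_default,
        LAST_VALUE(net_revenue) OVER (
            ORDER BY month
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS last_unbounded,
        NTH_VALUE(net_revenue, 3) OVER (ORDER BY month) AS third_default,
        NTH_VALUE(net_revenue, 3) OVER (
            ORDER BY month
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS third_unbounded
    FROM monthly_sales
    ORDER BY month
    """
).fetchall()

for row in rows:
    print(row)
```

Only the explicitly unbounded frames return the true last and third values on every row. With the default frame, `last_default` tracks the current row and `third_default` is `None` for the first two rows.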


#### Cumulative Revenue from First Order

**`UNBOUNDED PRECEDING`**

1. Use the previous query and update the `rolling_prev_3_month_net_revenue` column to calculate the cumulative net revenue starting from the first row using `UNBOUNDED PRECEDING`.  
   - Define the CTE `cohort_analysis` to calculate the cohort year and daily net revenue for each customer.  
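        - Use `EXTRACT(YEAR FROM MIN(orderdate))` to derive the cohort year. Because the CTE also groups by `orderdate`, `MIN(orderdate)` is evaluated per customer per day, so `cohort_year` here is the year of each order rather than the year of the customer's first-ever order.  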
        - Group `cohort_analysis` by `customerkey` and `orderdate` to calculate `total_net_revenue` for each day per customer.  
   - Define the second CTE `cohort_summary` to calculate the monthly total revenue.  
        - Use `SUM(total_net_revenue)` in `cohort_summary` to aggregate the total revenue for all customers within each cohort, grouped by `cohort_year` and `year_month`. 
        - Get the order year month from `orderdate` using `DATE_TRUNC` and cast to a date with `::date`. 
   - In the main query, calculate the cumulative net revenue starting from the first row using a window function.  
        - Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN CURRENT ROW AND CURRENT ROW)` to get the revenue for the current row.  
        - 🔔 Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)` to include all rows from the beginning up to the current row.  
        - Select `cohort_year`, `year_month`, `monthly_net_revenue`, and `cumulative_net_revenue` for the output.  
        - Use `ORDER BY cohort_year, year_month` to ensure results are sorted chronologically by cohort and month.  

In [12]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey, 
        orderdate
),

cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate)::date AS year_month,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY cohort_year, year_month
)

SELECT
    cohort_year,
    year_month,
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month
        ROWS BETWEEN CURRENT ROW AND CURRENT ROW
    ) AS monthly_net_revenue,
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month 
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- Updated
    ) AS cumulative_net_revenue
FROM cohort_summary
ORDER BY 
    cohort_year, 
    year_month
;

Unnamed: 0,cohort_year,year_month,monthly_net_revenue,cumulative_net_revenue
0,2015,2015-01-01,384092.66,384092.66
1,2015,2015-02-01,706374.12,1090466.78
2,2015,2015-03-01,332961.59,1423428.37
3,2015,2015-04-01,160767.00,1584195.37
4,2015,2015-05-01,548632.63,2132828.00
...,...,...,...,...
107,2023,2023-12-01,2928550.93,33108565.51
108,2024,2024-01-01,2677498.55,2677498.55
109,2024,2024-02-01,3542322.55,6219821.10
110,2024,2024-03-01,1692854.89,7912675.99


<img src="../Resources/images/3.5_cohort_cumulative_rev.png" alt="Cohort LTV Change" width="50%">

> ⚠️ **Chart Note**: This chart plots only the 2015 cohort.
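
In the `cohort_analysis` CTE, `MIN(orderdate)` is computed inside a `GROUP BY customerkey, orderdate`, so each group contains a single date and the minimum is just that date. If the intent is to assign every order to the year of the customer's first purchase, one possible approach is a window `MIN` partitioned by customer. The sketch below contrasts the two on a made-up `sales` table. It assumes in-memory SQLite (3.25+), where `strftime('%Y', ...)` stands in for PostgreSQL's `EXTRACT(YEAR FROM ...)`, and a hypothetical `revenue` column replacing `quantity * netprice * exchangerate`.

```python
import sqlite3

# Hypothetical toy data: (customerkey, orderdate, revenue)
toy_sales = [
    (1, "2015-03-10", 100),
    (1, "2016-07-01", 50),   # repeat purchase by customer 1 in a later year
    (2, "2016-02-14", 80),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, revenue INTEGER)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", toy_sales)

# Pattern used in the notebook: MIN() within GROUP BY customerkey, orderdate
grouped = conn.execute(
    """
    SELECT
        customerkey,
        CAST(strftime('%Y', MIN(orderdate)) AS INTEGER) AS cohort_year,
        orderdate
    FROM sales
    GROUP BY customerkey, orderdate
    ORDER BY customerkey, orderdate
    """
).fetchall()

# Alternative sketch: window MIN() across all of a customer's orders
windowed = conn.execute(
    """
    SELECT
        customerkey,
        CAST(strftime('%Y', MIN(orderdate) OVER (PARTITION BY customerkey))
             AS INTEGER) AS cohort_year,
        orderdate
    FROM sales
    ORDER BY customerkey, orderdate
    """
).fetchall()

print(grouped)
print(windowed)
```

With the grouped pattern, customer 1's 2016 order lands in a 2016 cohort. With the window version it stays in the 2015 cohort, which is what a first-purchase cohort analysis usually expects. In PostgreSQL the equivalent expression would be `EXTRACT(YEAR FROM MIN(orderdate) OVER (PARTITION BY customerkey))`. Switching to it would change the results shown in this notebook.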

#### Remaining Revenue After Each Order

**`UNBOUNDED FOLLOWING`**

1. Use the previous query and update the `cumulative_net_revenue` to calculate the total net revenue from the current month to the end using `UNBOUNDED FOLLOWING`.  

   - Define the CTE `cohort_analysis` to calculate the cohort year and daily net revenue for each customer.  
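        - Use `EXTRACT(YEAR FROM MIN(orderdate))` to derive the cohort year. Because the CTE also groups by `orderdate`, `MIN(orderdate)` is evaluated per customer per day, so `cohort_year` here is the year of each order rather than the year of the customer's first-ever order.  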
        - Group `cohort_analysis` by `customerkey` and `orderdate` to calculate `total_net_revenue` for each day per customer.  
   - Define the second CTE `cohort_summary` to calculate the monthly total revenue for each cohort.  
        - Use `SUM(total_net_revenue)` in `cohort_summary` to aggregate the total revenue for all customers within each cohort, grouped by `cohort_year` and `year_month`.  
        - Get the order year month from `orderdate` using `DATE_TRUNC` and cast to a date with `::date`.
   - In the main query, calculate the total net revenue from the current month to the end of the cohort using a window function.  
        - Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN CURRENT ROW AND CURRENT ROW)` to get the revenue for the current row.  
        - 🔔 Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)` to include all rows from the current row to the last month.  
        - Select `cohort_year`, `year_month`, and `remaining_net_revenue` for the output.  
        - Use `ORDER BY cohort_year, year_month` to ensure results are sorted chronologically by cohort and month.  


In [13]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey, 
        orderdate
),

cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate)::date AS year_month,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY cohort_year, year_month
)

SELECT
    cohort_year,
    year_month,
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month
        ROWS BETWEEN CURRENT ROW AND CURRENT ROW
    ) AS monthly_net_revenue,
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING -- Updated
    ) AS remaining_net_revenue
FROM cohort_summary
ORDER BY 
    cohort_year, 
    year_month
;

Unnamed: 0,cohort_year,year_month,monthly_net_revenue,remaining_net_revenue
0,2015,2015-01-01,384092.66,7370979.48
1,2015,2015-02-01,706374.12,6986886.82
2,2015,2015-03-01,332961.59,6280512.70
3,2015,2015-04-01,160767.00,5947551.11
4,2015,2015-05-01,548632.63,5786784.11
...,...,...,...,...
107,2023,2023-12-01,2928550.93,2928550.93
108,2024,2024-01-01,2677498.55,8396527.38
109,2024,2024-02-01,3542322.55,5719028.83
110,2024,2024-03-01,1692854.89,2176706.28


<img src="../Resources/images/3.5_cohort_remaining_rev.png" alt="Cohort Remaining Net Revenue" width="50%">

> ⚠️ **Chart Note**: This chart plots only the 2015 cohort.