<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/3_Windows_Functions/5_Frame_Clause.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Frame Clause

## Overview

### 🥅 Analysis Goals

Analyze cohort revenue and lifetime value (LTV) to uncover monthly trends, short-term fluctuations, future potential, and long-term customer value patterns.

- **Monthly Revenue Trends:** Calculate monthly net revenue for each cohort to track individual month performance without cumulative or rolling effects.  
- **Rolling Revenue Patterns:** Compute rolling revenue sums over the current row and a fixed number of preceding rows to smooth month-to-month fluctuations.  
- **Projected Short-Term Revenue:** Sum net revenue for the current month and the next 3 months by cohort to identify patterns and assist in short-term forecasting.  
- **Cumulative and Remaining Revenue:** Track the running total up to each month and the remaining total from each month to the end of the cohort's activity.  

### 📘 Concepts Covered

- `CURRENT ROW`
- `N PRECEDING` 
- `N FOLLOWING` 
- `UNBOUNDED`
    - `UNBOUNDED PRECEDING`
    - `UNBOUNDED FOLLOWING` 

---

In [14]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas number to two decimal places
pd.options.display.float_format = '{:.2f}'.format



---

## CURRENT ROW

### 📝 Notes

`CURRENT ROW`

- **CURRENT ROW**: Refers to the current row in a window frame during a query execution.
- Syntax:
  ```sql
  SELECT
    column_name,
    SUM(column_name) OVER(
        PARTITION BY partition_expression
        ORDER BY order_expression
        ROWS BETWEEN CURRENT ROW AND CURRENT ROW
    ) AS window_column_alias
  FROM table_name;
  ```
- Used to specify the row being processed as the start or end of a frame. Often combined with aggregations to apply calculations specific to the current row.
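
The behavior of a `CURRENT ROW` to `CURRENT ROW` frame can be sketched outside the course database. The example below is a minimal sketch, assuming SQLite (through Python's built-in `sqlite3`, which needs SQLite 3.25 or later for window functions) as a stand-in for PostgreSQL, and a hypothetical toy `cohort_summary` table with made-up values. With this frame, the windowed `SUM` should return each row's own value.

```python
import sqlite3

# Hypothetical toy data standing in for the course's cohort_summary CTE.
rows = [
    (2015, "2015-01", 100.0),
    (2015, "2015-02", 250.0),
    (2015, "2015-03", 50.0),
    (2016, "2016-01", 400.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cohort_summary (cohort_year INTEGER, year_month TEXT, total_revenue REAL)"
)
conn.executemany("INSERT INTO cohort_summary VALUES (?, ?, ?)", rows)

result = conn.execute(
    """
    SELECT
        cohort_year,
        year_month,
        total_revenue,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year
            ORDER BY year_month
            ROWS BETWEEN CURRENT ROW AND CURRENT ROW
        ) AS current_row_revenue
    FROM cohort_summary
    ORDER BY cohort_year, year_month
    """
).fetchall()

# The frame holds exactly one row, so the last two columns match on every row.
for row in result:
    print(row)
```

Because the frame never extends past the current row, this frame is mainly useful as a baseline to compare against wider frames.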

### 💻 Final Result

- Calculates the exact net revenue for each cohort on a monthly basis without rolling or cumulative sums.
    - Helps isolate monthly performance trends for each cohort without being influenced by revenue in other months.

#### Monthly Revenue by Cohort

**`CURRENT ROW`**

1. Get the `cohort_year` for each customer and the total revenue for each day.  
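   - Use `EXTRACT(YEAR FROM MIN(orderdate))` to calculate `cohort_year`. Because `orderdate` is also in the `GROUP BY`, `MIN(orderdate)` is evaluated within each customer-day group and returns that group's own order date, so `cohort_year` matches the year of `orderdate` in every row shown in the output.  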
   - Include `customerkey` in the `GROUP BY` clause to ensure the cohort year is assigned to each customer individually.  
   - Include `orderdate` in the `GROUP BY` clause to calculate daily revenue for each customer.  
   - Use `SUM(quantity * netprice * exchangerate)` to calculate the total net revenue for each day.  
   - Select `customerkey`, `cohort_year`, `orderdate`, and `total_net_revenue` for the final output.  

In [28]:
%%sql

SELECT 
    customerkey,
    EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
    orderdate,
    SUM(quantity * netprice * exchangerate) AS total_net_revenue
FROM sales
GROUP BY 
    customerkey, 
    orderdate


Unnamed: 0,customerkey,cohort_year,orderdate,total_net_revenue
0,1506769,2022,2022-03-04,996.79
1,909157,2021,2021-11-16,1565.80
2,2047462,2020,2020-12-09,34.06
3,1933480,2021,2021-11-10,45.90
4,1701958,2017,2017-10-05,5144.64
...,...,...,...,...
83094,1273185,2023,2023-09-16,3452.05
83095,420797,2022,2022-12-29,278.90
83096,642485,2023,2023-03-08,574.31
83097,863441,2023,2023-03-23,475.84
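
One detail of the query above is worth checking: with `orderdate` in the `GROUP BY`, `MIN(orderdate)` can only see one date per group. The sketch below illustrates this under stated assumptions: it uses SQLite through Python's built-in `sqlite3` (3.25 or later) instead of PostgreSQL, `strftime('%Y', ...)` in place of `EXTRACT(YEAR FROM ...)`, and a hypothetical three-row `sales` table with a simplified `revenue` column. The window-based variant is one possible way to get each customer's earliest order year, not the course's method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2021-11-16", 100.0),  # customer 1, first order
        (1, "2023-03-08", 50.0),   # customer 1, later order
        (2, "2022-03-04", 75.0),   # customer 2, only order
    ],
)

# Grouped by customer AND date: MIN(orderdate) is the group's own date.
per_group = conn.execute(
    """
    SELECT customerkey, orderdate,
           CAST(strftime('%Y', MIN(orderdate)) AS INTEGER) AS cohort_year
    FROM sales
    GROUP BY customerkey, orderdate
    ORDER BY customerkey, orderdate
    """
).fetchall()

# Window over the customer-day rows: earliest order date across the customer.
per_customer = conn.execute(
    """
    WITH daily AS (
        SELECT customerkey, orderdate
        FROM sales
        GROUP BY customerkey, orderdate
    )
    SELECT customerkey, orderdate,
           CAST(strftime('%Y', MIN(orderdate) OVER (PARTITION BY customerkey)) AS INTEGER)
               AS cohort_year
    FROM daily
    ORDER BY customerkey, orderdate
    """
).fetchall()

print(per_group)     # customer 1's 2023 order is labeled 2023
print(per_customer)  # customer 1's 2023 order is labeled 2021
```

In the first result the cohort year follows the order date; in the second it stays fixed at the customer's first order year.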


2. Wrap the query in a CTE and return all results in the main query.  
   - 🔔 Define a CTE `cohort_analysis` to calculate the cohort year and total daily revenue for each customer.  
      - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer based on their earliest order date.  
      - Group the CTE by `customerkey` and `orderdate` to calculate daily net revenue per customer.  
      - Use `SUM(quantity * netprice * exchangerate)` to compute the total net revenue for each day.  
   - 🔔 In the main query, use `SELECT * FROM cohort_analysis` to return all calculated results, including `customerkey`, `cohort_year`, `orderdate`, and `total_net_revenue`.  

In [24]:
%%sql

-- Put query into a CTE
WITH cohort_analysis AS (
    SELECT 
        customerkey, 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey, 
        orderdate
)

-- Added a SELECT statement to query the CTE
SELECT *
FROM cohort_analysis;

Unnamed: 0,customerkey,cohort_year,orderdate,total_net_revenue
0,1506769,2022,2022-03-04,996.79
1,909157,2021,2021-11-16,1565.80
2,2047462,2020,2020-12-09,34.06
3,1933480,2021,2021-11-10,45.90
4,1701958,2017,2017-10-05,5144.64
...,...,...,...,...
83094,1273185,2023,2023-09-16,3452.05
83095,420797,2022,2022-12-29,278.90
83096,642485,2023,2023-03-08,574.31
83097,863441,2023,2023-03-23,475.84


3. Add a new CTE to get the total monthly revenue for each cohort and return all results in the main query.  

   - Define a CTE `cohort_analysis` to calculate the cohort year and total daily revenue for each customer.  
        - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer based on their earliest order date.  
        - Group `cohort_analysis` by `customerkey` and `orderdate` to compute the total net revenue for each customer per day using `SUM(quantity * netprice * exchangerate)`.  
   - 🔔 Add a second CTE `cohort_summary` to aggregate monthly revenue for each cohort.  
        - Use `SUM(total_net_revenue)` in `cohort_summary` to calculate the monthly total revenue for all customers within each cohort. 
        - Truncate `orderdate` to the first day of its month with `DATE_TRUNC('month', orderdate)` and alias it as `year_month`.
        - Group `cohort_summary` by `cohort_year` and `year_month` to summarize the data at the cohort and monthly level.  
   - 🔔 In the main query, select all columns from `cohort_summary` to display `cohort_year`, `year_month`, and `total_revenue`.  
        - Use `ORDER BY cohort_year, year_month` to sort the results by cohort year and the order year-month in ascending order.  

In [2]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey, 
        orderdate
), 

-- Added 
cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate) AS year_month,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY 
        cohort_year, 
        year_month
)

-- Updated 
SELECT
    *
FROM cohort_summary
ORDER BY 
    cohort_year, 
    year_month;

Unnamed: 0,cohort_year,year_month,total_revenue
0,2015,2015-01-01 00:00:00-08:00,384092.66
1,2015,2015-02-01 00:00:00-08:00,706374.12
2,2015,2015-03-01 00:00:00-08:00,332961.59
3,2015,2015-04-01 00:00:00-07:00,160767.00
4,2015,2015-05-01 00:00:00-07:00,548632.63
...,...,...,...
107,2023,2023-12-01 00:00:00-08:00,2928550.93
108,2024,2024-01-01 00:00:00-08:00,2677498.55
109,2024,2024-02-01 00:00:00-08:00,3542322.55
110,2024,2024-03-01 00:00:00-08:00,1692854.89


4. In the main query, use `CURRENT ROW` to get the monthly net revenue for each cohort.  

   - Define the CTE `cohort_analysis` to calculate the cohort year and daily net revenue for each customer.  
        - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer.  
        - Aggregate `total_net_revenue` in `cohort_analysis` by grouping on `customerkey` and `orderdate`.  
   - Define a second CTE `cohort_summary` to calculate the monthly total revenue for each cohort.  
        - Use `SUM(total_net_revenue)` in `cohort_summary` to calculate the monthly total revenue for all customers within each cohort. 
        - Truncate `orderdate` to the first day of its month with `DATE_TRUNC('month', orderdate)` and alias it as `year_month`.
        - Group `cohort_summary` by `cohort_year` and `year_month` to summarize the data at the cohort and monthly level.  
   - In the main query, use a window function to calculate the monthly net revenue for each cohort.  
        - 🔔 Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN CURRENT ROW AND CURRENT ROW)` to get the revenue for the current row.  
        - Select `cohort_year`, `year_month`, and `monthly_net_revenue` for the output.  
        - Order the results by `cohort_year` and `year_month` to display them in chronological order for each cohort.  

In [5]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey, 
        orderdate
),
cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate) AS year_month,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY 
        cohort_year, 
        year_month
)

SELECT
    cohort_year,
    year_month,
    -- Added
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year -- Separate aggregation for each cohort
        ORDER BY year_month
        ROWS BETWEEN CURRENT ROW AND CURRENT ROW
    ) AS monthly_net_revenue
FROM cohort_summary
ORDER BY 
    cohort_year, 
    year_month;


Unnamed: 0,cohort_year,year_month,monthly_net_revenue
0,2015,2015-01-01 00:00:00-08:00,384092.66
1,2015,2015-02-01 00:00:00-08:00,706374.12
2,2015,2015-03-01 00:00:00-08:00,332961.59
3,2015,2015-04-01 00:00:00-07:00,160767.00
4,2015,2015-05-01 00:00:00-07:00,548632.63
...,...,...,...
107,2023,2023-12-01 00:00:00-08:00,2928550.93
108,2024,2024-01-01 00:00:00-08:00,2677498.55
109,2024,2024-02-01 00:00:00-08:00,3542322.55
110,2024,2024-03-01 00:00:00-08:00,1692854.89


---

## N PRECEDING

### 📝 Notes

`N PRECEDING`

- **N PRECEDING**: Refers to `N` rows before the current row in a window frame.
- Syntax:
  ```sql
  SELECT
    column_name,
    SUM(column_name) OVER(
        PARTITION BY partition_expression
        ORDER BY order_expression
        ROWS BETWEEN N PRECEDING AND CURRENT ROW
    ) AS window_column_alias
  FROM table_name;
  ```
- Enables calculations involving the current row and up to `N` preceding rows, such as moving averages or rolling sums over a fixed number of rows. Near the start of a partition the frame contains fewer than `N + 1` rows.
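
A `ROWS BETWEEN N PRECEDING AND CURRENT ROW` frame counts rows, not calendar units. The sketch below assumes SQLite through Python's built-in `sqlite3` (3.25 or later) as a stand-in for PostgreSQL and a hypothetical `monthly` table with made-up values. It uses `2 PRECEDING` so the frame holds at most three rows, and a `COUNT(*)` over the same frame shows how the frame is shorter at the start of the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (year_month TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [
        ("2015-01", 10.0),
        ("2015-02", 20.0),
        ("2015-03", 30.0),
        ("2015-04", 40.0),
        ("2015-05", 50.0),
    ],
)

rolling = conn.execute(
    """
    SELECT
        year_month,
        SUM(revenue) OVER (
            ORDER BY year_month
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS rolling_3_row_sum,
        COUNT(*) OVER (
            ORDER BY year_month
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS rows_in_frame
    FROM monthly
    ORDER BY year_month
    """
).fetchall()

# Sums are 10, 30, 60, 90, 120 and the frame holds 1, 2, 3, 3, 3 rows.
for row in rolling:
    print(row)
```

If each row is one month, `2 PRECEDING` gives a 3-month rolling sum; the same clause over daily rows would give a 3-day rolling sum.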


### 💻 Final Result

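- Computes a rolling sum of net revenue over the current row and up to 29 preceding rows for each cohort.
    - `ROWS` frames count rows, not calendar days. Each row of `cohort_summary` is one month, so despite the `rolling_30_day_net_revenue` alias, the window spans up to 30 monthly rows.
    - Smooths month-to-month fluctuations to highlight longer-running revenue trends.

#### Rolling 30-Row Revenue by Cohort

**`PRECEDING`**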

1. Use the previous query and add a new window function to calculate the rolling net revenue using `PRECEDING`.  
   - Define the CTE `cohort_analysis` to calculate the cohort year and daily net revenue for each customer.  
        - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer.  
        - Aggregate `total_net_revenue` in `cohort_analysis` by grouping on `customerkey` and `orderdate`.  
   - Define the second CTE `cohort_summary` to calculate the monthly total revenue for each cohort.  
        - Use `SUM(total_net_revenue)` in `cohort_summary` to aggregate the monthly total revenue for each cohort, grouped by `cohort_year` and `year_month`.  
   - In the main query, calculate the rolling net revenue for each cohort using a window function.  
        - Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN CURRENT ROW AND CURRENT ROW)` to get the revenue for the current row.  
        - 🔔 Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)` to include revenue from the current row and up to 29 preceding rows (months, since each row is one month).  
        - Select `cohort_year`, `year_month`, `monthly_net_revenue`, and `rolling_30_day_net_revenue` for the output.  
        - Use `ORDER BY cohort_year, year_month` to ensure results are sorted chronologically by cohort and month.  

In [9]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey, 
        orderdate
),

cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate) AS year_month,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY cohort_year, year_month
)

SELECT
    cohort_year,
    year_month,
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month
        ROWS BETWEEN CURRENT ROW AND CURRENT ROW
    ) AS monthly_net_revenue,
    -- Added
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month
        ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
    ) AS rolling_30_day_net_revenue
FROM cohort_summary
ORDER BY 
    cohort_year, 
    year_month;

Unnamed: 0,cohort_year,year_month,monthly_net_revenue,rolling_30_day_net_revenue
0,2015,2015-01-01 00:00:00-08:00,384092.66,384092.66
1,2015,2015-02-01 00:00:00-08:00,706374.12,1090466.78
2,2015,2015-03-01 00:00:00-08:00,332961.59,1423428.37
3,2015,2015-04-01 00:00:00-07:00,160767.00,1584195.37
4,2015,2015-05-01 00:00:00-07:00,548632.63,2132828.00
...,...,...,...,...
107,2023,2023-12-01 00:00:00-08:00,2928550.93,33108565.51
108,2024,2024-01-01 00:00:00-08:00,2677498.55,2677498.55
109,2024,2024-02-01 00:00:00-08:00,3542322.55,6219821.10
110,2024,2024-03-01 00:00:00-08:00,1692854.89,7912675.99
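
The output above can be sanity-checked with plain arithmetic. The sketch below copies the first five 2015 rows from the result and compares the `rolling_30_day_net_revenue` column with a running total of `monthly_net_revenue`. The variable names are illustrative only. For these rows the two agree, which is what a 30-row frame produces while fewer than 30 rows precede the current one in the partition.

```python
from itertools import accumulate

# First five 2015 rows copied from the query output above.
monthly = [384092.66, 706374.12, 332961.59, 160767.00, 548632.63]
rolling_30 = [384092.66, 1090466.78, 1423428.37, 1584195.37, 2132828.00]

# Running total of the monthly values.
running_total = list(accumulate(monthly))

# Compare with a half-cent tolerance because the output is rounded to 2 decimals.
matches = all(abs(a - b) < 0.005 for a, b in zip(running_total, rolling_30))
print(matches)
```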


---
## N FOLLOWING

### 📝 Notes

`N FOLLOWING`

- **N FOLLOWING**: Refers to `N` rows after the current row in a window frame.
- Syntax:
  ```sql
  SELECT
    column_name,
    SUM(column_name) OVER(
        PARTITION BY partition_expression
        ORDER BY order_expression
        ROWS BETWEEN CURRENT ROW AND N FOLLOWING
    ) AS window_column_alias
  FROM table_name;
  ```
- Useful for calculating aggregations involving the current row and a specified number of subsequent rows, such as projecting future totals or averages.
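
Note that a frame starting at `CURRENT ROW` includes the current row itself, so `CURRENT ROW AND 3 FOLLOWING` covers up to four rows. The sketch below assumes SQLite through Python's built-in `sqlite3` (3.25 or later) as a stand-in for PostgreSQL and a hypothetical `monthly` table with made-up values. It contrasts that frame with `1 FOLLOWING AND 3 FOLLOWING`, which is one way to sum only the rows after the current one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (year_month TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [
        ("2015-01", 10.0),
        ("2015-02", 20.0),
        ("2015-03", 30.0),
        ("2015-04", 40.0),
        ("2015-05", 50.0),
    ],
)

forward = conn.execute(
    """
    SELECT
        year_month,
        SUM(revenue) OVER (
            ORDER BY year_month
            ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING
        ) AS current_plus_next_3,
        SUM(revenue) OVER (
            ORDER BY year_month
            ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING
        ) AS next_3_only
    FROM monthly
    ORDER BY year_month
    """
).fetchall()

# current_plus_next_3: 100, 140, 120, 90, 50
# next_3_only:          90, 120,  90, 50, None (empty frame on the last row)
for row in forward:
    print(row)
```

Toward the end of a partition both frames shrink, so the last rows sum fewer months than the first rows do.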

### 💻 Final Result

- Sums net revenue for the current month and the next 3 months for each cohort, projecting mid-term revenue performance.
  - Useful for forecasting and identifying patterns in purchasing activity from a given month onward.

#### Future 3-Month Revenue by Cohort

**`FOLLOWING`**

1. Use the previous query and replace `rolling_30_day_net_revenue` with a forward-looking sum of net revenue using `FOLLOWING`.  
   - Define the CTE `cohort_analysis` to calculate the cohort year and daily net revenue for each customer.  
        - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer based on their earliest order.  
        - Group `cohort_analysis` by `customerkey` and `orderdate` to calculate the daily `total_net_revenue` for each customer.  
   - Define the second CTE `cohort_summary` to calculate the total revenue for each cohort by month.  
        - Use `SUM(total_net_revenue)` in `cohort_summary` to aggregate the monthly total revenue for each cohort, grouped by `cohort_year` and `year_month`.  
   - In the main query, calculate the forward-looking net revenue for each cohort using a window function.  
        - Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN CURRENT ROW AND CURRENT ROW)` to get the revenue for the current row.  
        - 🔔 Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING)` to include revenue from the current month and the next 3 months.  
        - Select `cohort_year`, `year_month`, `monthly_net_revenue`, and `rolling_3_month_net_revenue` for the output.  
        - Use `ORDER BY cohort_year, year_month` to ensure results are sorted chronologically by cohort and month.  

In [17]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey, 
        orderdate
),

cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate) AS year_month,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY cohort_year, year_month
)

SELECT
    cohort_year,
    year_month,
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month
        ROWS BETWEEN CURRENT ROW AND CURRENT ROW
    ) AS monthly_net_revenue,
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month 
        ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING
    ) AS rolling_3_month_net_revenue
FROM cohort_summary
ORDER BY 
    cohort_year, 
    year_month
;

Unnamed: 0,cohort_year,year_month,monthly_net_revenue,rolling_3_month_net_revenue
0,2015,2015-01-01 00:00:00-08:00,384092.66,1584195.37
1,2015,2015-02-01 00:00:00-08:00,706374.12,1748735.34
2,2015,2015-03-01 00:00:00-08:00,332961.59,1790925.19
3,2015,2015-04-01 00:00:00-07:00,160767.00,2093339.73
4,2015,2015-05-01 00:00:00-07:00,548632.63,2651111.35
...,...,...,...,...
107,2023,2023-12-01 00:00:00-08:00,2928550.93,2928550.93
108,2024,2024-01-01 00:00:00-08:00,2677498.55,8396527.38
109,2024,2024-02-01 00:00:00-08:00,3542322.55,5719028.83
110,2024,2024-03-01 00:00:00-08:00,1692854.89,2176706.28


---
## UNBOUNDED

### 📝 Notes

`UNBOUNDED PRECEDING`

- **UNBOUNDED PRECEDING**: Refers to the first row of the partition or dataset in a window frame.
- Syntax:
  ```sql
  SELECT
    column_name,
    SUM(column_name) OVER(
        PARTITION BY partition_expression
        ORDER BY order_expression
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS window_column_alias
  FROM table_name;
  ```
- Commonly used for cumulative calculations starting from the beginning of a partition, such as running totals or cumulative averages.

`UNBOUNDED FOLLOWING`

- **UNBOUNDED FOLLOWING**: Refers to the last row of the partition or dataset in a window frame.
- Syntax:
  ```sql
  SELECT
    column_name,
    SUM(column_name) OVER(
        PARTITION BY partition_expression
        ORDER BY order_expression
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) AS window_column_alias
  FROM table_name;
  ```
- Often used to aggregate values from the current row to the end of the partition, such as totals or counts of future data.
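
The two `UNBOUNDED` frames can be compared side by side. The sketch below assumes SQLite through Python's built-in `sqlite3` (3.25 or later) as a stand-in for PostgreSQL and a hypothetical toy `cohort_summary` table with made-up values. A third column omits the frame clause: with an `ORDER BY` and no explicit frame, the default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, which gives the same running total here because the ordering values are unique within each partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cohort_summary (cohort_year INTEGER, year_month TEXT, total_revenue REAL)"
)
conn.executemany(
    "INSERT INTO cohort_summary VALUES (?, ?, ?)",
    [
        (2015, "2015-01", 100.0),
        (2015, "2015-02", 250.0),
        (2015, "2015-03", 50.0),
        (2016, "2016-01", 400.0),
        (2016, "2016-02", 100.0),
    ],
)

totals = conn.execute(
    """
    SELECT
        cohort_year,
        year_month,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year ORDER BY year_month
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS running_total,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year ORDER BY year_month
            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
        ) AS remaining_total,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year ORDER BY year_month
        ) AS default_frame_total
    FROM cohort_summary
    ORDER BY cohort_year, year_month
    """
).fetchall()

# running_total:   100, 350, 400 | 400, 500
# remaining_total: 400, 300,  50 | 500, 100
for row in totals:
    print(row)
```

Both totals restart at each `cohort_year` because the frame never crosses a partition boundary.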


### 💻 Final Result

  - Calculates the total accumulated net revenue up to each month for every cohort.
    - Tracks long-term cohort growth and overall net revenue contributions over time.
  - Measures remaining cumulative net revenue from each order month to the end of the cohort’s activity.
    - Assists in understanding future net revenue potential from a given point in time.

#### Cumulative Revenue from First Order

**`UNBOUNDED PRECEDING`**

1. Use the previous query and replace its window columns with `cumulative_net_revenue`, a running total of net revenue starting from the first row of each cohort using `UNBOUNDED PRECEDING`.  
   - Define the CTE `cohort_analysis` to calculate the cohort year and daily net revenue for each customer.  
        - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer based on their earliest order.  
        - Group `cohort_analysis` by `customerkey` and `orderdate` to calculate `total_net_revenue` for each day per customer.  
   - Define the second CTE `cohort_summary` to calculate the monthly total revenue.  
        - Use `SUM(total_net_revenue)` in `cohort_summary` to aggregate the total revenue for all customers within each cohort, grouped by `cohort_year` and `year_month`.  
   - In the main query, calculate the cumulative net revenue starting from the first row using a window function.  
        - 🔔 Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)` to include all rows from the beginning up to the current row.  
        - Select `cohort_year`, `year_month`, and `cumulative_net_revenue` for the output.  
        - Use `ORDER BY cohort_year, year_month` to ensure results are sorted chronologically by cohort and date.  

In [20]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey, 
        orderdate
),

cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate) AS year_month,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY cohort_year, year_month
)

SELECT
    cohort_year,
    year_month,
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month 
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- Updated
    ) AS cumulative_net_revenue
FROM cohort_summary
ORDER BY 
    cohort_year, 
    year_month
;

Unnamed: 0,cohort_year,year_month,cumulative_net_revenue
0,2015,2015-01-01 00:00:00-08:00,384092.66
1,2015,2015-02-01 00:00:00-08:00,1090466.78
2,2015,2015-03-01 00:00:00-08:00,1423428.37
3,2015,2015-04-01 00:00:00-07:00,1584195.37
4,2015,2015-05-01 00:00:00-07:00,2132828.00
...,...,...,...
107,2023,2023-12-01 00:00:00-08:00,33108565.51
108,2024,2024-01-01 00:00:00-08:00,2677498.55
109,2024,2024-02-01 00:00:00-08:00,6219821.10
110,2024,2024-03-01 00:00:00-08:00,7912675.99


#### Remaining Revenue After Each Order

**`UNBOUNDED FOLLOWING`**

1. Use the previous query and update the `cumulative_net_revenue` to calculate the total net revenue from the current month to the end using `UNBOUNDED FOLLOWING`.  

   - Define the CTE `cohort_analysis` to calculate the cohort year and daily net revenue for each customer.  
        - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer based on their earliest order.  
        - Group `cohort_analysis` by `customerkey` and `orderdate` to calculate `total_net_revenue` for each day per customer.  
   - Define the second CTE `cohort_summary` to calculate the monthly total revenue for each cohort.  
        - Use `SUM(total_net_revenue)` in `cohort_summary` to aggregate the total revenue for all customers within each cohort, grouped by `cohort_year` and `year_month`.  
   - In the main query, calculate the total net revenue from the current month to the end of the cohort using a window function.  
        - 🔔 Apply `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)` to include all rows from the current row to the last month.  
        - Select `cohort_year`, `year_month`, and `remaining_net_revenue` for the output.  
        - Use `ORDER BY cohort_year, year_month` to ensure results are sorted chronologically by cohort and month.  


In [21]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
        orderdate,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey, 
        orderdate
),

cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate) AS year_month,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY cohort_year, year_month
)

SELECT
    cohort_year,
    year_month,
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY year_month
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING -- Updated
    ) AS remaining_net_revenue
FROM cohort_summary
ORDER BY 
    cohort_year, 
    year_month
;

Unnamed: 0,cohort_year,year_month,remaining_net_revenue
0,2015,2015-01-01 00:00:00-08:00,7370979.48
1,2015,2015-02-01 00:00:00-08:00,6986886.82
2,2015,2015-03-01 00:00:00-08:00,6280512.70
3,2015,2015-04-01 00:00:00-07:00,5947551.11
4,2015,2015-05-01 00:00:00-07:00,5786784.11
...,...,...,...
107,2023,2023-12-01 00:00:00-08:00,2928550.93
108,2024,2024-01-01 00:00:00-08:00,8396527.38
109,2024,2024-02-01 00:00:00-08:00,5719028.83
110,2024,2024-03-01 00:00:00-08:00,2176706.28
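
The cumulative and remaining columns are linked: for any row, the running total plus the remaining total counts the current month twice, so subtracting the monthly value should give the partition total. The sketch below checks this with the first five 2015 rows copied from the outputs above (monthly, cumulative, and remaining net revenue). The variable names are illustrative only.

```python
# First five 2015 rows copied from the three query outputs above.
monthly = [384092.66, 706374.12, 332961.59, 160767.00, 548632.63]
cumulative = [384092.66, 1090466.78, 1423428.37, 1584195.37, 2132828.00]
remaining = [7370979.48, 6986886.82, 6280512.70, 5947551.11, 5786784.11]

# cumulative + remaining - monthly should be constant within a partition.
partition_totals = [c + r - m for m, c, r in zip(monthly, cumulative, remaining)]

# The first row's remaining value is the whole partition's revenue.
expected_total = remaining[0]
consistent = all(abs(t - expected_total) < 0.005 for t in partition_totals)
print(consistent)
```

A check like this is a quick way to confirm that both frames were computed over the same partition and ordering.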
