<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/3_Windows_Functions/4_Lag_Lead.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Lag / Lead

### 🥅 Analysis Goals

Explore cohort lifetime value (LTV) trends to understand customer value changes over time, year-over-year patterns, and shifts between consecutive cohorts.

- **Compare LTV to First Cohort:** Measure how each cohort's average LTV compares to the first cohort to identify long-term value trends.  
- **Year-over-Year LTV Changes:** Analyze changes in average LTV between consecutive cohorts to track year-over-year shifts in customer value.  
- **LTV Changes Between Cohorts:** Evaluate how the current cohort's average LTV compares to the next cohort to detect evolving trends.  

### 📘 Concepts Covered

- `FIRST_VALUE`
- `LAG`
- `LEAD`

---

In [11]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Switch from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas floats with two decimal places
pd.options.display.float_format = '{:.2f}'.format


---
## FIRST_VALUE

### 📝 Notes

`FIRST_VALUE`

- **FIRST_VALUE**: Returns the first value in an ordered partition of data.
- Syntax:
  ```sql
  SELECT
    FIRST_VALUE(column_name) OVER(
        PARTITION BY partition_expression
        ORDER BY order_expression
    ) AS window_column_alias
  FROM table_name;
  ```
- Retrieve the earliest value within a group or window, such as the first purchase date or initial value in a time series.
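
The syntax above can be tried on a few rows. This is a minimal sketch, assuming SQLite (through Python's built-in `sqlite3` module, SQLite 3.25 or newer) as a stand-in for the course's PostgreSQL database; the `orders` table and its rows are hypothetical and not part of `contoso_100k`.

```python
import sqlite3

# Hypothetical toy data: (customer, order_date, amount).
rows = [
    ("a", "2023-03-01", 20.0),
    ("a", "2023-01-15", 50.0),
    ("b", "2023-02-10", 30.0),
    ("b", "2023-02-01", 10.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# For every row, fetch the amount of that customer's earliest order.
first_amounts = conn.execute(
    """
    SELECT
        customer,
        order_date,
        FIRST_VALUE(amount) OVER (
            PARTITION BY customer
            ORDER BY order_date
        ) AS first_amount
    FROM orders
    ORDER BY customer, order_date
    """
).fetchall()
conn.close()

for row in first_amounts:
    print(row)
```

Every row of a customer carries that customer's earliest amount (50.0 for `a`, 10.0 for `b`), because `PARTITION BY` restarts the window per customer and `ORDER BY` decides which row is first.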


### 📈 Analysis

- Compare the average lifetime value (LTV) of each cohort to the first cohort to measure changes in customer value over time.  
   - Identifies trends in customer value, such as growth or decline, across different cohorts.  
   - Provides insights into the success of customer acquisition and retention strategies over the years.  

#### Compare LTV to First Cohort

**`FIRST_VALUE`**

1. Get the `cohort_year` and the total revenue for each customer.  
   - Use `EXTRACT(YEAR FROM MIN(orderdate))` to calculate the cohort year (the year of the first order) for each customer.  
   - Group by `customerkey` to ensure the revenue and cohort year are calculated per customer.  
   - Calculate the total revenue for each customer using `SUM(quantity * netprice * exchangerate)`.  
   - Select `cohort_year`, `customerkey`, and the total revenue (`total_customer_net_revenue`) to display the results.  

In [30]:
%%sql

SELECT 
    EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
    customerkey,
    SUM(quantity * netprice * exchangerate) AS total_customer_net_revenue
FROM sales
GROUP BY 
    customerkey

Unnamed: 0,cohort_year,customerkey,total_customer_net_revenue
0,2018,2044589,2470.73
1,2021,1603477,136.62
2,2017,876049,2601.13
3,2024,1469222,5278.54
4,2018,2089398,98.39
...,...,...,...
49482,2019,853617,903.31
49483,2016,1573639,6973.42
49484,2022,1355936,149.99
49485,2024,967453,5.40


2. Create a CTE to calculate the cohort year for each customer and return all results in the main query.  
   - 🔔 Define a CTE `cohort_analysis` to extract the cohort year for each customer.  
      - Use `EXTRACT(YEAR FROM MIN(orderdate))` to calculate the cohort year for each customer.  
      - Group by `customerkey` to ensure the revenue and cohort year are calculated per user.  
      - Calculate the total revenue for each customer using `SUM(quantity * netprice * exchangerate)`.  
   - 🔔 In the main query, use `SELECT * FROM cohort_analysis` to return all the results from the CTE.  

In [3]:
%%sql
-- Put query into a CTE
WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        customerkey,
        SUM(quantity * netprice * exchangerate) AS total_customer_net_revenue
    FROM sales
    GROUP BY 
        customerkey
)

-- Added
SELECT *
FROM cohort_analysis

Unnamed: 0,cohort_year,customerkey,total_customer_net_revenue
0,2018,2044589,2470.73
1,2021,1603477,136.62
2,2017,876049,2601.13
3,2024,1469222,5278.54
4,2018,2089398,98.39
...,...,...,...
49482,2019,853617,903.31
49483,2016,1573639,6973.42
49484,2022,1355936,149.99
49485,2024,967453,5.40


3. Create another CTE (`cohort_totals`) that calculates the cohort's average LTV and select those columns from `cohort_totals` in the main query.  
   - Define a CTE `cohort_analysis` to calculate the cohort year and total net revenue per customer.  
       - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer.  
       - Calculate `total_net_revenue` as the sum of `quantity * netprice * exchangerate`.  
       - Group the CTE results by `customerkey` to compute these metrics at the customer level.  
   - 🔔 Define a CTE `cohort_totals` to calculate the average LTV for each cohort year.  
       - Calculate `avg_ltv` as `SUM(total_net_revenue) / COUNT(DISTINCT customerkey)`.  
       - Group the CTE results by `cohort_year` to summarize the data at the cohort level.  
   - 🔔 In the main query, select the `cohort_year` and its `avg_ltv` from `cohort_totals`.   

In [31]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        customerkey,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey
),

-- Added
cohort_totals AS (
    SELECT
        cohort_year,
        SUM(total_net_revenue) / COUNT(DISTINCT customerkey) AS avg_ltv
    FROM cohort_analysis
    GROUP BY
        cohort_year
)

-- Updated to get cohort_totals
SELECT 
    cohort_year,
    avg_ltv
FROM cohort_totals

Unnamed: 0,cohort_year,avg_ltv
0,2015,5271.59
1,2016,5404.92
2,2017,5403.08
3,2018,4896.64
4,2019,4731.95
5,2020,3933.32
6,2021,3943.33
7,2022,3315.52
8,2023,2543.18
9,2024,2037.55
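
Because `cohort_analysis` is grouped by `customerkey`, it holds one row per customer, so `SUM(total_net_revenue) / COUNT(DISTINCT customerkey)` gives the same result as `AVG(total_net_revenue)` within each cohort. A minimal sketch of that equivalence, assuming SQLite (Python's built-in `sqlite3`) as a stand-in for PostgreSQL and a hypothetical five-customer `cohort_analysis` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cohort_analysis ("
    "cohort_year INTEGER, customerkey INTEGER, total_net_revenue REAL)"
)
# Hypothetical values: one row per customer, as the real CTE would produce.
conn.executemany(
    "INSERT INTO cohort_analysis VALUES (?, ?, ?)",
    [
        (2015, 1, 100.0),
        (2015, 2, 300.0),
        (2016, 3, 50.0),
        (2016, 4, 150.0),
        (2016, 5, 100.0),
    ],
)

avg_rows = conn.execute(
    """
    SELECT
        cohort_year,
        SUM(total_net_revenue) / COUNT(DISTINCT customerkey) AS avg_ltv,
        AVG(total_net_revenue) AS avg_ltv_alt
    FROM cohort_analysis
    GROUP BY cohort_year
    ORDER BY cohort_year
    """
).fetchall()
conn.close()

for row in avg_rows:
    print(row)
```

The explicit `SUM / COUNT(DISTINCT ...)` form stays correct even if the input ever contains several rows per customer, which is one reason to prefer it here.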


4. Get the first cohort's average LTV using `FIRST_VALUE` window function.  
   - Define a CTE `cohort_analysis` to calculate the cohort year and total net revenue per customer.  
       - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer.  
       - Calculate `total_net_revenue` as the sum of `quantity * netprice * exchangerate`.  
       - Group the CTE results by `customerkey` to compute these metrics at the customer level.  
   - Define a CTE `cohort_totals` to calculate the average LTV for each cohort year.  
       - Calculate `avg_ltv` as `SUM(total_net_revenue) / COUNT(DISTINCT customerkey)`.  
       - Group the CTE results by `cohort_year` to summarize the data at the cohort level.  
   - In the main query, calculate the first cohort's average LTV using `FIRST_VALUE()`.  
       - 🔔 Use `FIRST_VALUE(avg_ltv) OVER (ORDER BY cohort_year)` to return the `avg_ltv` of the earliest cohort year.  
       - Select `cohort_year`, `avg_ltv`, and the calculated `first_cohort_ltv` for the output.  

In [32]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        customerkey,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey
),

cohort_totals AS (
    SELECT
        cohort_year,
        SUM(total_net_revenue) / COUNT(DISTINCT customerkey) AS avg_ltv   
    FROM cohort_analysis
    GROUP BY
        cohort_year
)

SELECT 
    cohort_year,
    avg_ltv,
    FIRST_VALUE(avg_ltv) OVER (ORDER BY cohort_year) AS first_cohort_ltv -- Added 
FROM cohort_totals

Unnamed: 0,cohort_year,avg_ltv,first_cohort_ltv
0,2015,5271.59,5271.59
1,2016,5404.92,5271.59
2,2017,5403.08,5271.59
3,2018,4896.64,5271.59
4,2019,4731.95,5271.59
5,2020,3933.32,5271.59
6,2021,3943.33,5271.59
7,2022,3315.52,5271.59
8,2023,2543.18,5271.59
9,2024,2037.55,5271.59


5. Calculate the change in average LTV between each cohort and the first cohort using the `FIRST_VALUE` window function.  
   - Define a CTE `cohort_analysis` to calculate the cohort year and total net revenue per customer.  
       - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer.  
       - Calculate `total_net_revenue` as the sum of `quantity * netprice * exchangerate`.  
       - Group the results by `customerkey` to calculate these metrics for each customer.  
   - Define a CTE `cohort_totals` to calculate the average LTV for each cohort year.  
       - Calculate `avg_ltv` as `SUM(total_net_revenue) / COUNT(DISTINCT customerkey)` to determine the average lifetime value of each cohort.  
       - Group the results by `cohort_year` to summarize the data at the cohort level.  
   - In the main query, calculate the first cohort's average LTV and the change in LTV for each cohort.  
       - Use `FIRST_VALUE(avg_ltv) OVER (ORDER BY cohort_year)` to get the `avg_ltv` of the earliest cohort year, naming it `first_cohort_ltv`.  
       - 🔔 Calculate the LTV change as `avg_ltv - FIRST_VALUE(avg_ltv) OVER (ORDER BY cohort_year)` and name it `ltv_change_first`.  
       - Select `cohort_year`, `avg_ltv`, `first_cohort_ltv`, and `ltv_change_first` for the final output.  

In [33]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        customerkey,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey
),

cohort_totals AS (
    SELECT
        cohort_year,
        SUM(total_net_revenue) / COUNT(DISTINCT customerkey) AS avg_ltv   
    FROM cohort_analysis
    GROUP BY
        cohort_year
)

SELECT 
    cohort_year,
    avg_ltv,
    FIRST_VALUE(avg_ltv) OVER (ORDER BY cohort_year) AS first_cohort_ltv,
    avg_ltv - FIRST_VALUE(avg_ltv) OVER (ORDER BY cohort_year) AS ltv_change_first -- Added
FROM cohort_totals

Unnamed: 0,cohort_year,avg_ltv,first_cohort_ltv,ltv_change_first
0,2015,5271.59,5271.59,0.0
1,2016,5404.92,5271.59,133.34
2,2017,5403.08,5271.59,131.5
3,2018,4896.64,5271.59,-374.95
4,2019,4731.95,5271.59,-539.64
5,2020,3933.32,5271.59,-1338.26
6,2021,3943.33,5271.59,-1328.26
7,2022,3315.52,5271.59,-1956.07
8,2023,2543.18,5271.59,-2728.41
9,2024,2037.55,5271.59,-3234.03


---
## LAG

### 📝 Notes

`LAG`

- **LAG**: Returns the value of a column from a specified number of rows before the current row in a partition.
- Syntax:
  ```sql
  SELECT
    LAG(column_name, offset, default_value) OVER(
        PARTITION BY partition_expression
        ORDER BY order_expression
    ) AS window_column_alias
  FROM table_name;
  ```
- **Parameters**:
  - `offset` (optional, default: `1`): How many rows back to look.
  - `default_value` (optional, default: `NULL`): Value to return if there is no preceding row at that offset.
- Use it to compare current and previous values, such as tracking changes in sales or stock prices.
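
The `offset` and `default_value` parameters are easiest to see on a tiny table. A minimal sketch, assuming SQLite (Python's built-in `sqlite3`, SQLite 3.25 or newer) in place of PostgreSQL; the `monthly_sales` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month INTEGER, sales INTEGER)")
# Hypothetical values for four consecutive months.
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [(1, 100), (2, 120), (3, 90), (4, 130)],
)

lag_rows = conn.execute(
    """
    SELECT
        month,
        sales,
        LAG(sales) OVER (ORDER BY month) AS prev_sales,        -- offset 1, default NULL
        LAG(sales, 2, 0) OVER (ORDER BY month) AS two_back     -- offset 2, default 0
    FROM monthly_sales
    ORDER BY month
    """
).fetchall()
conn.close()

for row in lag_rows:
    print(row)
# (1, 100, None, 0)
# (2, 120, 100, 0)
# (3, 90, 120, 100)
# (4, 130, 90, 120)
```

With no `default_value`, the first row has nothing to look back to and gets `NULL` (`None` in Python); with a default of `0`, the rows lacking a value two positions back get `0` instead.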

### 📈 Analysis

- Calculate the change in average lifetime value (LTV) between each cohort and the previous cohort.  
   - Highlights year-over-year changes in customer value, revealing potential growth or declines in customer behavior.  
   - Provides insights into how recent cohorts compare to their immediate predecessors, helping assess the impact of short-term strategies.  

#### Year-over-Year LTV Changes

**`LAG`**

1. Use the same query from before but remove the `first_cohort_ltv` and `ltv_change_first` columns.  
   - Define a CTE `cohort_analysis` to calculate the cohort year and total net revenue per customer.  
       - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer.  
       - Calculate `total_net_revenue` as the sum of `quantity * netprice * exchangerate`.  
       - Group the results by `customerkey` to calculate these metrics for each customer.  
   - Define a CTE `cohort_totals` to calculate the average LTV for each cohort year.  
       - Calculate `avg_ltv` as `SUM(total_net_revenue) / COUNT(DISTINCT customerkey)` to determine the average lifetime value of each cohort.  
       - Group the results by `cohort_year` to summarize the data at the cohort level.  
   - In the main query: 
       - Select only the `cohort_year` and `avg_ltv` columns.  
       - 🔔 The `first_cohort_ltv` and `ltv_change_first` columns from the previous query are removed to simplify the output.  

In [34]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        customerkey,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey
),

cohort_totals AS (
    SELECT
        cohort_year,
        SUM(total_net_revenue) / COUNT(DISTINCT customerkey) AS avg_ltv   
    FROM cohort_analysis
    GROUP BY
        cohort_year
)

-- Removed first_cohort_ltv and ltv_change_first columns
SELECT 
    cohort_year,
    avg_ltv
FROM cohort_totals

Unnamed: 0,cohort_year,avg_ltv
0,2015,5271.59
1,2016,5404.92
2,2017,5403.08
3,2018,4896.64
4,2019,4731.95
5,2020,3933.32
6,2021,3943.33
7,2022,3315.52
8,2023,2543.18
9,2024,2037.55


2. Use `LAG` to get the previous cohort's average LTV.  
   - Define a CTE `cohort_analysis` to calculate the cohort year and total net revenue per customer.  
       - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer.  
       - Calculate `total_net_revenue` as the sum of `quantity * netprice * exchangerate`.  
       - Group the results by `customerkey` to calculate these metrics for each customer.  
   - Define a CTE `cohort_totals` to calculate the average LTV for each cohort year.  
       - Calculate `avg_ltv` as `SUM(total_net_revenue) / COUNT(DISTINCT customerkey)` to determine the average lifetime value of each cohort.  
       - Group the results by `cohort_year` to summarize the data at the cohort level.  
   - In the main query, calculate the previous cohort's average LTV using `LAG`.  
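       - 🔔 Use `LAG(avg_ltv) OVER (ORDER BY cohort_year)` to fetch the `avg_ltv` of the preceding cohort (the previous row when ordered by `cohort_year`), naming it `prev_cohort_ltv`. The first cohort has no preceding row, so its value is `NULL`.  
       - The `ORDER BY cohort_year` inside `OVER` determines which row counts as "previous", so the values are sequential by cohort year.  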

In [35]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        customerkey,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey
),

cohort_totals AS (
    SELECT
        cohort_year,
        SUM(total_net_revenue) / COUNT(DISTINCT customerkey) AS avg_ltv   
    FROM cohort_analysis
    GROUP BY
        cohort_year
)

SELECT 
    cohort_year,
    avg_ltv,
    LAG(avg_ltv) OVER (ORDER BY cohort_year) AS prev_cohort_ltv -- Added
FROM cohort_totals

Unnamed: 0,cohort_year,avg_ltv,prev_cohort_ltv
0,2015,5271.59,
1,2016,5404.92,5271.59
2,2017,5403.08,5404.92
3,2018,4896.64,5403.08
4,2019,4731.95,4896.64
5,2020,3933.32,4731.95
6,2021,3943.33,3933.32
7,2022,3315.52,3943.33
8,2023,2543.18,3315.52
9,2024,2037.55,2543.18
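
`LAG` looks at the previous row, not the previous calendar year. In this dataset every year from 2015 to 2024 has a cohort, so the two coincide, but they would differ if a year were missing. A minimal sketch of that difference, assuming SQLite (Python's built-in `sqlite3`) as a stand-in for PostgreSQL and a hypothetical `cohort_totals` table with no 2017 cohort:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cohort_totals (cohort_year INTEGER, avg_ltv REAL)")
# Hypothetical values; note the gap where 2017 would be.
conn.executemany(
    "INSERT INTO cohort_totals VALUES (?, ?)",
    [(2015, 100.0), (2016, 110.0), (2018, 90.0)],
)

# LAG: value from the previous row, whatever its year.
lag_rows = conn.execute(
    """
    SELECT
        cohort_year,
        LAG(avg_ltv) OVER (ORDER BY cohort_year) AS prev_cohort_ltv
    FROM cohort_totals
    ORDER BY cohort_year
    """
).fetchall()

# Self-join: value from exactly one calendar year earlier, NULL if absent.
join_rows = conn.execute(
    """
    SELECT
        c.cohort_year,
        p.avg_ltv AS prev_year_ltv
    FROM cohort_totals AS c
    LEFT JOIN cohort_totals AS p
        ON p.cohort_year = c.cohort_year - 1
    ORDER BY c.cohort_year
    """
).fetchall()
conn.close()

print(lag_rows)   # 2018 is paired with the 2016 value
print(join_rows)  # 2018 has no 2017 row, so it gets None
```

Which behaviour is wanted depends on the question: "previous cohort" calls for `LAG`, while "same metric one year earlier" calls for a join or a gap-filled calendar.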


3. Calculate the change in average LTV between each cohort and the previous cohort using the `LAG` window function.  
   - Define a CTE `cohort_analysis` to calculate the cohort year and total net revenue per customer.  
       - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer.  
       - Calculate `total_net_revenue` as the sum of `quantity * netprice * exchangerate`.  
       - Group the results by `customerkey` to calculate these metrics for each customer.  
   - Define a CTE `cohort_totals` to calculate the average LTV for each cohort year.  
       - Calculate `avg_ltv` as `SUM(total_net_revenue) / COUNT(DISTINCT customerkey)` to determine the average lifetime value of each cohort.  
       - Group the results by `cohort_year` to summarize the data at the cohort level.  
   - In the main query, calculate the previous cohort's LTV and the change in LTV using `LAG`.  
       - Use `LAG(avg_ltv) OVER (ORDER BY cohort_year)` to fetch the previous cohort's average LTV, naming it `prev_cohort_ltv`.  
       - 🔔 Calculate the change in LTV as `avg_ltv - LAG(avg_ltv) OVER (ORDER BY cohort_year)` and name it `ltv_change_prev`.  

In [36]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        customerkey,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey
),

cohort_totals AS (
    SELECT
        cohort_year,
        SUM(total_net_revenue) / COUNT(DISTINCT customerkey) AS avg_ltv   
    FROM cohort_analysis
    GROUP BY
        cohort_year
)

SELECT 
    cohort_year,
    avg_ltv,
    LAG(avg_ltv) OVER (ORDER BY cohort_year) AS prev_cohort_ltv,  
    avg_ltv - LAG(avg_ltv) OVER (ORDER BY cohort_year) AS ltv_change_prev -- Added
FROM cohort_totals

Unnamed: 0,cohort_year,avg_ltv,prev_cohort_ltv,ltv_change_prev
0,2015,5271.59,,
1,2016,5404.92,5271.59,133.34
2,2017,5403.08,5404.92,-1.84
3,2018,4896.64,5403.08,-506.44
4,2019,4731.95,4896.64,-164.69
5,2020,3933.32,4731.95,-798.62
6,2021,3943.33,3933.32,10.0
7,2022,3315.52,3943.33,-627.81
8,2023,2543.18,3315.52,-772.34
9,2024,2037.55,2543.18,-505.63


---
## LEAD

### 📝 Notes

`LEAD`

- **LEAD**: Returns the value of a column from a specified number of rows after the current row in a partition.
- Syntax:
  ```sql
  SELECT
    LEAD(column_name, offset, default_value) OVER(
        PARTITION BY partition_expression
        ORDER BY order_expression
    ) AS window_column_alias
  FROM table_name;
  ```
- **Parameters**:
  - `offset` (optional, default: `1`): How many rows forward to look.
  - `default_value` (optional, default: `NULL`): Value to return if there is no subsequent row at that offset.
- Use it to compare each row with a later one, such as looking ahead to the next period's value or the next scheduled event.
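
The same parameters can be tried with `LEAD` on a tiny table. A minimal sketch, assuming SQLite (Python's built-in `sqlite3`, SQLite 3.25 or newer) in place of PostgreSQL; the `monthly_sales` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month INTEGER, sales INTEGER)")
# Hypothetical values for four consecutive months.
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [(1, 100), (2, 120), (3, 90), (4, 130)],
)

lead_rows = conn.execute(
    """
    SELECT
        month,
        sales,
        LEAD(sales) OVER (ORDER BY month) AS next_sales,       -- offset 1, default NULL
        LEAD(sales, 2, 0) OVER (ORDER BY month) AS two_ahead   -- offset 2, default 0
    FROM monthly_sales
    ORDER BY month
    """
).fetchall()
conn.close()

for row in lead_rows:
    print(row)
# (1, 100, 120, 90)
# (2, 120, 90, 130)
# (3, 90, 130, 0)
# (4, 130, None, 0)
```

The last row has no following row, so `next_sales` is `NULL` (`None` in Python), while the explicit default of `0` fills the rows that have nothing two positions ahead.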

### 📈 Analysis

- Calculate the change in average lifetime value (LTV) between each cohort and the next cohort.  
   - Highlights shifts in customer value as cohorts evolve, identifying trends in decreasing or increasing value.  
   - Provides insights into whether newer cohorts are more or less valuable than the preceding ones, helping refine acquisition strategies.  

#### LTV Changes Between Cohorts

**`LEAD`**

1. Use the same query from before but remove the `prev_cohort_ltv` and `ltv_change_prev` columns. 
   - Define a CTE `cohort_analysis` to calculate the cohort year and total net revenue per customer.  
       - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer.  
       - Calculate `total_net_revenue` as the sum of `quantity * netprice * exchangerate`.  
       - Group the results by `customerkey` to calculate these metrics for each customer.  
   - Define a CTE `cohort_totals` to calculate the average LTV for each cohort year.  
       - Calculate `avg_ltv` as `SUM(total_net_revenue) / COUNT(DISTINCT customerkey)` to determine the average lifetime value of each cohort.  
       - Group the results by `cohort_year` to summarize the data at the cohort level.  
   - In the main query, select only the `cohort_year` and `avg_ltv` columns.  
       - 🔔 The `prev_cohort_ltv` and `ltv_change_prev` columns from the previous query are removed to simplify the output.  

In [37]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        customerkey,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey
),

cohort_totals AS (
    SELECT
        cohort_year,
        SUM(total_net_revenue) / COUNT(DISTINCT customerkey) AS avg_ltv   
    FROM cohort_analysis
    GROUP BY
        cohort_year
)

-- Removed prev_cohort_ltv and ltv_change_prev columns
SELECT 
    cohort_year,
    avg_ltv
FROM cohort_totals

Unnamed: 0,cohort_year,avg_ltv
0,2015,5271.59
1,2016,5404.92
2,2017,5403.08
3,2018,4896.64
4,2019,4731.95
5,2020,3933.32
6,2021,3943.33
7,2022,3315.52
8,2023,2543.18
9,2024,2037.55


2. Use `LEAD` to get the next cohort's average LTV.  
   - Define a CTE `cohort_analysis` to calculate the cohort year and total net revenue per customer.  
       - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer.  
       - Calculate `total_net_revenue` as the sum of `quantity * netprice * exchangerate`.  
       - Group the results by `customerkey` to calculate these metrics for each customer.
   - Define a CTE `cohort_totals` to calculate the average LTV for each cohort year.  
       - Calculate `avg_ltv` as `SUM(total_net_revenue) / COUNT(DISTINCT customerkey)` to determine the average lifetime value of each cohort.  
       - Group the results by `cohort_year` to summarize the data at the cohort level.  
   - In the main query, calculate the next cohort's LTV using `LEAD`.  
       - 🔔 Use `LEAD(avg_ltv) OVER (ORDER BY cohort_year)` to fetch the average LTV of the next cohort, naming it `next_cohort_ltv`.  

In [38]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        customerkey,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey
),

cohort_totals AS (
    SELECT
        cohort_year,
        SUM(total_net_revenue) / COUNT(DISTINCT customerkey) AS avg_ltv   
    FROM cohort_analysis
    GROUP BY
        cohort_year
)

SELECT
    cohort_year,
    avg_ltv,
    LEAD(avg_ltv) OVER (ORDER BY cohort_year) AS next_cohort_ltv -- Added
FROM cohort_totals;

Unnamed: 0,cohort_year,avg_ltv,next_cohort_ltv
0,2015,5271.59,5404.92
1,2016,5404.92,5403.08
2,2017,5403.08,4896.64
3,2018,4896.64,4731.95
4,2019,4731.95,3933.32
5,2020,3933.32,3943.33
6,2021,3943.33,3315.52
7,2022,3315.52,2543.18
8,2023,2543.18,2037.55
9,2024,2037.55,


3. Calculate the change in average LTV between each cohort and the next cohort using the `LEAD` window function.  
   - Define a CTE `cohort_analysis` to calculate the cohort year and total net revenue per customer.  
       - Use `EXTRACT(YEAR FROM MIN(orderdate))` to determine the cohort year for each customer.  
       - Calculate `total_net_revenue` as the sum of `quantity * netprice * exchangerate`.  
       - Group the results by `customerkey` to calculate these metrics for each customer.  
   - Define a CTE `cohort_totals` to calculate the average LTV for each cohort year.  
       - Calculate `avg_ltv` as `SUM(total_net_revenue) / COUNT(DISTINCT customerkey)` to determine the average lifetime value of each cohort.  
       - Group the results by `cohort_year` to summarize the data at the cohort level.  
   - In the main query, calculate the next cohort's LTV and the change in LTV using `LEAD`.  
       - Use `LEAD(avg_ltv) OVER (ORDER BY cohort_year)` to fetch the average LTV of the next cohort, naming it `next_cohort_ltv`.  
       - 🔔 Calculate the change in LTV as `avg_ltv - LEAD(avg_ltv) OVER (ORDER BY cohort_year)` and name it `ltv_change_next`.  

In [40]:
%%sql

WITH cohort_analysis AS (
    SELECT 
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        customerkey,
        SUM(quantity * netprice * exchangerate) AS total_net_revenue
    FROM sales
    GROUP BY 
        customerkey
),

cohort_totals AS (
    SELECT
        cohort_year,
        SUM(total_net_revenue) / COUNT(DISTINCT customerkey) AS avg_ltv   
    FROM cohort_analysis
    GROUP BY
        cohort_year
)

SELECT
    cohort_year,
    avg_ltv,
    LEAD(avg_ltv) OVER (ORDER BY cohort_year) AS next_cohort_ltv,
    avg_ltv - LEAD(avg_ltv) OVER (ORDER BY cohort_year) AS ltv_change_next -- Added
FROM cohort_totals;

Unnamed: 0,cohort_year,avg_ltv,next_cohort_ltv,ltv_change_next
0,2015,5271.59,5404.92,-133.34
1,2016,5404.92,5403.08,1.84
2,2017,5403.08,4896.64,506.44
3,2018,4896.64,4731.95,164.69
4,2019,4731.95,3933.32,798.62
5,2020,3933.32,3943.33,-10.0
6,2021,3943.33,3315.52,627.81
7,2022,3315.52,2543.18,772.34
8,2023,2543.18,2037.55,505.63
9,2024,2037.55,,


### 💡 Why analyze LTV Changes Between Cohorts?

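- **Reverse Perspective:** "LTV Changes Between Cohorts" is essentially "Year-over-Year LTV Changes" viewed in reverse: each cohort is compared to the next one instead of the previous one, so a cohort's `ltv_change_next` is the negative of the following cohort's `ltv_change_prev` (for example, -133.34 for 2015 versus 133.34 for 2016).  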
- **Reason for Reverse Comparison:** 
    - This perspective helps identify how newer cohorts are performing compared to their predecessors, providing insights into declining or improving trends as cohorts evolve over time.  
    - By looking forward instead of backward, businesses can focus on emerging patterns and adjust acquisition or retention strategies for future cohorts.  
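
The relationship between the two perspectives can be checked directly. A minimal sketch, assuming SQLite (Python's built-in `sqlite3`, SQLite 3.25 or newer) as a stand-in for PostgreSQL; the `avg_ltv` values are copied from the outputs above, already rounded to two decimals, so differences may deviate by a cent from the tables computed on unrounded data.

```python
import sqlite3

# avg_ltv per cohort year, taken (rounded) from the query outputs above.
cohort_totals = [
    (2015, 5271.59), (2016, 5404.92), (2017, 5403.08), (2018, 4896.64),
    (2019, 4731.95), (2020, 3933.32), (2021, 3943.33), (2022, 3315.52),
    (2023, 2543.18), (2024, 2037.55),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cohort_totals (cohort_year INTEGER, avg_ltv REAL)")
conn.executemany("INSERT INTO cohort_totals VALUES (?, ?)", cohort_totals)

change_rows = conn.execute(
    """
    SELECT
        cohort_year,
        avg_ltv - LAG(avg_ltv) OVER (ORDER BY cohort_year) AS ltv_change_prev,
        avg_ltv - LEAD(avg_ltv) OVER (ORDER BY cohort_year) AS ltv_change_next
    FROM cohort_totals
    ORDER BY cohort_year
    """
).fetchall()
conn.close()

# Pair each cohort's forward-looking change with the next cohort's
# backward-looking change: they differ only in sign.
for (year, _, change_next), (next_year, change_prev, _) in zip(change_rows, change_rows[1:]):
    print(year, round(change_next, 2), "|", next_year, round(change_prev, 2))
```

The first cohort has no `ltv_change_prev` and the last has no `ltv_change_next`, which is the only information that differs between the two columns.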