<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/7_Basic_Query_Optimization/2_Basic_Optimization.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Basic Optimization

## Overview

### 🥅 Analysis Goals
 
Improve query efficiency and reduce costs by optimizing cohort revenue tracking and lifetime value (LTV) calculations.  
- **Optimize Sales Data Aggregation:** Summarizes customer purchases, calculates total revenue, and assigns cohort years while ensuring efficient joins with `customer`.  
- **Calculate Rolling Lifetime Value:** Computes cumulative and rolling 3-month revenue per cohort to track revenue trends over time with optimized window functions.  

### 📘 Concepts Covered

- Basic query optimization

In [1]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas number to two decimal places
pd.options.display.float_format = '{:.2f}'.format

---
## Query Optimization

### 📝 Notes

Query Optimization

- Improves SQL performance by reducing execution time and resource usage.
- **Basic Optimization Tips**  
    - Use `INNER JOIN` instead of `LEFT JOIN` when unmatched rows aren’t needed.  
    - Filter early using `WHERE`, not `HAVING`, to reduce processed rows.  
    - Avoid `SELECT *`; select only the required columns.  
    - Use `UNION ALL` instead of `UNION` when duplicates are acceptable, since `UNION` does extra work to remove them.  
    - Pre-filter data before `GROUP BY` and `DISTINCT` to avoid unnecessary calculations.  
    - Replace chains of `OR` conditions on the same column with `IN`, which is easier to read and can help the planner use an index.  
    - Use `EXISTS` instead of `IN` for subqueries on large datasets.  
    - Ensure data types match in comparisons to prevent slow implicit conversions.  
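
Two of the tips above can be checked on a toy table. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and the `orders` table and its rows are hypothetical, not part of the Contoso database. It shows that filtering with `WHERE` returns the same groups as filtering with `HAVING`, and that `UNION ALL` keeps the duplicates that `UNION` has to remove.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customerkey INTEGER, orderdate TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, '2019-05-01', 10.0),
        (1, '2021-03-15', 20.0),
        (2, '2022-07-04', 30.0),
        (3, '2018-11-30', 40.0);
""")

# Tip: filter rows with WHERE before grouping...
where_rows = conn.execute("""
    SELECT customerkey, SUM(amount)
    FROM orders
    WHERE customerkey IN (1, 2)
    GROUP BY customerkey
    ORDER BY customerkey
""").fetchall()

# ...rather than grouping every row and discarding groups with HAVING.
having_rows = conn.execute("""
    SELECT customerkey, SUM(amount)
    FROM orders
    GROUP BY customerkey
    HAVING customerkey IN (1, 2)
    ORDER BY customerkey
""").fetchall()

# UNION ALL simply appends the second result set, duplicates included.
union_all_count = len(conn.execute("""
    SELECT customerkey FROM orders WHERE orderdate >= '2020-01-01'
    UNION ALL
    SELECT customerkey FROM orders WHERE amount >= 20
""").fetchall())

# UNION performs an extra de-duplication step on the combined rows.
union_count = len(conn.execute("""
    SELECT customerkey FROM orders WHERE orderdate >= '2020-01-01'
    UNION
    SELECT customerkey FROM orders WHERE amount >= 20
""").fetchall())

print("WHERE and HAVING agree:", where_rows == having_rows)
print("UNION ALL rows:", union_all_count, "| UNION rows:", union_count)
```

The results match in both filtering variants; the difference is how many rows the engine has to group first. Whether that difference is measurable depends on the table size and the planner, so it is worth confirming with `EXPLAIN` on the real data.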

### 🔑 Key Concepts
- **📊 Business Terms**: 
  - Query Efficiency: Optimized data retrieval methods
  - Resource Management: Efficient use of database resources
  - Performance Scaling: Handling growing data volumes
- **💡 Why It Matters**: Improves business operations and costs
    - Reduces cloud computing costs through efficient queries
    - Enables faster reporting for business decisions
    - Supports analysis of larger customer datasets
    - Allows more frequent cohort analysis updates
    - Makes revenue tracking more cost-effective
- **🎯 Common Use Cases**: 
  - Daily revenue reporting
  - Real-time customer analysis
  - Large-scale cohort tracking
  - Regular performance monitoring
- **📈 Related KPIs**: 
  - Query cost reduction
  - Report generation time
  - System resource savings
  - Analysis turnaround time    

### 📈 Analysis

- Summarizes customer purchases, calculates total revenue, and assigns cohort years while ensuring efficient joins with `customer` (from `1_Explain.ipynb`).  
- Computes cumulative and rolling 3-month revenue per cohort to track revenue trends over time with optimized window functions (from `2_View_Project.ipynb`).  

#### Simple Query Optimization

**Query Optimization**

1. Run `EXPLAIN` on the query to look for optimization opportunities (from the last example in the `1_Explain.ipynb` notebook).
    - Hash Left Join: PostgreSQL joins `customer` (104,990 rows) and `sales_data` (37,024 rows).  
    - Sequential Scan on `customer`: Scans all rows instead of using an index.  
    - Hash Aggregation on `sales`: Groups `sales` (199,873 rows) by `customerkey`, calculating revenue, orders, and cohort year.  
    - Sequential Scan on `sales`: No filtering, so the entire table is scanned.  

In [2]:
%%sql

EXPLAIN
WITH sales_data AS (
    SELECT
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        SUM(quantity * netprice * exchangerate) AS net_revenue,
        COUNT(orderkey) AS num_orders
    FROM sales
    GROUP BY customerkey
)
SELECT
    c.customerkey,
    s.cohort_year,
    CONCAT(TRIM(c.givenname), ' ', TRIM(c.surname)) AS cleaned_name,
    COALESCE(s.net_revenue, 0) AS net_revenue,
    COALESCE(s.num_orders, 0) AS total_orders,
    s.net_revenue / NULLIF(s.num_orders, 0) AS avg_order_value
FROM customer c
LEFT JOIN sales_data s ON c.customerkey = s.customerkey;


Unnamed: 0,QUERY PLAN
0,Hash Left Join (cost=9312.35..15292.71 rows=1...
1,Hash Cond: (c.customerkey = s.customerkey)
2,-> Seq Scan on customer c (cost=0.00..4129...
3,-> Hash (cost=8849.55..8849.55 rows=37024 ...
4,-> Subquery Scan on s (cost=8016.51....
5,-> HashAggregate (cost=8016.51...
6,Group Key: sales.customerkey
7,-> Seq Scan on sales (co...


<img src="../Resources/query_results/7.1_explain_1.png" alt="Query Results 1" style="width: 70%; height: auto;">

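2. Add a `WHERE` clause to the CTE so it only reads orders on or after `'2020-01-01'`, focusing on the more recent cohorts. This changes the results as well as the cost: customers whose only orders predate 2020 now show zero revenue, and `cohort_year` reflects the first order on or after that date. 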
    - Use a CTE (`sales_data`) to preprocess customer revenue metrics. 
        - Extracts each customer’s cohort year using `EXTRACT(YEAR FROM MIN(orderdate))`.  
        - Aggregates total net revenue per customer using `SUM(quantity * netprice * exchangerate)`.  
        - Counts the number of orders per customer using `COUNT(orderkey)`. 
        - 🔔 Filter the data to only return orders on or after `'2020-01-01'`. 
    - In the main query: 
        - Uses `TRIM` within `CONCAT` to eliminate excess whitespace while maintaining a standardized name format.   
        - Uses `COALESCE` to replace null revenue and order values with `0` to avoid missing data issues.  
        - Computes average order value by dividing `net_revenue` by `num_orders`, handling division by zero with `NULLIF`.  
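
The interplay of the filtered CTE, the `LEFT JOIN`, `COALESCE`, and `NULLIF` is easier to see on two customers. This is a minimal sketch under stated assumptions: it runs on SQLite through Python's `sqlite3` module rather than PostgreSQL, the table contents are made up, the revenue is stored in a single hypothetical `revenue` column, and `||` replaces `CONCAT` because older SQLite versions lack that function.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customerkey INTEGER, givenname TEXT, surname TEXT);
    CREATE TABLE sales (customerkey INTEGER, orderkey INTEGER, orderdate TEXT, revenue REAL);
    -- Names carry stray whitespace on purpose so TRIM has something to do.
    INSERT INTO customer VALUES (1, ' Ada ', 'Lovelace '), (2, 'Alan', ' Turing');
    INSERT INTO sales VALUES
        (1, 100, '2021-02-01', 50.0),
        (1, 101, '2022-06-01', 150.0),
        (2, 102, '2019-09-01', 75.0);  -- before the cutoff, so filtered out
""")

rows = conn.execute("""
    WITH sales_data AS (
        SELECT
            customerkey,
            SUM(revenue) AS net_revenue,
            COUNT(orderkey) AS num_orders
        FROM sales
        WHERE orderdate >= '2020-01-01'
        GROUP BY customerkey
    )
    SELECT
        c.customerkey,
        TRIM(c.givenname) || ' ' || TRIM(c.surname) AS cleaned_name,
        COALESCE(s.net_revenue, 0) AS net_revenue,
        COALESCE(s.num_orders, 0) AS total_orders,
        s.net_revenue / NULLIF(s.num_orders, 0) AS avg_order_value
    FROM customer c
    LEFT JOIN sales_data s ON c.customerkey = s.customerkey
    ORDER BY c.customerkey
""").fetchall()

for row in rows:
    print(row)
```

Customer 2 has no qualifying orders, so the `LEFT JOIN` yields `NULL` columns from `sales_data`: `COALESCE` turns revenue and order count into `0`, while `avg_order_value` stays `NULL` (Python `None`), matching the empty cells in the notebook's result table.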

In [3]:
%%sql

WITH sales_data AS (
    SELECT
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        SUM(quantity * netprice * exchangerate) AS net_revenue,
        COUNT(orderkey) AS num_orders
    FROM sales
    WHERE (orderdate) >= '2020-01-01'
    GROUP BY customerkey
)
SELECT
    c.customerkey,
    s.cohort_year,
    CONCAT(TRIM(c.givenname), ' ', TRIM(c.surname)) AS cleaned_name,
    COALESCE(s.net_revenue, 0) AS net_revenue,
    COALESCE(s.num_orders, 0) AS total_orders,
    s.net_revenue / NULLIF(s.num_orders, 0) AS avg_order_value
FROM customer c
LEFT JOIN sales_data s ON c.customerkey = s.customerkey;

Unnamed: 0,customerkey,cohort_year,cleaned_name,net_revenue,total_orders,avg_order_value
0,15,2021,Julian McGuigan,2217.41,1,2217.41
1,23,,Rose Dash,0.00,0,
2,36,,Annabelle Townsend,0.00,0,
3,120,,Jamie Hetherington,0.00,0,
4,180,2023,Gabriel Bosanquet,1984.90,2,992.45
...,...,...,...,...,...,...
104985,2099639,,Miroslav Slach,0.00,0,
104986,2099656,2023,Wilfredo Lozada,10404.68,13,800.36
104987,2099697,2022,Phillipp Maier,38.20,3,12.73
104988,2099711,,Katerina Pavlícková,0.00,0,


3. Use `EXPLAIN` on the query to view the query execution plan.

    - Hash Left Join: PostgreSQL joins `customer` (104,990 rows) and `sales_data` (36,816 rows), reducing the dataset slightly from 37,024 rows.  
    - Sequential Scan on `customer`: Scans all rows (104,990), unchanged.  
    - Hash Aggregation on `sales`: Aggregation now produces an estimated **36,816 groups** instead of 37,024, lowering computation cost.  
    - 🔔 Sequential Scan on `sales`: The table is still scanned in full, but the `WHERE orderdate >= '2020-01-01'` filter cuts the rows passed on to the aggregation from **199,873 to 123,339**.  
    - 🔔 Overall Join Cost: Reduced from 9312.35..15292.71 to 8465.41..14445.76, making the query slightly more efficient. 
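
To put the before and after figures in proportion, the relative change can be computed directly from the numbers quoted above. The helper names `total_cost` and `pct_reduction` below are hypothetical, and the plan fragments are shortened to the part containing the cost range.

```python
import re


def total_cost(plan_fragment: str) -> float:
    """Return the total (second) value from a 'cost=startup..total' fragment."""
    match = re.search(r"cost=(\d+\.\d+)\.\.(\d+\.\d+)", plan_fragment)
    if match is None:
        raise ValueError(f"no cost range found in: {plan_fragment!r}")
    return float(match.group(2))


def pct_reduction(before: float, after: float) -> float:
    """Percentage by which 'after' is lower than 'before'."""
    return (before - after) / before * 100


# Top-level join cost before and after adding the WHERE filter.
before_cost = total_cost("Hash Left Join (cost=9312.35..15292.71")
after_cost = total_cost("Hash Left Join (cost=8465.41..14445.76")
cost_drop = pct_reduction(before_cost, after_cost)

# Estimated sales rows before and after the filter.
rows_drop = pct_reduction(199_873, 123_339)

print(f"total cost: {cost_drop:.1f}% lower")
print(f"sales rows after filter: {rows_drop:.1f}% lower")
```

The filter removes roughly 38% of the `sales` rows, yet the estimated total cost falls by only about 5.5%. A plausible reading is that the unchanged sequential scan on `customer` and the join itself account for most of the cost, which is why the improvement is described as slight.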

In [4]:
%%sql

EXPLAIN
WITH sales_data AS (
    SELECT
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        SUM(quantity * netprice * exchangerate) AS net_revenue,
        COUNT(orderkey) AS num_orders
    FROM sales
    WHERE (orderdate) >= '2020-01-01'
    GROUP BY customerkey
)
SELECT
    c.customerkey,
    s.cohort_year,
    CONCAT(TRIM(c.givenname), ' ', TRIM(c.surname)) AS cleaned_name,
    COALESCE(s.net_revenue, 0) AS net_revenue,
    COALESCE(s.num_orders, 0) AS total_orders,
    s.net_revenue / NULLIF(s.num_orders, 0) AS avg_order_value
FROM customer c
LEFT JOIN sales_data s ON c.customerkey = s.customerkey;


Unnamed: 0,QUERY PLAN
0,Hash Left Join (cost=8465.41..14445.76 rows=1...
1,Hash Cond: (c.customerkey = s.customerkey)
2,-> Seq Scan on customer c (cost=0.00..4129...
3,-> Hash (cost=8005.20..8005.20 rows=36816 ...
4,-> Subquery Scan on s (cost=7176.85....
5,-> HashAggregate (cost=7176.85...
6,Group Key: sales.customerkey
7,-> Seq Scan on sales (co...
8,Filter: (orderdate >...


<img src="../Resources/query_results/7.2_basic_optimization_1.png" alt="Query Results 1" style="width: 70%; height: auto;">

#### Complex Query Optimization

**Basic Optimization**

1. Run `EXPLAIN` on the query to look for optimization opportunities (from the last example in the `2_View_Project.ipynb` notebook).
    - Sequential Scan on `cohort_analysis`: Reads all rows without an index, leading to a full table scan.  
    - Aggregation on `cohort_analysis`: Groups data by `cohort_year` and `year_month` to calculate total revenue.  
    - SubQuery Scan (`cohort_summary` CTE): Stores aggregated cohort revenue as a temporary result set.  
    - Window Function Calculations (`rolling_ltv` CTE): Computes cumulative and rolling 3-month revenue using window functions.  
    - Sorting by `cohort_year` and `year_month`: Ensures correct ordering for rolling calculations.  
    - Final Projection (`SELECT` statement): Computes rolling LTV and average LTV for each cohort and month.  

In [5]:
%%sql

EXPLAIN
WITH cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate)::date AS year_month,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY 
        cohort_year, 
        year_month
),

rolling_ltv AS (
    SELECT
        cohort_year,
        year_month,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_revenue,
        COUNT(*) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month ROWS 
            BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS months_since_start,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS rolling_3_month_revenue,
        COUNT(*) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month 
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS rolling_3_month_num_months
    FROM cohort_summary
    ORDER BY
        cohort_year,
        year_month
)

SELECT
    cohort_year,
    year_month,
    cumulative_revenue,
    cumulative_revenue / months_since_start AS rolling_avg_ltv,
    rolling_3_month_revenue / rolling_3_month_num_months AS rolling_3_month_avg_ltv --Added 
FROM rolling_ltv;

Unnamed: 0,QUERY PLAN
0,Subquery Scan on rolling_ltv (cost=9689.11..9...
1,-> WindowAgg (cost=9689.11..9846.45 rows=3...
2,-> WindowAgg (cost=9689.11..9772.41 ...
3,-> Sort (cost=9689.11..9698.37...
4,Sort Key: cohort_analysis....
5,-> HashAggregate (cost=9...
6,Group Key: cohort_an...
7,-> Subquery Scan on...
8,-> HashAggreg...
9,Group Ke...


<img src="../Resources/query_results/7.2_basic_optimization_2.png" alt="Query Results 1" style="width: 70%; height: auto;">

2. Replace `COUNT(*) OVER` with `DENSE_RANK()` for `months_since_start` and with `COUNT(total_revenue)` for `rolling_3_month_num_months`.
    - 🔔 `months_since_start` now uses `DENSE_RANK() OVER (PARTITION BY cohort_year ORDER BY year_month)`, which numbers the months since the start of each cohort. 
        - `DENSE_RANK()` ranks the distinct `year_month` values in order. Because `cohort_summary` has exactly one row per cohort and month, the rank equals the running row count that `COUNT(*)` produced.
    - 🔔 `rolling_3_month_num_months` now uses `COUNT(total_revenue) OVER (PARTITION BY cohort_year ORDER BY year_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)`, which counts the number of months in the rolling 3-month period.
        - `COUNT(total_revenue)` counts only rows where `total_revenue` is not `NULL`, so it matches `COUNT(*)` as long as every month has a revenue value.
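
The equivalence between `DENSE_RANK()` and a running `COUNT(*)` depends on there being one row per month. The sketch below illustrates that on a hypothetical four-row `cohort_summary` table, using SQLite (version 3.25 or later, for window functions) through Python's `sqlite3` as a stand-in for PostgreSQL, and then shows the two expressions diverging once a duplicate month is inserted.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cohort_summary (cohort_year INTEGER, year_month TEXT, total_revenue REAL);
    INSERT INTO cohort_summary VALUES
        (2015, '2015-01-01', 100.0),
        (2015, '2015-02-01', 200.0),
        (2015, '2015-03-01', 300.0),
        (2016, '2016-01-01', 400.0);
""")

QUERY = """
    SELECT
        year_month,
        COUNT(*) OVER (
            PARTITION BY cohort_year
            ORDER BY year_month
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS running_count,
        DENSE_RANK() OVER (
            PARTITION BY cohort_year
            ORDER BY year_month
        ) AS month_rank
    FROM cohort_summary
    ORDER BY cohort_year, year_month
"""

# One row per cohort and month: both expressions give the same numbers.
unique_months = conn.execute(QUERY).fetchall()
print("unique months:   ", unique_months)

# A second row for 2015-02 breaks the equivalence: COUNT(*) counts rows,
# while DENSE_RANK() counts distinct year_month values.
conn.execute("INSERT INTO cohort_summary VALUES (2015, '2015-02-01', 50.0)")
duplicate_months = conn.execute(QUERY).fetchall()
print("duplicate months:", duplicate_months)
```

In the notebook's query the `GROUP BY cohort_year, year_month` in `cohort_summary` guarantees unique months, so the swap is safe there; on un-aggregated data it would change the result.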

In [6]:
%%sql

WITH cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate)::date AS year_month, 
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY cohort_year, year_month
),

rolling_ltv AS (
    SELECT
        cohort_year,
        year_month,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_revenue,
        DENSE_RANK() OVER ( -- Updated
            PARTITION BY cohort_year 
            ORDER BY year_month
        ) AS months_since_start,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS rolling_3_month_revenue,
        COUNT(total_revenue) OVER ( -- Updated
            PARTITION BY cohort_year 
            ORDER BY year_month 
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS rolling_3_month_num_months
    FROM cohort_summary
    ORDER BY
        cohort_year,
        year_month
)

SELECT
    cohort_year,
    year_month,
    cumulative_revenue,
    cumulative_revenue / months_since_start AS rolling_avg_ltv,
    rolling_3_month_revenue / rolling_3_month_num_months AS rolling_3_month_avg_ltv --Added 
FROM rolling_ltv;


Unnamed: 0,cohort_year,year_month,cumulative_revenue,rolling_avg_ltv,rolling_3_month_avg_ltv
0,2015,2015-01-01,384092.66,384092.66,384092.66
1,2015,2015-02-01,1090466.78,545233.39,545233.39
2,2015,2015-03-01,1423428.37,474476.12,474476.12
3,2015,2015-04-01,1584195.37,396048.84,400034.24
4,2015,2015-05-01,2132828.00,426565.60,347453.74
...,...,...,...,...,...
107,2023,2023-12-01,33108565.51,2759047.13,2726658.97
108,2024,2024-01-01,2677498.55,2677498.55,2677498.55
109,2024,2024-02-01,6219821.10,3109910.55,3109910.55
110,2024,2024-03-01,7912675.99,2637558.66,2637558.66
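
The two averages can be re-derived by hand from the first five 2015 rows of this result table, which is a useful sanity check on what the window frames do. The sketch below uses plain Python and only the `cumulative_revenue` values shown above; the variable names are illustrative.

```python
# First five 2015-cohort values of cumulative_revenue, copied from the result table.
cumulative = [384092.66, 1090466.78, 1423428.37, 1584195.37, 2132828.00]

# Recover each month's revenue as the difference between consecutive cumulative totals.
monthly = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

rolling_avg_ltv = []
rolling_3_month_avg_ltv = []
for i, total in enumerate(cumulative):
    # cumulative_revenue / months_since_start
    rolling_avg_ltv.append(round(total / (i + 1), 2))
    # ROWS BETWEEN 2 PRECEDING AND CURRENT ROW: the current month plus up to two before it.
    window = monthly[max(0, i - 2): i + 1]
    rolling_3_month_avg_ltv.append(round(sum(window) / len(window), 2))

for month, (avg, avg_3m) in enumerate(zip(rolling_avg_ltv, rolling_3_month_avg_ltv), start=1):
    print(f"month {month}: rolling_avg_ltv={avg:.2f}  rolling_3_month_avg_ltv={avg_3m:.2f}")
```

For the first three months both columns agree because the 3-month window has not filled yet; from the fourth month on, the rolling figure only reflects the latest three months while `rolling_avg_ltv` keeps averaging over the whole history.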


3. Remove the redundant `ORDER BY` from the CTE and move it to the main query. 
    - 🔔 Move `ORDER BY cohort_year, year_month` to the main query. 
        - The `ORDER BY` inside `rolling_ltv` is unnecessary: the window functions carry their own `ORDER BY`, and only an `ORDER BY` on the final `SELECT` guarantees the order of the returned rows.
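
A small sketch of the same idea, assuming SQLite (3.25 or later) via Python's `sqlite3` as a stand-in for PostgreSQL and a hypothetical `cohort_summary` table whose rows are inserted out of order: the `ORDER BY` inside `OVER (...)` controls how the running total is calculated, while the single `ORDER BY` on the final `SELECT` controls how the rows are returned.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cohort_summary (cohort_year INTEGER, year_month TEXT, total_revenue REAL);
    -- Rows are deliberately inserted out of order.
    INSERT INTO cohort_summary VALUES
        (2016, '2016-02-01', 40.0),
        (2015, '2015-02-01', 20.0),
        (2016, '2016-01-01', 30.0),
        (2015, '2015-01-01', 10.0);
""")

rows = conn.execute("""
    WITH rolling_ltv AS (
        SELECT
            cohort_year,
            year_month,
            SUM(total_revenue) OVER (
                PARTITION BY cohort_year
                ORDER BY year_month          -- drives the running total
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
            ) AS cumulative_revenue
        FROM cohort_summary                  -- no ORDER BY inside the CTE
    )
    SELECT cohort_year, year_month, cumulative_revenue
    FROM rolling_ltv
    ORDER BY cohort_year, year_month         -- drives the output order
""").fetchall()

for row in rows:
    print(row)
```

The running totals come out correct and the rows come back sorted even though the CTE itself has no `ORDER BY`, which is why the clause can be dropped there.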

In [7]:
%%sql

WITH cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate)::date AS year_month, 
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY cohort_year, year_month
),

rolling_ltv AS (
    SELECT
        cohort_year,
        year_month,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_revenue,
        DENSE_RANK() OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month
        ) AS months_since_start,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS rolling_3_month_revenue,
        COUNT(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month 
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS rolling_3_month_num_months
    FROM cohort_summary
)

SELECT
    cohort_year,
    year_month,
    cumulative_revenue,
    cumulative_revenue / months_since_start AS rolling_avg_ltv,
    rolling_3_month_revenue / rolling_3_month_num_months AS rolling_3_month_avg_ltv
FROM rolling_ltv
ORDER BY -- Updated
    cohort_year, 
    year_month;


Unnamed: 0,cohort_year,year_month,cumulative_revenue,rolling_avg_ltv,rolling_3_month_avg_ltv
0,2015,2015-01-01,384092.66,384092.66,384092.66
1,2015,2015-02-01,1090466.78,545233.39,545233.39
2,2015,2015-03-01,1423428.37,474476.12,474476.12
3,2015,2015-04-01,1584195.37,396048.84,400034.24
4,2015,2015-05-01,2132828.00,426565.60,347453.74
...,...,...,...,...,...
107,2023,2023-12-01,33108565.51,2759047.13,2726658.97
108,2024,2024-01-01,2677498.55,2677498.55,2677498.55
109,2024,2024-02-01,6219821.10,3109910.55,3109910.55
110,2024,2024-03-01,7912675.99,2637558.66,2637558.66


4. Use `EXPLAIN` on the query to view the query execution plan.

    - Subquery Scan on `rolling_ltv`: Estimated cost moves from 9689.11..9883.47 to 9689.11..9939.00, a slight increase rather than a reduction; the top `WindowAgg` node rises accordingly from 9846.45 to 9901.98.  
    - 🔔 Window Function Execution: The plan now contains three `WindowAgg` nodes instead of two, most likely because `DENSE_RANK()` has no frame clause and so no longer shares a window definition with the cumulative `SUM`. All of them reuse the same sort.  
    - Sorting Step: Cost remains 9689.11..9698.37, indicating no additional sorting overhead from the new window function.  
    - Hash Aggregation on `cohort_summary`: Maintains a similar cost of 9404.91..9469.69, as expected, because the window function change happens after the aggregation.  
    - Sequential Scan on `sales`: Row scan remains 199,873, as no filtering was added at this step.  
    - 🔔 Overall Execution Cost: Remains similar. The planner's estimates show no gain from swapping `COUNT(*)` for `DENSE_RANK()`, so confirm any real speedup with `EXPLAIN ANALYZE` on the actual data.
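
When two plans look this similar, counting node types is a quick way to spot structural differences. The sketch below applies that to the top lines of the two `EXPLAIN` outputs in this notebook; the lines are copied as displayed, including the `...` truncation from the DataFrame view, and `count_nodes` is a hypothetical helper.

```python
# Top plan lines copied from the two EXPLAIN outputs in this notebook.
# The "..." truncation comes from the DataFrame display and is left as-is.
plan_before = [
    "Subquery Scan on rolling_ltv (cost=9689.11..9...",
    "-> WindowAgg (cost=9689.11..9846.45 rows=3...",
    "-> WindowAgg (cost=9689.11..9772.41 ...",
    "-> Sort (cost=9689.11..9698.37...",
]
plan_after = [
    "Subquery Scan on rolling_ltv (cost=9689.11..9...",
    "-> WindowAgg (cost=9689.11..9901.98 rows=3...",
    "-> WindowAgg (cost=9689.11..9827.94 ...",
    "-> WindowAgg (cost=9689.11..97...",
    "-> Sort (cost=9689.11..9...",
]


def count_nodes(plan, node_type):
    """Count plan lines whose node name starts with node_type."""
    # lstrip removes the leading '-', '>' and space characters of the arrow prefix.
    return sum(1 for line in plan if line.lstrip("-> ").startswith(node_type))


window_nodes_before = count_nodes(plan_before, "WindowAgg")
window_nodes_after = count_nodes(plan_after, "WindowAgg")

print("WindowAgg nodes before:", window_nodes_before)
print("WindowAgg nodes after: ", window_nodes_after)
```

Both plans share a single `Sort` node, so the rewrite does not add a sort; what changes is the number of window passes over the already-sorted rows.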

> ⚠️ **Chart Note**: The query results are the same.

In [8]:
%%sql

EXPLAIN
WITH cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate)::date AS year_month, 
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY cohort_year, year_month
),

rolling_ltv AS (
    SELECT
        cohort_year,
        year_month,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_revenue,
        DENSE_RANK() OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month
        ) AS months_since_start,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS rolling_3_month_revenue,
        COUNT(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month 
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS rolling_3_month_num_months
    FROM cohort_summary
)

SELECT
    cohort_year,
    year_month,
    cumulative_revenue,
    cumulative_revenue / months_since_start AS rolling_avg_ltv,
    rolling_3_month_revenue / rolling_3_month_num_months AS rolling_3_month_avg_ltv
FROM rolling_ltv
ORDER BY 
    cohort_year, 
    year_month;


Unnamed: 0,QUERY PLAN
0,Subquery Scan on rolling_ltv (cost=9689.11..9...
1,-> WindowAgg (cost=9689.11..9901.98 rows=3...
2,-> WindowAgg (cost=9689.11..9827.94 ...
3,-> WindowAgg (cost=9689.11..97...
4,-> Sort (cost=9689.11..9...
5,Sort Key: cohort_ana...
6,-> HashAggregate (...
7,Group Key: coh...
8,-> Subquery S...
9,-> Hash...


<img src="../Resources/query_results/7.2_basic_optimization_4.png" alt="Query Results 1" style="width: 70%; height: auto;">

Run the optimized query to view its results.

In [9]:
%%sql

WITH cohort_summary AS (
    SELECT
        cohort_year,
        DATE_TRUNC('month', orderdate)::date AS year_month, 
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY cohort_year, year_month
),

rolling_ltv AS (
    SELECT
        cohort_year,
        year_month,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_revenue,
        DENSE_RANK() OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month
        ) AS months_since_start,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS rolling_3_month_revenue,
        COUNT(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY year_month 
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS rolling_3_month_num_months
    FROM cohort_summary
)

SELECT
    cohort_year,
    year_month,
    cumulative_revenue,
    cumulative_revenue / months_since_start AS rolling_avg_ltv,
    rolling_3_month_revenue / rolling_3_month_num_months AS rolling_3_month_avg_ltv
FROM rolling_ltv
ORDER BY 
    cohort_year, 
    year_month;


Unnamed: 0,cohort_year,year_month,cumulative_revenue,rolling_avg_ltv,rolling_3_month_avg_ltv
0,2015,2015-01-01,384092.66,384092.66,384092.66
1,2015,2015-02-01,1090466.78,545233.39,545233.39
2,2015,2015-03-01,1423428.37,474476.12,474476.12
3,2015,2015-04-01,1584195.37,396048.84,400034.24
4,2015,2015-05-01,2132828.00,426565.60,347453.74
...,...,...,...,...,...
107,2023,2023-12-01,33108565.51,2759047.13,2726658.97
108,2024,2024-01-01,2677498.55,2677498.55,2677498.55
109,2024,2024-02-01,6219821.10,3109910.55,3109910.55
110,2024,2024-03-01,7912675.99,2637558.66,2637558.66


<img src="../Resources/images/5.2_cohort_avg_and_rolling_3_month_ltv.png" alt="Cohort Avg and Rolling 3 Month LTV" width="50%">

> ⚠️ **Chart Note**: For 2023 cohort.