<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/5_Views/1_View_Intro.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Views Intro

## Overview

### 🥅 Analysis Goals

Analyze cohort revenue and lifetime value (LTV) to uncover daily trends, short-term fluctuations, future potential, and long-term customer value patterns.
- **Cohort Revenue Insights:**  
  - Track cumulative revenue up to each date to measure cohort growth over time.  
  - Calculate remaining cumulative revenue from each order date to analyze future revenue potential.  
  - Evaluate average LTV for each cohort using cumulative revenue while incorporating a 7-day rolling average.

### 📘 Concepts Covered

- Create views
- Use views

---
## Views

### 📝 Notes

`CREATE VIEW`

- **Why Use Views in PostgreSQL?**  
  - Simplifies complex queries by storing them as reusable, named objects.  
  - Ensures consistency and readability when multiple queries rely on the same logic.  
  - Can improve security: users can be granted access to a view that exposes only specific rows or columns, rather than to the underlying tables.  
  - Improves maintainability by centralizing changes to the query logic.
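To make the maintainability point concrete, here is a minimal sketch using a hypothetical temporary table and view (`demo_items` and `item_net_price` are illustrative names, not part of the course database). Redefining the view with `CREATE OR REPLACE VIEW` changes the logic in one place, and every query that reads the view picks up the change:

```sql
-- Hypothetical demo table (temporary, dropped when the session ends)
CREATE TEMP TABLE demo_items (
    item     text,
    price    numeric,
    discount numeric
);
INSERT INTO demo_items VALUES ('a', 10, 0), ('b', 20, 5);

-- First version of the logic: net price ignores discounts
CREATE TEMP VIEW item_net_price AS
SELECT item, price AS net_price
FROM demo_items;

-- Change the logic in one place. CREATE OR REPLACE VIEW must keep the
-- existing columns (same names, order, and types); it may only add new
-- columns at the end.
CREATE OR REPLACE TEMP VIEW item_net_price AS
SELECT item, price - discount AS net_price
FROM demo_items;

SELECT * FROM item_net_price ORDER BY item;
```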

- **Syntax:**  
    ```sql
    CREATE VIEW view_name AS
    SELECT
        column1,
        column2,
        column3
    FROM table_name
    WHERE condition;
    ```
    - `CREATE VIEW view_name AS`: Creates a new view with the specified name.
    - `SELECT`: Defines the query the view stores. A regular view does not store the query's results; the query runs each time the view is referenced.
    - `WHERE`: (Optional) Filters the rows the view returns.
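Because a regular view saves the query rather than its results, rows added to the underlying table later still appear when the view is queried (PostgreSQL's separate `CREATE MATERIALIZED VIEW` is the variant that does store results). A minimal sketch with a hypothetical temporary table, `demo_orders`, and view, `eu_orders`:

```sql
-- Hypothetical demo table (temporary)
CREATE TEMP TABLE demo_orders (
    order_id integer,
    region   text,
    amount   numeric
);
INSERT INTO demo_orders VALUES (1, 'EU', 100), (2, 'US', 250);

-- The view saves the SELECT, not a copy of the rows
CREATE TEMP VIEW eu_orders AS
SELECT order_id, amount
FROM demo_orders
WHERE region = 'EU';

-- A row inserted after the view was created...
INSERT INTO demo_orders VALUES (3, 'EU', 75);

-- ...still shows up, because the query runs each time the view is read
SELECT * FROM eu_orders ORDER BY order_id;
```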


### 💻 Final Result

- Calculates the average lifetime value (LTV) for each cohort by dividing cumulative revenue by the number of days with orders so far.
  - Computes a 7-day rolling average LTV over a shorter window to show recent changes in customer value.
  - Provides insights into overall customer value trends and short-term activity for each cohort.

#### Average and 7-Day Rolling LTV

**`CREATE VIEW`**

1. Get the `cohort_year` for each customer and the total revenue for each day (this logic was previously the `cohort_analysis` CTE).  
   - Use `EXTRACT(YEAR FROM MIN(orderdate) OVER (PARTITION BY customerkey))` to calculate the cohort year for each customer based on their earliest order date. The window function looks across all of the customer's rows; a plain aggregate `MIN(orderdate)` would only see the single date in each `customerkey`/`orderdate` group and return the year of that order instead.  
   - Include `customerkey` and `orderdate` in the `GROUP BY` clause to calculate daily revenue for each customer.  
   - Use `SUM(quantity * netprice * exchangerate)` to calculate the total net revenue for each customer on each day.  
   - Select `customerkey`, `cohort_year`, `orderdate`, and `total_net_revenue` for the final output.  

In [None]:
SELECT 
    customerkey,
    EXTRACT(YEAR FROM MIN(orderdate) OVER (PARTITION BY customerkey)) AS cohort_year, 
    orderdate,
    SUM(quantity * netprice * exchangerate) AS total_net_revenue
FROM sales
GROUP BY 
    customerkey, 
    orderdate;

![Query Results 1](../Resources/query_results/view_query_results_1.png)

2. Create a view in pgAdmin using `CREATE VIEW`.  

   - Use `CREATE VIEW cohort_analysis AS` to define a view named `cohort_analysis`.  
   - Select `customerkey`, `cohort_year`, `orderdate`, and `total_net_revenue` to include these columns in the view.  
   - Use `EXTRACT(YEAR FROM MIN(orderdate) OVER (PARTITION BY customerkey))` to calculate the cohort year for each customer based on their earliest order date.  
   - Calculate the `total_net_revenue` for each customer on each day using `SUM(quantity * netprice * exchangerate)`.  
   - Group by `customerkey` and `orderdate` so revenue is aggregated per customer per day.  

In [None]:
CREATE VIEW cohort_analysis AS
SELECT 
    customerkey,
    EXTRACT(YEAR FROM MIN(orderdate) OVER (PARTITION BY customerkey)) AS cohort_year, 
    orderdate,
    SUM(quantity * netprice * exchangerate) AS total_net_revenue
FROM sales
GROUP BY 
    customerkey, 
    orderdate;

![Query Results 2](../Resources/query_results/view_query_results_2.png)
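To see why the cohort year needs a window function, the sketch below computes both forms on three hypothetical rows (the temporary `demo_sales` and `demo_cohort_check` views are illustrative stand-ins, not part of the course database). PostgreSQL evaluates window functions after `GROUP BY`, so when the query groups by `customerkey` and `orderdate`, the aggregate `MIN(orderdate)` sees only one date per group, while `MIN(orderdate) OVER (PARTITION BY customerkey)` sees every grouped row for that customer:

```sql
-- Hypothetical rows standing in for the sales table
CREATE TEMP VIEW demo_sales AS
SELECT *
FROM (VALUES
    (1, DATE '2022-03-01', 2, 10.0, 1.0),
    (1, DATE '2023-05-10', 1, 40.0, 1.0),
    (2, DATE '2023-01-15', 3,  5.0, 1.0)
) AS t (customerkey, orderdate, quantity, netprice, exchangerate);

CREATE TEMP VIEW demo_cohort_check AS
SELECT
    customerkey,
    orderdate,
    -- Aggregate MIN sees only this (customerkey, orderdate) group: the order's own year
    EXTRACT(YEAR FROM MIN(orderdate)) AS order_year,
    -- Window MIN sees all of the customer's rows: the year of the first order
    EXTRACT(YEAR FROM MIN(orderdate) OVER (PARTITION BY customerkey)) AS cohort_year,
    SUM(quantity * netprice * exchangerate) AS total_net_revenue
FROM demo_sales
GROUP BY customerkey, orderdate;

SELECT * FROM demo_cohort_check ORDER BY customerkey, orderdate;
```

For customer 1's 2023 order, `order_year` is 2023 but `cohort_year` is 2022, the year of that customer's first purchase.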

3. To see the view you created:
    1. In the browser panel on the left, right-click `Views` and select `Refresh`.
    2. Right-click the new view named `cohort_analysis`.
    3. Select `View/Edit Data` > `All Rows`.

![See cohort_analysis View](../Resources/query_results/view_query_results_3.gif)
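If you prefer SQL to the pgAdmin menus, a view can also be inspected from the Query Tool. The sketch below uses a hypothetical temporary view, `demo_preview_view`, so it runs on its own; replace it with `cohort_analysis` to inspect the course view. `pg_get_viewdef` is a built-in PostgreSQL function that returns the query stored behind a view:

```sql
-- Hypothetical stand-in view so this runs on its own
CREATE TEMP VIEW demo_preview_view AS
SELECT n AS day_number, n * 10 AS revenue
FROM generate_series(1, 20) AS n;

-- Preview the first rows, similar to View/Edit Data in pgAdmin
SELECT * FROM demo_preview_view ORDER BY day_number LIMIT 10;

-- Show the query stored behind the view (true = pretty-print)
SELECT pg_get_viewdef('demo_preview_view'::regclass, true);
```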

4. Use the view to calculate the total net revenue for each cohort and day (this logic was previously the `cohort_summary` CTE).  

   - Query the `cohort_analysis` view to retrieve `cohort_year`, `orderdate`, and `total_net_revenue`.  
   - Use `SUM(total_net_revenue)` to calculate the total revenue for all customers within each cohort for a specific day.  
   - Group by `cohort_year` and `orderdate` to ensure the total revenue is aggregated at the cohort and daily levels.  
   - Select `cohort_year`, `orderdate`, and `total_revenue` for the final output.  

In [None]:
SELECT
    cohort_year,
    orderdate,
    SUM(total_net_revenue) AS total_revenue
FROM cohort_analysis
GROUP BY 
    cohort_year, 
    orderdate;

![Query Results 4](../Resources/query_results/view_query_results_4.png)
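This step changes the grain of the data from one row per customer per day to one row per cohort per day. A minimal sketch on hypothetical rows shaped like `cohort_analysis` (the temporary `demo_customer_days` and `demo_cohort_days` views are illustrative only):

```sql
-- Hypothetical customer-level rows shaped like the cohort_analysis view
CREATE TEMP VIEW demo_customer_days AS
SELECT *
FROM (VALUES
    (1, 2022, DATE '2023-01-05', 100.0),
    (2, 2022, DATE '2023-01-05',  50.0),
    (3, 2023, DATE '2023-01-05',  20.0),
    (1, 2022, DATE '2023-01-06',  30.0)
) AS t (customerkey, cohort_year, orderdate, total_net_revenue);

-- One row per (cohort_year, orderdate): customers in the same cohort are summed together
CREATE TEMP VIEW demo_cohort_days AS
SELECT
    cohort_year,
    orderdate,
    SUM(total_net_revenue) AS total_revenue
FROM demo_customer_days
GROUP BY cohort_year, orderdate;

SELECT * FROM demo_cohort_days ORDER BY cohort_year, orderdate;
```

Customers 1 and 2 share the 2022 cohort, so their revenue on 2023-01-05 is combined into one row (150.0), while customer 3 stays in a separate 2023 cohort row.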

5. Put the previous query into a CTE named `cohort_summary`, then use a window function partitioned by cohort year and ordered by order date to calculate cumulative revenue.  

   - Define a CTE `cohort_summary` to calculate the total daily revenue for each cohort.  
        - Use `SUM(total_net_revenue)` to aggregate the total revenue per cohort and day.  
        - Group the CTE by `cohort_year` and `orderdate` to summarize the data at the cohort and daily levels.  
   - In the main query, use `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)` to calculate the cumulative revenue for each cohort up to the current date.  
        - `PARTITION BY cohort_year` restarts the running total for each cohort.  
        - `ORDER BY orderdate` accumulates revenue in chronological order within each cohort.  
        - `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` sums every row from the cohort's first date through the current row.  
   - Select `cohort_year`, `orderdate`, and `cumulative_revenue` for the final output.  
   - Use `ORDER BY cohort_year, orderdate` to sort the results.  

In [None]:
WITH cohort_summary AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY 
        cohort_year, 
        orderdate
)
    
SELECT
    cohort_year,
    orderdate,
    SUM(total_revenue) OVER (
        PARTITION BY cohort_year
        ORDER BY orderdate 
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- running total from the cohort's first date
    ) AS cumulative_revenue
FROM cohort_summary
ORDER BY 
    cohort_year, 
    orderdate

![Query Results 5](../Resources/query_results/view_query_results_5.png)
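Why spell out the frame? When a window has an `ORDER BY` but no frame clause, PostgreSQL defaults to `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, which treats rows with the same `ORDER BY` value as peers and gives them the same running total. Here `cohort_summary` has one row per `cohort_year` and `orderdate`, so there are no ties within a partition and both frames return the same numbers; the explicit `ROWS` frame just states the row-by-row intent. A small sketch on hypothetical data with a tie (`demo_frames` is an illustrative name) shows where they differ:

```sql
CREATE TEMP VIEW demo_frames AS
SELECT
    d,
    v,
    -- Default frame with ORDER BY is RANGE ... CURRENT ROW: tied rows share one total
    SUM(v) OVER (ORDER BY d) AS range_total,
    -- ROWS frame: the total grows one row at a time, even across ties
    SUM(v) OVER (
        ORDER BY d
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS rows_total
FROM (VALUES (1, 10), (2, 20), (2, 5), (3, 1)) AS t (d, v);

SELECT * FROM demo_frames ORDER BY d, rows_total;
```

Both `d = 2` rows get a `range_total` of 35, while `rows_total` steps through the two tied rows one at a time (which tied row comes first is not defined, since nothing else orders them).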

6. Put the previous main query into a CTE called `rolling_ltv` and select all of its results in the main query.  

   - Define a CTE `cohort_summary` to calculate the total daily revenue for each cohort.  
        - Use `SUM(total_net_revenue)` to aggregate the total revenue per day, grouped by `cohort_year` and `orderdate`.  
   - Add a second CTE `rolling_ltv` to calculate the cumulative revenue for each cohort.  
        - Use the window function `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)` to compute the cumulative revenue up to the current date for each cohort.  
   - In the main query, use `SELECT * FROM rolling_ltv` to return all columns: `cohort_year`, `orderdate`, and `cumulative_revenue`.  
        - Sort with `ORDER BY cohort_year, orderdate` in the main query; an `ORDER BY` inside a CTE does not guarantee the order of the final output.  

In [None]:
WITH cohort_summary AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY 
        cohort_year, 
        orderdate
),

rolling_ltv AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year
            ORDER BY orderdate 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_revenue
    FROM cohort_summary
)

SELECT *
FROM rolling_ltv
ORDER BY
    cohort_year, 
    orderdate;

![Query Results 5](../Resources/query_results/view_query_results_5.png)
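Since this notebook is about views, it may help to see that a view can hold the entire CTE chain from this step, so later queries only need to read from it. This is an optional sketch, not a step in the course; it runs on hypothetical rows, and `demo_cohort_rows` and `demo_cumulative_revenue` are illustrative names:

```sql
-- Hypothetical customer-level rows standing in for cohort_analysis
CREATE TEMP VIEW demo_cohort_rows AS
SELECT *
FROM (VALUES
    (2022, DATE '2023-01-01', 100.0),
    (2022, DATE '2023-01-01',  50.0),
    (2022, DATE '2023-01-03',  25.0),
    (2023, DATE '2023-01-02',  40.0)
) AS t (cohort_year, orderdate, total_net_revenue);

-- A view can contain the whole WITH ... SELECT chain
CREATE TEMP VIEW demo_cumulative_revenue AS
WITH cohort_summary AS (
    SELECT cohort_year, orderdate, SUM(total_net_revenue) AS total_revenue
    FROM demo_cohort_rows
    GROUP BY cohort_year, orderdate
),
rolling_ltv AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year
            ORDER BY orderdate
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_revenue
    FROM cohort_summary
)
SELECT * FROM rolling_ltv;

-- Sort in the outermost query, where ORDER BY reliably controls output order
SELECT * FROM demo_cumulative_revenue ORDER BY cohort_year, orderdate;
```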

7. Add a `COUNT` window function to count the number of order days since the cohort's first order, and modify the main query to select specific columns.  

   - Define a CTE `cohort_summary` to calculate the total daily revenue for each cohort, grouping by `cohort_year` and `orderdate`.  
   - In the second CTE, `rolling_ltv`, compute:  
     - `cumulative_revenue` using `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`, which tracks the total accumulated revenue per cohort over time.  
     - `days_since_start` using `COUNT(*) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`, which counts the days with orders in the cohort up to and including the current date (1 on the cohort's first order date). Because `cohort_summary` only has rows for dates with orders, this is a count of order days, not calendar days.  
   - The main query selects `cohort_year`, `orderdate`, `cumulative_revenue`, and `days_since_start`, sorted by `cohort_year` and `orderdate`.  

In [None]:
WITH cohort_summary AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY 
        cohort_year, 
        orderdate
),

rolling_ltv AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY orderdate 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_revenue,
        COUNT(*) OVER ( -- Added 
            PARTITION BY cohort_year 
            ORDER BY orderdate 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS days_since_start
    FROM cohort_summary
)

SELECT
    cohort_year,
    orderdate,
    cumulative_revenue,
    days_since_start
FROM rolling_ltv
ORDER BY
    cohort_year,
    orderdate;
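Because `days_since_start` counts rows, and `cohort_summary` only has rows for dates with at least one order, the value is the number of order days so far rather than the calendar days elapsed. A sketch on three hypothetical dates with a gap (`demo_day_counts` is an illustrative name) contrasts the two:

```sql
CREATE TEMP VIEW demo_day_counts AS
SELECT
    orderdate,
    -- Counts rows (order dates) so far, which is what days_since_start measures
    COUNT(*) OVER (
        ORDER BY orderdate
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS order_days_so_far,
    -- Calendar days since the first order (date - date is an integer), first day = 1
    orderdate - MIN(orderdate) OVER () + 1 AS calendar_days_so_far
FROM (VALUES
    (DATE '2023-01-01'),
    (DATE '2023-01-02'),
    (DATE '2023-01-10')
) AS t (orderdate);

SELECT * FROM demo_day_counts ORDER BY orderdate;
```

On 2023-01-10 the row count is 3, but 10 calendar days have passed since the first order.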

8. Calculate the rolling average LTV using `cumulative_revenue / days_since_start`.  

   - Define a CTE `cohort_summary` to calculate the total daily revenue for each cohort, grouping by `cohort_year` and `orderdate`.  
   - In the second CTE, `rolling_ltv`, compute:  
     - `cumulative_revenue` using `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`, which tracks the total accumulated revenue per cohort over time.  
     - `days_since_start` using `COUNT(*) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`, which counts the days with orders since the cohort's first recorded order.  
   - In the main query, calculate the rolling average LTV by dividing `cumulative_revenue` by `days_since_start`.  
        - Select `cohort_year`, `orderdate`, `cumulative_revenue`, `days_since_start`, and the calculated `rolling_avg_ltv`.  
        - Sort the output by `cohort_year` and `orderdate`.  

In [None]:
WITH cohort_summary AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY 
        cohort_year, 
        orderdate
),

rolling_ltv AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY orderdate 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_revenue,
        COUNT(*) OVER (
            PARTITION BY cohort_year 
            ORDER BY orderdate 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS days_since_start
    FROM cohort_summary
)

SELECT
    cohort_year,
    orderdate,
    cumulative_revenue,
    days_since_start,
    cumulative_revenue / days_since_start AS rolling_avg_ltv -- Added
FROM rolling_ltv
ORDER BY
    cohort_year,
    orderdate;
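Dividing a running sum by a running row count over the same frame gives the running average, so `rolling_avg_ltv` can also be read as the average daily revenue across the cohort's order days so far. The sketch below checks this on hypothetical rows (`demo_running_avg` is an illustrative name) and uses PostgreSQL's `WINDOW` clause to name the frame once instead of repeating it:

```sql
CREATE TEMP VIEW demo_running_avg AS
SELECT
    orderdate,
    -- Same formula as rolling_avg_ltv: running sum / running row count
    SUM(total_revenue) OVER w / COUNT(*) OVER w AS rolling_avg_ltv,
    -- AVG over the same frame returns the same value (no NULL revenues here)
    AVG(total_revenue) OVER w AS running_avg
FROM (VALUES
    (DATE '2023-01-01', 100.0),
    (DATE '2023-01-02',  20.0),
    (DATE '2023-01-05',  30.0)
) AS t (orderdate, total_revenue)
WINDOW w AS (ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);

SELECT * FROM demo_running_avg ORDER BY orderdate;
```

The equivalence holds as long as `total_revenue` has no NULLs: `AVG` skips NULL inputs, while `COUNT(*)` counts every row.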

9. Add two new columns to the `rolling_ltv` CTE to get the rolling 7-day revenue and the number of days it covers.  

   - Define a CTE `cohort_summary` to calculate the total daily revenue for each cohort, grouping by `cohort_year` and `orderdate`.  
   - Update the CTE `rolling_ltv` to calculate the following cumulative and rolling metrics for each cohort:  
        - Compute `cumulative_revenue` using `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`.  
        - Compute `days_since_start` using `COUNT(*) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`.  
        - Add `rolling_7_day_revenue` using `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)` to sum revenue for the current row and the 6 preceding rows. Since `cohort_summary` only has rows for dates with orders, this covers the last 7 order dates, which span exactly 7 calendar days only when the cohort has orders every day.  
        - Add `rolling_7_day_num_days` using `COUNT(*) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)` to count the rows in the same window (fewer than 7 for each cohort's first 6 order dates).  
   - In the main query, include:  
        - `cumulative_revenue` and `days_since_start`.  
        - `rolling_avg_ltv`, calculated as `cumulative_revenue / days_since_start`.  
        - `rolling_7_day_revenue` and `rolling_7_day_num_days` to display the rolling metrics.  
        - `ORDER BY cohort_year, orderdate` to keep the output in chronological order.  

In [None]:
WITH cohort_summary AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY 
        cohort_year, 
        orderdate
),

rolling_ltv AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY orderdate 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_revenue,
        COUNT(*) OVER (
            PARTITION BY cohort_year 
            ORDER BY orderdate 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS days_since_start,
        SUM(total_revenue) OVER ( -- Added 
            PARTITION BY cohort_year 
            ORDER BY orderdate
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS rolling_7_day_revenue,
        COUNT(*) OVER ( -- Added 
            PARTITION BY cohort_year 
            ORDER BY orderdate 
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS rolling_7_day_num_days
    FROM cohort_summary
)

SELECT
    cohort_year,
    orderdate,
    cumulative_revenue,
    days_since_start,
    cumulative_revenue / days_since_start AS rolling_avg_ltv,
    rolling_7_day_revenue,
    rolling_7_day_num_days
FROM rolling_ltv
ORDER BY
    cohort_year,
    orderdate;
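`ROWS BETWEEN 6 PRECEDING AND CURRENT ROW` covers the last 7 *rows*, which here are the last 7 order dates. If a window of 7 calendar days is wanted instead, PostgreSQL 11 and later accept a `RANGE` frame with an interval offset on a date column. This is a swapped-in alternative, not what the course query uses; the sketch (on hypothetical dates in `demo_seven_day`) shows how the two differ when order dates have gaps:

```sql
CREATE TEMP VIEW demo_seven_day AS
SELECT
    orderdate,
    total_revenue,
    -- Last 7 rows (order dates), however far apart they are
    SUM(total_revenue) OVER (
        ORDER BY orderdate
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rows_7_revenue,
    -- Only rows dated within the 6 days before this one (7 calendar days in total)
    SUM(total_revenue) OVER (
        ORDER BY orderdate
        RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
    ) AS calendar_7_revenue
FROM (VALUES
    (DATE '2023-01-01', 10.0),
    (DATE '2023-01-02', 20.0),
    (DATE '2023-01-10', 30.0)
) AS t (orderdate, total_revenue);

SELECT * FROM demo_seven_day ORDER BY orderdate;
```

On 2023-01-10 the row-based window reaches back to 2023-01-01 (60.0), while the calendar window only includes 2023-01-10 itself (30.0).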

10. Calculate the rolling 7-day average LTV using `rolling_7_day_revenue / rolling_7_day_num_days`, and remove the helper columns from the final output.  

   - Define a CTE `cohort_summary` to calculate daily total revenue per cohort, grouping by `cohort_year` and `orderdate`.  
   - In the second CTE, `rolling_ltv`, compute the following metrics per cohort:  
     - `cumulative_revenue` using `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`, which tracks the total revenue accrued by the cohort over time.  
     - `days_since_start` using `COUNT(*) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`, which counts the order days since the cohort's first order.  
     - `rolling_7_day_revenue` using `SUM(total_revenue) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)`, which captures total revenue for the last 7 order dates.  
     - `rolling_7_day_num_days` using `COUNT(*) OVER (PARTITION BY cohort_year ORDER BY orderdate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)`, which counts the order dates in that rolling window.  
   - In the main query, select `cohort_year`, `orderdate`, `cumulative_revenue`, `rolling_avg_ltv`, and `rolling_7_avg_ltv`:  
        - Include `cohort_year`, `orderdate`, and `cumulative_revenue` for reference.  
        - Calculate `rolling_avg_ltv` as `cumulative_revenue / days_since_start`.  
        - Calculate `rolling_7_avg_ltv` as `rolling_7_day_revenue / rolling_7_day_num_days`.  
        - Leave `days_since_start`, `rolling_7_day_revenue`, and `rolling_7_day_num_days` out of the final output.  
        - Sort by `cohort_year` and `orderdate`.  


In [None]:
WITH cohort_summary AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY 
        cohort_year, 
        orderdate
),

rolling_ltv AS (
    SELECT
        cohort_year,
        orderdate,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY orderdate 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_revenue,
        COUNT(*) OVER (
            PARTITION BY cohort_year 
            ORDER BY orderdate 
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS days_since_start,
        SUM(total_revenue) OVER (
            PARTITION BY cohort_year 
            ORDER BY orderdate
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS rolling_7_day_revenue,
        COUNT(*) OVER (
            PARTITION BY cohort_year 
            ORDER BY orderdate 
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS rolling_7_day_num_days
    FROM cohort_summary
)

SELECT
    cohort_year,
    orderdate,
    cumulative_revenue,
    cumulative_revenue / days_since_start AS rolling_avg_ltv,
    rolling_7_day_revenue / rolling_7_day_num_days AS rolling_7_avg_ltv -- Added
FROM rolling_ltv
ORDER BY
    cohort_year,
    orderdate;

![Query Results 10](../Resources/query_results/view_query_results_10.png)
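As a worked check of the two final metrics, the sketch below builds 8 consecutive hypothetical order days in a single cohort with revenue 10, 20, up to 80 (`demo_final_ltv` is an illustrative name) and applies the same formulas, using named windows to keep it short:

```sql
CREATE TEMP VIEW demo_final_ltv AS
WITH daily AS (
    -- Hypothetical: 8 consecutive order days with revenue 10, 20, ..., 80
    SELECT DATE '2023-01-01' + n - 1 AS orderdate, n * 10.0 AS total_revenue
    FROM generate_series(1, 8) AS n
)
SELECT
    orderdate,
    -- Same formula as rolling_avg_ltv: cumulative revenue / order days so far
    SUM(total_revenue) OVER all_days / COUNT(*) OVER all_days AS rolling_avg_ltv,
    -- Same formula as rolling_7_avg_ltv: last-7-row revenue / rows in that window
    SUM(total_revenue) OVER last_7 / COUNT(*) OVER last_7 AS rolling_7_avg_ltv
FROM daily
WINDOW
    all_days AS (ORDER BY orderdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
    last_7   AS (ORDER BY orderdate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW);

SELECT * FROM demo_final_ltv ORDER BY orderdate;
```

On day 8, `rolling_avg_ltv` is 360 / 8 = 45, while `rolling_7_avg_ltv` uses only the last 7 days: (20 + 30 + 40 + 50 + 60 + 70 + 80) / 7 = 350 / 7 = 50. On day 3 the 7-day window holds only 3 rows, so both metrics equal 60 / 3 = 20, which is why the row count rather than a fixed 7 is used as the divisor.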