<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/5_Views/1_View_Intro.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Views Intro

## Overview

### 🥅 Analysis Goals

Calculate daily revenue by cohort and save it as a view that future analyses can reuse to uncover daily trends and short-term fluctuations.

- **Cohort Revenue Insights:** Track cumulative revenue up to each date to measure cohort growth over time.

### 📘 Concepts Covered

- Create views
- Use views

In [1]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas floats with two decimal places
pd.options.display.float_format = '{:.2f}'.format

---
## Views

### 📝 Notes

`CREATE VIEW`

- **Why Use Views in PostgreSQL?**  
  - Simplifies complex queries by storing them as reusable, named objects.  
  - Ensures consistency and readability when multiple queries rely on the same logic.  
  - Enhances security by restricting access to specific rows/columns.  
  - Improves maintainability by centralizing changes to the query logic.

- **Syntax:**  
    ```sql
    CREATE VIEW view_name AS
    SELECT
        column1,
        column2,
        column3
    FROM table_name
    WHERE condition;
    ```
    - `CREATE VIEW view_name AS`: Creates a new view with the specified name.
    - `SELECT`: Defines the query the view runs each time it is referenced. A standard view stores the query itself, not its results.
    - `WHERE`: (Optional) Filters the rows the view returns.
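
The point that a view stores a query rather than a copy of the data can be checked with a small sketch. This example is an illustration only: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and the table `demo_sales` and view `big_orders` are hypothetical names that do not exist in the course database.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_sales (customerkey INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO demo_sales VALUES (?, ?)",
    [(1, 100.0), (2, 20.0), (3, 250.0)],
)

# The view saves only this SELECT statement, not a snapshot of the rows.
conn.execute(
    """
    CREATE VIEW big_orders AS
    SELECT customerkey, revenue
    FROM demo_sales
    WHERE revenue >= 100
    """
)

before = conn.execute("SELECT COUNT(*) FROM big_orders").fetchone()[0]

# A row inserted after the view was created still appears in it,
# because the view's query runs again each time the view is referenced.
conn.execute("INSERT INTO demo_sales VALUES (4, 500.0)")
after = conn.execute("SELECT COUNT(*) FROM big_orders").fetchone()[0]

print(before, after)
```

The row count grows after the insert even though the view was never recreated, which is the behavior to expect from a standard (non-materialized) view.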

### 🔑 Key Concepts
- **📊 Business Terms**: 
  - Cohort Revenue: Total sales generated by a group of customers who started at the same time
  - Daily Revenue Trends: Patterns in revenue performance over daily periods
  - Revenue Accumulation: Progressive total of sales over time
- **💡 Why It Matters**: Enables efficient analysis of cohort performance
    - Simplifies complex revenue calculations for repeated use
    - Maintains consistency in cohort analysis across queries
    - Provides standardized way to track daily revenue patterns
    - Computes a 7-day rolling average LTV for shorter timeframes to analyze recent changes in customer value.
    - Provides insights into overall customer value trends and short-term customer activity for cohorts.
- **🎯 Common Use Cases**: 
  - Cohort revenue analysis
  - Daily trend tracking
  - Revenue pattern identification
- **📈 Related KPIs**: 
  - Daily revenue
  - Cohort growth metrics
  - Cumulative revenue trends

### 📈 Analysis

- Calculates the average lifetime value (LTV) for each cohort based on cumulative revenue and user count.


#### Average and 7-Day Rolling LTV

**`CREATE VIEW`**

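1. Get the `cohort_year` for each customer.  
   - Use `EXTRACT(YEAR FROM MIN(orderdate))` to calculate the cohort year for each customer based on their earliest order date.  
   - Include `customerkey` in the `GROUP BY` clause to ensure the cohort year is assigned to each customer individually. 
   - ⚠️ Because this first attempt also groups by `orderdate`, `MIN(orderdate)` is evaluated per customer per day, so `cohort_year` is simply the year of each order rather than the year of the customer's first order. Step 2 fixes this with a CTE.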

In [4]:
%%sql

SELECT 
    customerkey,
    EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
    orderdate,
    SUM(quantity * netprice * exchangerate) AS total_net_revenue
FROM sales
GROUP BY 
    customerkey, 
    orderdate

| | customerkey | cohort_year | orderdate | total_net_revenue |
|---|---|---|---|---|
| 0 | 1506769 | 2022 | 2022-03-04 | 996.79 |
| 1 | 909157 | 2021 | 2021-11-16 | 1565.80 |
| 2 | 2047462 | 2020 | 2020-12-09 | 34.06 |
| 3 | 1933480 | 2021 | 2021-11-10 | 45.90 |
| 4 | 1701958 | 2017 | 2017-10-05 | 5144.64 |
| ... | ... | ... | ... | ... |
| 83094 | 1273185 | 2023 | 2023-09-16 | 3452.05 |
| 83095 | 420797 | 2022 | 2022-12-29 | 278.90 |
| 83096 | 642485 | 2023 | 2023-03-08 | 574.31 |
| 83097 | 863441 | 2023 | 2023-03-23 | 475.84 |


2. Create a CTE `cohort` to calculate the cohort year for each customer, then in the main query `LEFT JOIN` it to the `sales` table on `customerkey` to get the `orderdate` and `total_net_revenue`.
   - 🔔 Define a CTE `cohort` to calculate the cohort year for each customer.  
        - Use `EXTRACT(YEAR FROM MIN(orderdate))` to calculate the cohort year for each customer based on their earliest order date.  
        - Include `customerkey` in the `GROUP BY` clause to ensure the cohort year is assigned to each customer individually. 
   - 🔔 Main query joins this cohort information back to `sales` data using `LEFT JOIN`.
        - Calculates total revenue per customer per day using `quantity * netprice * exchangerate`.
        - Groups results by `customerkey`, their `cohort_year` and `orderdate` to get daily purchase totals.

In [5]:
%%sql

WITH cohort AS (
	SELECT 
	    customerkey,
	    EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year
	FROM sales
	GROUP BY 
	    customerkey
	)

-- Added: main query joins sales to the cohort CTE
SELECT 
    s.customerkey,
    c.cohort_year,
    s.orderdate,
    SUM(s.quantity * s.netprice * s.exchangerate) AS total_net_revenue
FROM sales s
LEFT JOIN cohort c ON c.customerkey = s.customerkey
GROUP BY 
    s.customerkey, 
    c.cohort_year, 
    s.orderdate

| | customerkey | cohort_year | orderdate | total_net_revenue |
|---|---|---|---|---|
| 0 | 878212 | 2022 | 2022-08-15 | 442.37 |
| 1 | 156085 | 2017 | 2017-02-19 | 23232.81 |
| 2 | 1552546 | 2018 | 2018-11-24 | 4570.56 |
| 3 | 327894 | 2019 | 2023-04-28 | 955.64 |
| 4 | 272527 | 2017 | 2017-06-17 | 74.96 |
| ... | ... | ... | ... | ... |
| 83094 | 1842409 | 2023 | 2023-09-08 | 47.65 |
| 83095 | 296331 | 2023 | 2024-02-28 | 2150.56 |
| 83096 | 1333456 | 2022 | 2024-02-10 | 4631.55 |
| 83097 | 838930 | 2019 | 2024-02-21 | 5734.63 |
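
To see why the CTE in step 2 is needed, the two approaches can be compared on a few rows. The sketch below is a toy illustration, not the course query: it runs on an in-memory SQLite database instead of PostgreSQL, so it uses SQLite's `strftime('%Y', ...)` where the notebook uses `EXTRACT(YEAR FROM ...)`, and the three sample rows and the simplified `revenue` column are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2021-03-01", 50.0),
        (1, "2023-06-15", 70.0),  # repeat purchase two years later
        (2, "2023-01-10", 30.0),
    ],
)

# Step 1 style: grouping by orderdate makes MIN(orderdate) equal to the
# row's own date, so cohort_year is just the year of each order.
naive = conn.execute(
    """
    SELECT customerkey,
           CAST(strftime('%Y', MIN(orderdate)) AS INTEGER) AS cohort_year,
           orderdate
    FROM sales
    GROUP BY customerkey, orderdate
    ORDER BY customerkey, orderdate
    """
).fetchall()

# Step 2 style: the CTE groups by customerkey only, so every order
# of a customer carries the year of that customer's first order.
with_cte = conn.execute(
    """
    WITH cohort AS (
        SELECT customerkey,
               CAST(strftime('%Y', MIN(orderdate)) AS INTEGER) AS cohort_year
        FROM sales
        GROUP BY customerkey
    )
    SELECT s.customerkey, c.cohort_year, s.orderdate
    FROM sales s
    LEFT JOIN cohort c ON c.customerkey = s.customerkey
    ORDER BY s.customerkey, s.orderdate
    """
).fetchall()

print(naive)
print(with_cte)
```

For customer 1, the naive query labels the 2023 order with cohort year 2023, while the CTE version keeps it in the 2021 cohort. The same effect shows up in the step 2 output above, where orders from 2023 and 2024 belong to earlier cohorts.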


3. Create a view in DBeaver using `CREATE VIEW`.  

   - 🔔 Use `CREATE VIEW cohort_analysis AS` to define a view named `cohort_analysis`.  
   - Define a CTE `cohort` to calculate the cohort year for each customer.  
        - Use `EXTRACT(YEAR FROM MIN(orderdate))` to calculate the cohort year for each customer based on their earliest order date.  
        - Include `customerkey` in the `GROUP BY` clause to ensure the cohort year is assigned to each customer individually. 
   - Main query joins this cohort information back to `sales` data using `LEFT JOIN`.
        - Calculates total revenue per customer per day using `quantity * netprice * exchangerate`.
        - Groups results by `customerkey`, their `cohort_year` and `orderdate` to get daily purchase totals.

In [4]:
%%sql

CREATE VIEW cohort_analysis AS
WITH cohort AS (
	SELECT 
	    customerkey,
	    EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year
	FROM sales
	GROUP BY 
	    customerkey
	)

-- Main query: daily net revenue per customer with their cohort year
SELECT 
    s.customerkey,
    c.cohort_year,
    s.orderdate,
    SUM(s.quantity * s.netprice * s.exchangerate) AS total_net_revenue
FROM sales s
LEFT JOIN cohort c ON c.customerkey = s.customerkey
GROUP BY 
    s.customerkey, 
    c.cohort_year, 
    s.orderdate

DBeaver output:

![Create View](../Resources/query_results/5.1_dbeaver_create_view.png)
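
Running a plain `CREATE VIEW` a second time fails because the view already exists, which is easy to hit when re-executing a notebook cell. PostgreSQL offers `DROP VIEW IF EXISTS` and `CREATE OR REPLACE VIEW` for this (the latter requires the new query to keep the existing columns in the same order). The sketch below shows the drop-then-create pattern under stated assumptions: it runs on an in-memory SQLite database as a stand-in for PostgreSQL, and `cohort_analysis_demo` and the helper `create_view` are hypothetical names used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, revenue REAL)"
)
conn.execute("INSERT INTO sales VALUES (1, '2023-01-05', 10.0)")


def create_view(connection):
    """Drop the demo view if it exists, then recreate it, so reruns do not fail."""
    connection.execute("DROP VIEW IF EXISTS cohort_analysis_demo")
    connection.execute(
        """
        CREATE VIEW cohort_analysis_demo AS
        SELECT customerkey, orderdate, SUM(revenue) AS total_net_revenue
        FROM sales
        GROUP BY customerkey, orderdate
        """
    )


create_view(conn)
create_view(conn)  # the second run succeeds because the view is dropped first

# Without the DROP, a second plain CREATE VIEW raises an error.
try:
    conn.execute("CREATE VIEW cohort_analysis_demo AS SELECT 1 AS x")
    duplicate_failed = False
except sqlite3.OperationalError:
    duplicate_failed = True

rows = conn.execute("SELECT * FROM cohort_analysis_demo").fetchall()
print(duplicate_failed, rows)
```

Dropping a view discards only the saved query. The underlying `sales` rows are untouched, so recreating the view is safe.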

- To see the view you created in DBeaver:
    1. Go to the Database Navigator on the left side.
    2. Click the `Views` folder.
    3. Refresh the `Views` folder using `F5`.
    4. Double-click the view you created, named `cohort_analysis`.
    5. Go to the `Data` tab (if it doesn't open there by default).



![See cohort_analysis View](../Resources/query_results/5.1_dbeaver_views.gif)

To see the view you created using Colab:

In [6]:
%%sql

SELECT *
FROM cohort_analysis

| | customerkey | cohort_year | orderdate | total_net_revenue |
|---|---|---|---|---|
| 0 | 878212 | 2022 | 2022-08-15 | 442.37 |
| 1 | 156085 | 2017 | 2017-02-19 | 23232.81 |
| 2 | 1552546 | 2018 | 2018-11-24 | 4570.56 |
| 3 | 327894 | 2019 | 2023-04-28 | 955.64 |
| 4 | 272527 | 2017 | 2017-06-17 | 74.96 |
| ... | ... | ... | ... | ... |
| 83094 | 1842409 | 2023 | 2023-09-08 | 47.65 |
| 83095 | 296331 | 2023 | 2024-02-28 | 2150.56 |
| 83096 | 1333456 | 2022 | 2024-02-10 | 4631.55 |
| 83097 | 838930 | 2019 | 2024-02-21 | 5734.63 |


4. Use the view and calculate the total net revenue.  

   - Query the `cohort_analysis` view to retrieve `cohort_year`, `orderdate`, and `total_net_revenue`.  
   - 🔔 Use `SUM(total_net_revenue)` to calculate the total revenue for all customers within each cohort for a specific day.  
   - 🔔 Group by `cohort_year` and `orderdate` to ensure the total revenue is aggregated at the cohort and daily levels.  
   - 🔔 Select `cohort_year`, `orderdate`, and `total_revenue` for the final output.  

In [7]:
%%sql

SELECT
    cohort_year,
    orderdate,
    SUM(total_net_revenue) AS total_revenue
FROM cohort_analysis
GROUP BY 
    cohort_year, 
    orderdate;

| | cohort_year | orderdate | total_revenue |
|---|---|---|---|
| 0 | 2022 | 2023-09-06 | 3537.66 |
| 1 | 2020 | 2020-12-08 | 5160.80 |
| 2 | 2017 | 2023-06-06 | 9155.73 |
| 3 | 2022 | 2022-04-05 | 1167.08 |
| 4 | 2015 | 2020-10-16 | 477.11 |
| ... | ... | ... | ... |
| 12950 | 2016 | 2022-07-25 | 4307.61 |
| 12951 | 2019 | 2024-01-28 | 4284.73 |
| 12952 | 2018 | 2023-12-19 | 4385.82 |
| 12953 | 2017 | 2018-02-02 | 1874.40 |


<img src="../Resources/images/5.1_monthly_rev.png" alt="Daily revenue for the 2023 cohort" width="50%">

> ⚠️ **Chart Note**: This chart plots only the 2023 cohort for the first 40 days of 2023 (2023-01-01 to 2023-02-09).
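
The subset described in the chart note could be pulled with a filter on top of the step 4 aggregation. The following is a minimal sketch, not the query used to build the chart: it runs on an in-memory SQLite database, the table `cohort_analysis_demo` is a hypothetical stand-in with the same columns as the `cohort_analysis` view, and the five rows are invented. Dates are stored as ISO text here, so `BETWEEN` compares them as strings; in PostgreSQL `orderdate` would be compared as a date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table with the same columns as the cohort_analysis view (assumption).
conn.execute(
    """
    CREATE TABLE cohort_analysis_demo (
        customerkey INTEGER,
        cohort_year INTEGER,
        orderdate TEXT,
        total_net_revenue REAL
    )
    """
)
conn.executemany(
    "INSERT INTO cohort_analysis_demo VALUES (?, ?, ?, ?)",
    [
        (1, 2023, "2023-01-01", 100.0),
        (2, 2023, "2023-01-01", 50.0),
        (1, 2023, "2023-02-09", 25.0),
        (1, 2023, "2023-02-10", 999.0),  # day 41, outside the window
        (3, 2022, "2023-01-01", 70.0),   # different cohort
    ],
)

# Same aggregation as step 4, restricted to the 2023 cohort
# and to the 40-day window named in the chart note.
daily_2023 = conn.execute(
    """
    SELECT orderdate,
           SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis_demo
    WHERE cohort_year = 2023
      AND orderdate BETWEEN '2023-01-01' AND '2023-02-09'
    GROUP BY cohort_year, orderdate
    ORDER BY orderdate
    """
).fetchall()

print(daily_2023)
```

Because the filter sits on top of the view's columns, the cohort logic does not need to be repeated. Only the `WHERE` clause changes when a different cohort or date range is needed.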