<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/5_Views/1_View_Intro.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Views Intro

## Overview

### 🥅 Analysis Goals

Build a view of daily cohort revenue that later analyses can reuse to uncover daily trends and short-term fluctuations.
- **Cohort Revenue Insights:**  
  - Track cumulative revenue up to each date to measure cohort growth over time.  

### 📘 Concepts Covered

- Create views
- Use views

In [1]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the SQL magic extension (ipython-sql locally, jupysql in Colab)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas floats with two decimal places
pd.options.display.float_format = '{:.2f}'.format

---
## Views

### 📝 Notes

`CREATE VIEW`

- **Why Use Views in PostgreSQL?**  
  - Simplifies complex queries by storing them as reusable, named objects.  
  - Ensures consistency and readability when multiple queries rely on the same logic.  
  - Enhances security by restricting access to specific rows/columns.  
  - Improves maintainability by centralizing changes to the query logic.

- **Syntax:**  
    ```sql
    CREATE VIEW view_name AS
    SELECT
        column1,
        column2,
        column3
    FROM table_name
    WHERE condition;
    ```
    - `CREATE VIEW view_name AS`: Creates a new view with the specified name.
    - `SELECT`: Defines the query that the view stores. The query runs each time the view is used, so the view always reflects the current data in the underlying tables.
    - `WHERE`: (Optional) Filters data included in the view.
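
As a minimal sketch of the syntax above, the following uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for PostgreSQL (an assumption made so the example runs anywhere). The table `demo_sales` and the view `large_sales` are hypothetical names for illustration. It shows that a view stores the query rather than a snapshot of the rows.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_sales (customerkey INTEGER, netprice REAL)")
conn.executemany(
    "INSERT INTO demo_sales VALUES (?, ?)",
    [(1, 10.0), (2, 250.0)],
)

# The view stores only the SELECT statement, not a copy of the rows
conn.execute("""
    CREATE VIEW large_sales AS
    SELECT customerkey, netprice
    FROM demo_sales
    WHERE netprice > 100
""")

rows_before = conn.execute("SELECT * FROM large_sales").fetchall()

# A row added to the base table afterwards appears in the view automatically
conn.execute("INSERT INTO demo_sales VALUES (3, 500.0)")
rows_after = conn.execute(
    "SELECT * FROM large_sales ORDER BY customerkey"
).fetchall()

print(rows_before)  # [(2, 250.0)]
print(rows_after)   # [(2, 250.0), (3, 500.0)]
```

The same `CREATE VIEW ... AS SELECT ...` statement shape applies in PostgreSQL. Only the connection setup differs.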


### 💻 Final Result

- Calculates the average lifetime value (LTV) for each cohort based on cumulative revenue and user count.
  - Computes a 7-day rolling average LTV for shorter timeframes to analyze recent changes in customer value.
  - Provides insights into overall customer value trends and short-term customer activity for cohorts.

#### Average and 7-Day Rolling LTV

**`CREATE VIEW`**

1. Get the `cohort_year` for each customer and the total revenue for each day (previously called `cohort_analysis` CTE).  
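   - Use `EXTRACT(YEAR FROM MIN(orderdate))` to derive `cohort_year`. Because `orderdate` is also in the `GROUP BY`, `MIN(orderdate)` is evaluated per customer per day, so this returns the year of each order date rather than the year of the customer's first-ever order.  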
   - Include `customerkey` in the `GROUP BY` clause to ensure the cohort year is assigned to each customer individually.  
   - Include `orderdate` in the `GROUP BY` clause to calculate daily revenue for each customer.  
   - Use `SUM(quantity * netprice * exchangerate)` to calculate the total net revenue for each day.  
   - Select `customerkey`, `cohort_year`, `orderdate`, and `total_net_revenue` for the final output.  

In [2]:
%%sql

SELECT 
    customerkey,
    EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
    orderdate,
    SUM(quantity * netprice * exchangerate) AS total_net_revenue
FROM sales
GROUP BY 
    customerkey, 
    orderdate

| | customerkey | cohort_year | orderdate | total_net_revenue |
|---|---|---|---|---|
| 0 | 1506769 | 2022 | 2022-03-04 | 996.79 |
| 1 | 909157 | 2021 | 2021-11-16 | 1565.80 |
| 2 | 2047462 | 2020 | 2020-12-09 | 34.06 |
| 3 | 1933480 | 2021 | 2021-11-10 | 45.90 |
| 4 | 1701958 | 2017 | 2017-10-05 | 5144.64 |
| ... | ... | ... | ... | ... |
| 83094 | 1273185 | 2023 | 2023-09-16 | 3452.05 |
| 83095 | 420797 | 2022 | 2022-12-29 | 278.90 |
| 83096 | 642485 | 2023 | 2023-03-08 | 574.31 |
| 83097 | 863441 | 2023 | 2023-03-23 | 475.84 |
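
In the output above, `cohort_year` always matches the year of `orderdate`. This follows from grouping by both `customerkey` and `orderdate`: within each group, `MIN(orderdate)` is that day's date. The sketch below reproduces the effect on a hypothetical three-row table (`demo_sales`) using Python's built-in `sqlite3` as a stand-in for PostgreSQL, with `strftime('%Y', ...)` as the SQLite counterpart of `EXTRACT(YEAR FROM ...)`. It also shows one possible alternative, joining to a per-customer first-order date. In PostgreSQL a window function such as `MIN(orderdate) OVER (PARTITION BY customerkey)` would serve the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_sales (customerkey INTEGER, orderdate TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO demo_sales VALUES (?, ?, ?)",
    [
        (1, "2021-05-01", 100.0),  # customer 1, first order
        (1, "2022-03-04", 50.0),   # customer 1, repeat order a year later
        (2, "2022-07-10", 75.0),   # customer 2, first order
    ],
)

# Grouping by customer AND day: MIN(orderdate) is just that day's date
per_day = conn.execute("""
    SELECT customerkey,
           CAST(strftime('%Y', MIN(orderdate)) AS INTEGER) AS cohort_year,
           orderdate
    FROM demo_sales
    GROUP BY customerkey, orderdate
    ORDER BY customerkey, orderdate
""").fetchall()

# Joining to a per-customer first-order date keeps one cohort per customer
per_customer = conn.execute("""
    SELECT s.customerkey,
           CAST(strftime('%Y', f.first_order) AS INTEGER) AS cohort_year,
           s.orderdate
    FROM demo_sales AS s
    JOIN (
        SELECT customerkey, MIN(orderdate) AS first_order
        FROM demo_sales
        GROUP BY customerkey
    ) AS f ON f.customerkey = s.customerkey
    ORDER BY s.customerkey, s.orderdate
""").fetchall()

print(per_day)       # customer 1 lands in both 2021 and 2022
print(per_customer)  # customer 1 stays in the 2021 cohort
```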


2. Create a view in pgAdmin using `CREATE VIEW`.  

   - 🔔 Use `CREATE VIEW cohort_analysis AS` to define a view named `cohort_analysis`.  
   - Select `customerkey`, `cohort_year`, `orderdate`, and `total_net_revenue` to include these columns in the view.  
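   - Use `EXTRACT(YEAR FROM MIN(orderdate))` to derive `cohort_year`. Because the query also groups by `orderdate`, `MIN(orderdate)` equals the order date itself, so `cohort_year` is the year of each order, not of the customer's first order.  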
   - Calculate the `total_net_revenue` for each customer on each day using `SUM(quantity * netprice * exchangerate)`.  
   - Group by `customerkey` and `orderdate` to ensure the calculations are aggregated correctly.  

In [4]:
%%sql

CREATE VIEW cohort_analysis AS
SELECT 
    customerkey,
    EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
    orderdate,
    SUM(quantity * netprice * exchangerate) AS total_net_revenue
FROM sales
GROUP BY 
    customerkey, 
    orderdate;

pgAdmin output:

![Query Results 2](../Resources/query_results/5_view_2.png)
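
Running the `CREATE VIEW` cell a second time fails because the name is already taken (PostgreSQL reports that the relation already exists). One way to make the cell safe to re-run is to drop the view first with `DROP VIEW IF EXISTS`. PostgreSQL also offers `CREATE OR REPLACE VIEW` for this purpose. The sketch below demonstrates the drop-and-recreate pattern with Python's built-in `sqlite3` as a stand-in for PostgreSQL. `demo_sales` and `demo_view` are hypothetical names, and the `sqlite_master` lookup is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_sales (customerkey INTEGER, revenue REAL)")

create_view_sql = """
    CREATE VIEW demo_view AS
    SELECT customerkey, SUM(revenue) AS total_net_revenue
    FROM demo_sales
    GROUP BY customerkey
"""

conn.execute(create_view_sql)

# Running the same CREATE VIEW again fails because the name already exists
try:
    conn.execute(create_view_sql)
    second_create_failed = False
except sqlite3.OperationalError:
    second_create_failed = True

# Dropping the view first makes the statement safe to re-run
conn.execute("DROP VIEW IF EXISTS demo_view")
conn.execute(create_view_sql)

# List the views that now exist (SQLite catalog table)
view_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'view'")
]
print(second_create_failed, view_names)
```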

3. To see the view you created in pgAdmin:
    1. In the left-hand browser panel, refresh `Views`.
    2. Right-click the new view named `cohort_analysis`.
    3. Choose `View/Edit Data` -> `All Rows`.

![See cohort_analysis View](../Resources/query_results/5_view_3.gif)

To see the view you created using Colab:

In [6]:
%%sql

SELECT *
FROM cohort_analysis

| | customerkey | cohort_year | orderdate | total_net_revenue |
|---|---|---|---|---|
| 0 | 1506769 | 2022 | 2022-03-04 | 996.79 |
| 1 | 909157 | 2021 | 2021-11-16 | 1565.80 |
| 2 | 2047462 | 2020 | 2020-12-09 | 34.06 |
| 3 | 1933480 | 2021 | 2021-11-10 | 45.90 |
| 4 | 1701958 | 2017 | 2017-10-05 | 5144.64 |
| ... | ... | ... | ... | ... |
| 83094 | 1273185 | 2023 | 2023-09-16 | 3452.05 |
| 83095 | 420797 | 2022 | 2022-12-29 | 278.90 |
| 83096 | 642485 | 2023 | 2023-03-08 | 574.31 |
| 83097 | 863441 | 2023 | 2023-03-23 | 475.84 |


4. Use the view and calculate the total net revenue (this replaces the `cohort_summary` CTE).  

   - Query the `cohort_analysis` view to retrieve `cohort_year`, `orderdate`, and `total_net_revenue`.  
   - 🔔 Use `SUM(total_net_revenue)` to calculate the total revenue for all customers within each cohort for a specific day.  
   - 🔔 Group by `cohort_year` and `orderdate` to ensure the total revenue is aggregated at the cohort and daily levels.  
   - 🔔 Select `cohort_year`, `orderdate`, and `total_revenue` for the final output.  

In [7]:
%%sql

SELECT
    cohort_year,
    orderdate,
    SUM(total_net_revenue) AS total_revenue
FROM cohort_analysis
GROUP BY 
    cohort_year, 
    orderdate;

| | cohort_year | orderdate | total_revenue |
|---|---|---|---|
| 0 | 2022 | 2022-12-13 | 143707.31 |
| 1 | 2024 | 2024-03-15 | 63214.36 |
| 2 | 2021 | 2021-04-02 | 2564.55 |
| 3 | 2018 | 2018-09-28 | 58668.28 |
| 4 | 2020 | 2020-12-08 | 6414.19 |
| ... | ... | ... | ... |
| 3289 | 2023 | 2023-10-15 | 10264.14 |
| 3290 | 2015 | 2015-11-22 | 319.27 |
| 3291 | 2016 | 2016-08-11 | 46262.33 |
| 3292 | 2023 | 2023-02-09 | 121453.06 |
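
The two-layer pattern in this step (a view that aggregates per customer per day, then a query on the view that aggregates per cohort per day) can be sketched on toy data. This example uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. `demo_sales` and `demo_cohort_analysis` are hypothetical names, and the hard-coded `cohort_year` column is a simplification. The running total at the end is one possible way to get the cumulative revenue mentioned in the analysis goals, computed here in Python for a single cohort.

```python
import sqlite3
from itertools import accumulate

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE demo_sales (
        customerkey INTEGER, cohort_year INTEGER, orderdate TEXT, revenue REAL
    )
""")
conn.executemany(
    "INSERT INTO demo_sales VALUES (?, ?, ?, ?)",
    [
        (1, 2022, "2022-03-04", 60.0),
        (1, 2022, "2022-03-04", 40.0),  # second line item, same customer and day
        (2, 2022, "2022-03-04", 50.0),
        (1, 2022, "2022-03-05", 25.0),
    ],
)

# Layer 1: the view aggregates revenue per customer per day
conn.execute("""
    CREATE VIEW demo_cohort_analysis AS
    SELECT customerkey, cohort_year, orderdate,
           SUM(revenue) AS total_net_revenue
    FROM demo_sales
    GROUP BY customerkey, cohort_year, orderdate
""")

# Layer 2: query the view like a table and aggregate per cohort per day
daily = conn.execute("""
    SELECT cohort_year, orderdate, SUM(total_net_revenue) AS total_revenue
    FROM demo_cohort_analysis
    GROUP BY cohort_year, orderdate
    ORDER BY cohort_year, orderdate
""").fetchall()

# Running total over the ordered days (the toy data has a single cohort)
cumulative = list(accumulate(row[2] for row in daily))

print(daily)       # [(2022, '2022-03-04', 150.0), (2022, '2022-03-05', 25.0)]
print(cumulative)  # [150.0, 175.0]
```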
