<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/5_Views/1_View_Intro.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Views Intro

## Overview

### 🥅 Analysis Goals

Calculate daily revenue by cohort and save the query as a view, so future analyses can reuse it to uncover daily trends and short-term fluctuations.
- **Cohort Revenue Insights:** Calculate daily net revenue for each cohort.  

### 📘 Concepts Covered

- Create views
- Use views

In [1]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Switch from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas floats with two decimal places
pd.options.display.float_format = '{:.2f}'.format

---
## Views

### 📝 Notes

`CREATE VIEW`

- **Why Use Views in PostgreSQL?**  
  - Simplifies complex queries by storing them as reusable, named objects.  
  - Ensures consistency and readability when multiple queries rely on the same logic.  
  - Enhances security by restricting access to specific rows/columns.  
  - Improves maintainability by centralizing changes to the query logic.

- **Syntax:**  
    ```sql
    CREATE VIEW view_name AS
    SELECT
        column1,
        column2,
        column3
    FROM table_name
    WHERE condition;
    ```
    - `CREATE VIEW view_name AS`: Creates a new view with the specified name.
    - `SELECT`: Defines the query that runs each time the view is queried. A standard view stores the query definition, not its results.
    - `WHERE`: (Optional) Filters the rows the view returns.
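
A minimal sketch of this syntax, using Python's built-in `sqlite3` module with an in-memory database as a stand-in for PostgreSQL (an assumption made so the example runs anywhere). The `orders` table, its rows, and the `big_orders` view are hypothetical. It illustrates that a view holds the query rather than a copy of the rows: a row inserted after the view is created still shows up when the view is queried.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderkey INTEGER, amount REAL)")  # hypothetical table
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50.0), (2, 250.0)])

# The view saves the SELECT statement under a name; no rows are copied.
conn.execute("""
    CREATE VIEW big_orders AS
    SELECT orderkey, amount
    FROM orders
    WHERE amount > 100
""")

before = conn.execute("SELECT orderkey FROM big_orders ORDER BY orderkey").fetchall()

# A row added after the view exists is picked up the next time the view is queried.
conn.execute("INSERT INTO orders VALUES (3, 400.0)")
after = conn.execute("SELECT orderkey FROM big_orders ORDER BY orderkey").fetchall()

print(before)  # [(2,)]
print(after)   # [(2,), (3,)]
```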

### 🔑 Key Concepts
- **📊 Business Terms**: 
  - Cohort Revenue: Total sales generated by a group of customers who made their first purchase in the same period (here, the same year)
  - Daily Revenue Trends: Patterns in revenue performance over daily periods
- **💡 Why It Matters**: Enables efficient analysis of cohort performance
  - Simplifies complex revenue calculations for repeated use
  - Maintains consistency in cohort analysis across queries
  - Provides a standardized way to track daily revenue patterns
  - Provides insight into overall customer value trends and short-term customer activity for each cohort
- **🎯 Common Use Cases**: 
  - Cohort revenue analysis
  - Daily trend tracking
  - Revenue pattern identification
- **📈 Related KPIs**: 
  - Daily revenue
  - Cohort growth metrics

### 📈 Analysis

- Calculates the daily net revenue for each cohort.

#### Daily Net Revenue

**`CREATE VIEW`**

1. Get the daily net revenue and number of orders for each customer.
   - Select `customerkey`, `orderdate`, `total_net_revenue`, and `num_orders`.
   - Use `GROUP BY` to group by `customerkey` and `orderdate`.
   - Use `SUM` to calculate the total net revenue for each customer per day.
   - Use `COUNT` to calculate the number of orders for each customer per day.

In [7]:
%%sql

SELECT 
    s.customerkey,
    s.orderdate,
    SUM(s.quantity * s.netprice * s.exchangerate) AS total_net_revenue,
    COUNT(s.orderkey) AS num_orders
FROM sales s
GROUP BY 
    s.customerkey,
    s.orderdate


Unnamed: 0,customerkey,orderdate,total_net_revenue,num_orders
0,1506769,2022-03-04,996.79,2
1,909157,2021-11-16,1565.80,2
2,2047462,2020-12-09,34.06,1
3,1933480,2021-11-10,45.90,2
4,1701958,2017-10-05,5144.64,1
...,...,...,...,...
83094,1273185,2023-09-16,3452.05,3
83095,420797,2022-12-29,278.90,1
83096,642485,2023-03-08,574.31,3
83097,863441,2023-03-23,475.84,3


2. Return the `first_purchase_date` and `cohort_year` for each customer using window functions.
    - Select `customerkey`, `orderdate`, `total_net_revenue`, `first_purchase_date`, and `cohort_year`. 
    - Use `GROUP BY` to group by `customerkey` and `orderdate`.
    - Use `SUM` to calculate the total net revenue for each customer per day.
    - Use `COUNT` to calculate the number of orders for each customer per day.
    - 🔔 Get the earliest order date for each customer using `MIN(s.orderdate) OVER (PARTITION BY s.customerkey) AS first_purchase_date`.
        - `MIN(s.orderdate)` gets the earliest date
        - `PARTITION BY s.customerkey` performs this calculation separately for each customer
    - 🔔 Get the cohort year for each customer using `EXTRACT(YEAR FROM MIN(s.orderdate) OVER (PARTITION BY s.customerkey)) AS cohort_year`.
        - Uses the same `MIN` and `PARTITION BY` logic as above
        - `EXTRACT(YEAR FROM ...)` pulls just the year out of the date


In [8]:
%%sql

SELECT 
    s.customerkey,
    s.orderdate,
    SUM(s.quantity * s.netprice * s.exchangerate) AS total_net_revenue,
    COUNT(s.orderkey) AS num_orders,
    MIN(s.orderdate) OVER (PARTITION BY s.customerkey) AS first_purchase_date, -- Added
    EXTRACT(YEAR FROM MIN(s.orderdate) OVER (PARTITION BY s.customerkey)) AS cohort_year -- Added
FROM sales s
GROUP BY 
    s.customerkey,
    s.orderdate


Unnamed: 0,customerkey,orderdate,total_net_revenue,num_orders,first_purchase_date,cohort_year
0,15,2021-03-08,2217.41,1,2021-03-08,2021
1,180,2018-07-28,525.31,1,2018-07-28,2018
2,180,2023-08-28,1984.90,2,2018-07-28,2018
3,185,2019-06-01,1395.52,1,2019-06-01,2019
4,243,2016-05-19,287.67,1,2016-05-19,2016
...,...,...,...,...,...,...
83094,2099697,2022-09-13,38.20,3,2022-09-13,2022
83095,2099711,2016-08-13,2067.75,1,2016-08-13,2016
83096,2099711,2017-08-14,3940.92,1,2016-08-13,2016
83097,2099743,2022-03-17,469.62,2,2022-03-17,2022
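
The window-function logic from step 2 can be sketched on a few toy rows. This uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (an assumption, so the example is self-contained). The simplified `sales` table and its `revenue` column are hypothetical. SQLite has no `EXTRACT`, so `strftime('%Y', ...)` plays that role here. Note how customer 1's later order keeps the 2021 cohort because the window looks across all of that customer's rows.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
# Hypothetical, simplified sales table with a precomputed revenue column.
conn.execute("CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2021-03-08", 100.0),
        (1, "2021-03-08", 50.0),   # same customer, same day: summed by GROUP BY
        (1, "2023-08-28", 75.0),   # later order: still belongs to the 2021 cohort
        (2, "2019-06-01", 20.0),
    ],
)

rows = conn.execute("""
    SELECT
        customerkey,
        orderdate,
        SUM(revenue) AS total_net_revenue,
        -- Earliest order date across all of this customer's grouped rows
        MIN(orderdate) OVER (PARTITION BY customerkey) AS first_purchase_date,
        -- SQLite substitute for EXTRACT(YEAR FROM ...)
        CAST(strftime('%Y', MIN(orderdate) OVER (PARTITION BY customerkey)) AS INTEGER) AS cohort_year
    FROM sales
    GROUP BY customerkey, orderdate
    ORDER BY customerkey, orderdate
""").fetchall()

for row in rows:
    print(row)
# (1, '2021-03-08', 150.0, '2021-03-08', 2021)
# (1, '2023-08-28', 75.0, '2021-03-08', 2021)
# (2, '2019-06-01', 20.0, '2019-06-01', 2019)
```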


3. Left join the `customer` table to the `sales` table on `customerkey` to get the `countryfull`, `age`, `givenname`, and `surname` for each customer.
    - 🔔 Select `customerkey`, `orderdate`, `total_net_revenue`, `first_purchase_date`, `cohort_year`, `countryfull`, `age`, `givenname`, and `surname`.
    - Use `SUM` to calculate the total net revenue for each customer per day.
    - Use `COUNT` to calculate the number of orders for each customer per day.
    - Get the earliest order date for each customer using `MIN(s.orderdate) OVER (PARTITION BY s.customerkey) AS first_purchase_date`.
    - Get the cohort year for each customer using `EXTRACT(YEAR FROM MIN(s.orderdate) OVER (PARTITION BY s.customerkey)) AS cohort_year`.
    - 🔔 Use `GROUP BY` to group by `customerkey`, `orderdate`, `countryfull`, `age`, `givenname`, and `surname`.
    - 🔔 Use `LEFT JOIN` to left join the `customer` table to the `sales` table on `customerkey`.


In [9]:
%%sql

SELECT 
    s.customerkey,
    s.orderdate,
    SUM(s.quantity * s.netprice * s.exchangerate) AS total_net_revenue,
    COUNT(s.orderkey) AS num_orders,
    MIN(s.orderdate) OVER (PARTITION BY s.customerkey) AS first_purchase_date,
    EXTRACT(YEAR FROM MIN(s.orderdate) OVER (PARTITION BY s.customerkey)) AS cohort_year,
    c.countryfull, -- Added 
    c.age, -- Added 
    c.givenname, -- Added 
    c.surname -- Added 
FROM sales s
LEFT JOIN customer c ON c.customerkey = s.customerkey
GROUP BY 
    s.customerkey, 
    s.orderdate,
    c.countryfull, -- Added 
    c.age, -- Added 
    c.givenname, -- Added 
    c.surname -- Added 
    

Unnamed: 0,customerkey,orderdate,total_net_revenue,num_orders,first_purchase_date,cohort_year,countryfull,age,givenname,surname
0,15,2021-03-08,2217.41,1,2021-03-08,2021,Australia,55,Julian,McGuigan
1,180,2018-07-28,525.31,1,2018-07-28,2018,Australia,65,Gabriel,Bosanquet
2,180,2023-08-28,1984.90,2,2018-07-28,2018,Australia,65,Gabriel,Bosanquet
3,185,2019-06-01,1395.52,1,2019-06-01,2019,Australia,40,Gabrielle,Castella
4,243,2016-05-19,287.67,1,2016-05-19,2016,Australia,66,Maya,Atherton
...,...,...,...,...,...,...,...,...,...,...
83094,2099697,2022-09-13,38.20,3,2022-09-13,2022,United States,54,Phillipp,Maier
83095,2099711,2016-08-13,2067.75,1,2016-08-13,2016,United States,80,Katerina,Pavlícková
83096,2099711,2017-08-14,3940.92,1,2016-08-13,2016,United States,80,Katerina,Pavlícková
83097,2099743,2022-03-17,469.62,2,2022-03-17,2022,United States,21,Luciana,Almonte
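
A small sketch of the join in step 3, again with Python's built-in `sqlite3` module standing in for PostgreSQL (an assumption). The two simplified tables and their rows are hypothetical. It shows two things: `LEFT JOIN` keeps a sales row even when no matching customer exists (the customer columns come back as `NULL`, shown as `None` in Python), and the joined customer column is listed in `GROUP BY` alongside `customerkey` and `orderdate`.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
# Hypothetical, simplified versions of the sales and customer tables.
conn.execute("CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, revenue REAL)")
conn.execute("CREATE TABLE customer (customerkey INTEGER, countryfull TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2021-03-08", 100.0),
        (1, "2021-03-08", 50.0),
        (9, "2022-01-01", 30.0),  # no matching row in customer
    ],
)
conn.execute("INSERT INTO customer VALUES (1, 'Australia')")

rows = conn.execute("""
    SELECT
        s.customerkey,
        s.orderdate,
        SUM(s.revenue) AS total_net_revenue,
        c.countryfull
    FROM sales s
    LEFT JOIN customer c ON c.customerkey = s.customerkey
    GROUP BY
        s.customerkey,
        s.orderdate,
        c.countryfull  -- non-aggregated joined column is grouped too
    ORDER BY s.customerkey
""").fetchall()

for row in rows:
    print(row)
# (1, '2021-03-08', 150.0, 'Australia')
# (9, '2022-01-01', 30.0, None)
```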


4. Create a view in DBeaver using `CREATE VIEW`.  
    - Select `customerkey`, `orderdate`, `total_net_revenue`, `first_purchase_date`, `cohort_year`, `countryfull`, `age`, `givenname`, and `surname`.
    - Use `SUM` to calculate the total net revenue for each customer per day.
    - Use `COUNT` to calculate the number of orders for each customer per day.
    - Get the earliest order date for each customer using `MIN(s.orderdate) OVER (PARTITION BY s.customerkey) AS first_purchase_date`.
    - Get the cohort year for each customer using `EXTRACT(YEAR FROM MIN(s.orderdate) OVER (PARTITION BY s.customerkey)) AS cohort_year`.
    - Use `GROUP BY` to group by `customerkey`, `orderdate`, `countryfull`, `age`, `givenname`, and `surname`.
    - Use `LEFT JOIN` to left join the `customer` table to the `sales` table on `customerkey`.

In [None]:
%%sql 

CREATE VIEW cohort_analysis AS -- Create a view named cohort_analysis
SELECT 
    s.customerkey,
    s.orderdate,
    SUM(s.quantity * s.netprice * s.exchangerate) AS total_net_revenue,
    COUNT(s.orderkey) AS num_orders,
    MIN(s.orderdate) OVER (PARTITION BY s.customerkey) AS first_purchase_date,
    EXTRACT(YEAR FROM MIN(s.orderdate) OVER (PARTITION BY s.customerkey)) AS cohort_year,
    c.countryfull,
    c.age,
    c.givenname,
    c.surname
FROM sales s
LEFT JOIN customer c ON c.customerkey = s.customerkey
GROUP BY 
    s.customerkey, 
    s.orderdate,
    c.countryfull,
    c.age,
    c.givenname,
    c.surname
    


DBeaver output:

![Create View](../Resources/query_results/5.1_dbeaver_create_view.png)

5. To see the view you created:
    1. Go to the Database Navigator on the left side.
    2. Click the `Views` folder.
    3. Refresh the `Views` folder using `F5`.
    4. Double-click the view you created, named `cohort_analysis`.
    5. Go to the `Data` tab (if it doesn't open there by default).



![See cohort_analysis View](../Resources/query_results/5.1_dbeaver_views.gif)

To see the view you created using Colab:

In [10]:
%%sql

SELECT *
FROM cohort_analysis

Unnamed: 0,customerkey,orderdate,total_net_revenue,num_orders,first_purchase_date,cohort_year,countryfull,age,givenname,surname
0,15,2021-03-08,2217.41,1,2021-03-08,2021,Australia,55,Julian,McGuigan
1,180,2018-07-28,525.31,1,2018-07-28,2018,Australia,65,Gabriel,Bosanquet
2,180,2023-08-28,1984.90,2,2018-07-28,2018,Australia,65,Gabriel,Bosanquet
3,185,2019-06-01,1395.52,1,2019-06-01,2019,Australia,40,Gabrielle,Castella
4,243,2016-05-19,287.67,1,2016-05-19,2016,Australia,66,Maya,Atherton
...,...,...,...,...,...,...,...,...,...,...
83094,2099697,2022-09-13,38.20,3,2022-09-13,2022,United States,54,Phillipp,Maier
83095,2099711,2016-08-13,2067.75,1,2016-08-13,2016,United States,80,Katerina,Pavlícková
83096,2099711,2017-08-14,3940.92,1,2016-08-13,2016,United States,80,Katerina,Pavlícková
83097,2099743,2022-03-17,469.62,2,2022-03-17,2022,United States,21,Luciana,Almonte


6. Use the view and calculate the total net revenue.  

   - Query the `cohort_analysis` view to retrieve `cohort_year`, `orderdate`, and `total_net_revenue`.  
   - 🔔 Use `SUM(total_net_revenue)` to calculate the total revenue for all customers within each cohort for a specific day.  
   - 🔔 Group by `cohort_year` and `orderdate` to ensure the total revenue is aggregated at the cohort and daily levels.  
   - 🔔 Select `cohort_year`, `orderdate`, and `total_revenue` for the final output.  

In [11]:
%%sql

SELECT
    cohort_year,
    orderdate,
    SUM(total_net_revenue) AS total_revenue
FROM cohort_analysis
GROUP BY 
    cohort_year, 
    orderdate;

Unnamed: 0,cohort_year,orderdate,total_revenue
0,2022,2023-09-06,3537.66
1,2020,2020-12-08,5160.80
2,2022,2022-04-05,1167.08
3,2017,2023-06-06,9155.73
4,2015,2020-10-16,477.11
...,...,...,...
12950,2016,2022-07-25,4307.61
12951,2019,2024-01-28,4284.73
12952,2017,2018-02-02,1874.40
12953,2018,2023-12-19,4385.82
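
The pattern in step 6 (aggregate on top of a view) can be sketched end to end on toy data. As before, Python's built-in `sqlite3` module stands in for PostgreSQL (an assumption), the simplified `sales` table is hypothetical, and `strftime('%Y', ...)` replaces `EXTRACT(YEAR FROM ...)`. The view here is a trimmed-down version of `cohort_analysis` with only the columns the final query needs.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
# Hypothetical, simplified sales table with a precomputed revenue column.
conn.execute("CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2022-01-05", 100.0),
        (2, "2022-02-10", 40.0),
        (1, "2023-03-01", 60.0),
        (2, "2023-03-01", 15.0),
        (3, "2023-03-01", 25.0),  # customer 3 first buys in 2023
    ],
)

# Trimmed-down cohort_analysis view: one row per customer per day plus the cohort year.
conn.execute("""
    CREATE VIEW cohort_analysis AS
    SELECT
        customerkey,
        orderdate,
        SUM(revenue) AS total_net_revenue,
        CAST(strftime('%Y', MIN(orderdate) OVER (PARTITION BY customerkey)) AS INTEGER) AS cohort_year
    FROM sales
    GROUP BY customerkey, orderdate
""")

# Roll the per-customer rows up to one row per cohort per day.
rows = conn.execute("""
    SELECT
        cohort_year,
        orderdate,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    GROUP BY cohort_year, orderdate
    ORDER BY cohort_year, orderdate
""").fetchall()

for row in rows:
    print(row)
# (2022, '2022-01-05', 100.0)
# (2022, '2022-02-10', 40.0)
# (2022, '2023-03-01', 75.0)
# (2023, '2023-03-01', 25.0)
```

On 2023-03-01 the 2022 cohort's revenue combines customers 1 and 2, while customer 3 is reported separately under the 2023 cohort.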


<img src="../Resources/images/5.1_monthly_rev.png" alt="Daily revenue for the 2023 cohort" width="50%">

> ⚠️ **Chart Note**: This chart plots only the 2023 cohort for its first 40 days (2023-01-01 to 2023-02-09).
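
One way such a subset could be selected is sketched below. The notebook does not show how the chart was actually produced, so this is only an illustration. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (an assumption), and a hypothetical three-column table takes the place of the `cohort_analysis` view. The date range 2023-01-01 to 2023-02-09 is inclusive on both ends, which gives 40 days.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the cohort_analysis view, reduced to three columns.
conn.execute(
    "CREATE TABLE cohort_analysis (cohort_year INTEGER, orderdate TEXT, total_net_revenue REAL)"
)
conn.executemany(
    "INSERT INTO cohort_analysis VALUES (?, ?, ?)",
    [
        (2022, "2023-01-01", 10.0),  # different cohort: excluded
        (2023, "2023-01-01", 20.0),
        (2023, "2023-01-01", 5.0),
        (2023, "2023-02-09", 30.0),  # last day inside the window
        (2023, "2023-02-10", 40.0),  # one day past the window: excluded
    ],
)

rows = conn.execute("""
    SELECT
        orderdate,
        SUM(total_net_revenue) AS total_revenue
    FROM cohort_analysis
    WHERE cohort_year = 2023
      AND orderdate BETWEEN '2023-01-01' AND '2023-02-09'
    GROUP BY orderdate
    ORDER BY orderdate
""").fetchall()

print(rows)  # [('2023-01-01', 25.0), ('2023-02-09', 30.0)]
```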