<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/1_Pivot_With_Case_Statements/2_Sum_Aggregation.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Conditional Aggregation

**Product focused**

## Overview

### 🥅 Analysis Goals

Perform an exploratory data analysis (EDA) on product categories and their net revenue from the sales table to uncover general trends and understand the dataset. Specifically:

- **Total net revenue in 2023 and 2022**: Compare yearly revenue trends to identify overall growth or decline.
- **Net revenue by product categories in 2023 and 2022**: Explore which categories contribute most to revenue across two years.
- **Categorize net revenue as low or high**: Identify general patterns in revenue distribution for product performance.

### 📘 Concepts Covered

- `SUM` Review
- `SUM` with `CASE WHEN`
- `BETWEEN` with `DATE`

---

In [2]:
import sys
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Replace ipython-sql with jupysql, which provides the %sql and %%sql magics
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas number to two decimal places
pd.options.display.float_format = '{:.2f}'.format

---
## SUM Review

### 📝 Notes

`SUM`

- **SUM** adds up all numeric values in a column (or expression), ignoring NULL values. If there are no non-NULL values to add, it returns NULL rather than 0.

- Syntax:

  ```sql
  SUM(column_name)
  ```

- Example:

  ```sql
  SELECT SUM(order_amount) AS total_revenue
  FROM orders;
  ```
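
A minimal sketch of that NULL handling, using Python's built-in `sqlite3` module as a stand-in for PostgreSQL (both skip NULLs in `SUM` and return NULL when there is nothing to add). The `orders` rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (illustrative data only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [(100.0,), (250.0,), (None,)],  # one NULL amount
)

# SUM skips the NULL row and adds only 100 + 250
total = conn.execute("SELECT SUM(order_amount) FROM orders").fetchone()[0]

# No rows match, so there is nothing to add: SUM returns NULL (None), not 0
empty = conn.execute(
    "SELECT SUM(order_amount) FROM orders WHERE order_amount > 1000"
).fetchone()[0]

# COALESCE replaces that NULL with 0 when a zero is more convenient
empty_or_zero = conn.execute(
    "SELECT COALESCE(SUM(order_amount), 0) FROM orders WHERE order_amount > 1000"
).fetchone()[0]

print(total, empty, empty_or_zero)
```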

### 💻 Final Result

- Find the total net revenue by day in 2023.
- Calculate the total net revenue by category in 2022 and 2023. Compare yearly revenue trends to identify overall growth or decline.

#### Total Net Revenue by Day in 2023

**`SUM`**

1. Find the net revenue by orderdate for 2023 orders.

    - Use `SUM(quantity * netprice * exchangerate)` to calculate the net revenue for each day.
    - Filter orders to include only dates in 2023 using `WHERE orderdate BETWEEN '2023-01-01' AND '2023-12-31'` (`BETWEEN` includes both endpoints).
    - Group data by `orderdate` to calculate daily revenue.
    - Sort the results by `orderdate` in chronological order.
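
`BETWEEN` is inclusive, but only against the exact values compared. If `orderdate` held a time component, a value such as `2023-12-31 15:30` would be later than `2023-12-31` (midnight) and fall outside the range; casting with `::date` first avoids that. The sketch below reproduces the effect in SQLite, where `DATE()` plays the role of PostgreSQL's `::date` and timestamps are ISO-8601 text. The `events` table and its rows are hypothetical, and the half-open range (`>= start AND < next start`) is shown as a common alternative, not as what this course uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table storing order timestamps as ISO-8601 text
conn.execute("CREATE TABLE events (ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [("2022-12-31 23:59:00",), ("2023-01-01 00:00:00",), ("2023-12-31 15:30:00",)],
)

def count(where_clause: str) -> int:
    """Count rows matching a hard-coded WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM events WHERE {where_clause}").fetchone()[0]

# Raw timestamp: '2023-12-31 15:30:00' sorts after '2023-12-31', so that row is dropped
raw = count("ts BETWEEN '2023-01-01' AND '2023-12-31'")

# Truncating to a date first (like orderdate::date) keeps it
truncated = count("DATE(ts) BETWEEN '2023-01-01' AND '2023-12-31'")

# Half-open range: no cast needed, any time of day on 2023-12-31 is included
half_open = count("ts >= '2023-01-01' AND ts < '2024-01-01'")

print(raw, truncated, half_open)
```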

In [3]:
%%sql

SELECT
    s.orderdate,
    SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue -- Added
FROM
    sales s
WHERE
    s.orderdate::date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY   
    s.orderdate
ORDER BY
    s.orderdate

| | orderdate | net_revenue |
|---|---|---|
| 0 | 2023-01-01 | 30140.80 |
| 1 | 2023-01-02 | 107847.49 |
| 2 | 2023-01-03 | 192655.60 |
| 3 | 2023-01-04 | 189451.71 |
| 4 | 2023-01-05 | 216573.23 |
| ... | ... | ... |
| 359 | 2023-12-27 | 141981.34 |
| 360 | 2023-12-28 | 138772.19 |
| 361 | 2023-12-29 | 85913.44 |
| 362 | 2023-12-30 | 165917.02 |


#### Total Net Revenue by Product Category in 2023 and 2022

**`SUM`**

1. Find the total net revenue by product category for 2023 orders.

    - Use `SUM(quantity * netprice * exchangerate)` to calculate net revenue for each product category.
    - 🔔 Join the `sales` table with the `product` table on `productkey` to access `categoryname`.
    - Filter orders to include only dates in 2023 using `WHERE orderdate BETWEEN '2023-01-01' AND '2023-12-31'`.
    - 🔔 Group data by `categoryname` to calculate revenue by category.
    - 🔔 Sort results alphabetically by `categoryname`.
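
The join-then-group pattern can be tried on toy tables. In this sketch (SQLite standing in for PostgreSQL, rows invented), `productkey` 3 has no match in `product`, so the `LEFT JOIN` keeps that sale with a NULL category and it forms its own group. One difference to keep in mind: SQLite sorts NULL first in ascending order, while PostgreSQL sorts it last by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (productkey INTEGER, categoryname TEXT);
    CREATE TABLE sales (productkey INTEGER, quantity INTEGER, netprice REAL, exchangerate REAL);
    INSERT INTO product VALUES (1, 'Audio'), (2, 'Computers');
    -- productkey 3 has no row in product
    INSERT INTO sales VALUES
        (1, 2, 50.0, 1.0),
        (2, 1, 900.0, 1.0),
        (2, 1, 100.0, 2.0),
        (3, 1, 10.0, 1.0);
""")

rows = conn.execute("""
    SELECT
        p.categoryname AS category_name,
        SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
    FROM sales s
    LEFT JOIN product p ON s.productkey = p.productkey
    GROUP BY p.categoryname
    ORDER BY p.categoryname
""").fetchall()

for category, revenue in rows:
    print(category, revenue)
```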

In [4]:
%%sql

SELECT
    p.categoryname AS category_name, -- Added
    SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM
    sales s
    LEFT JOIN product p ON s.productkey = p.productkey -- Added
WHERE
    s.orderdate::date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY
    p.categoryname -- Updated
ORDER BY
    p.categoryname -- Updated

| | category_name | net_revenue |
|---|---|---|
| 0 | Audio | 688690.18 |
| 1 | Cameras and camcorders | 1983546.29 |
| 2 | Cell phones | 6002147.63 |
| 3 | Computers | 11650867.21 |
| 4 | Games and Toys | 270374.96 |
| 5 | Home Appliances | 5919992.87 |
| 6 | Music, Movies and Audio Books | 2180768.13 |
| 7 | TV and Video | 4412178.23 |


2. Find the total net revenue by product category for 2022 orders.

    - Use `SUM(quantity * netprice * exchangerate)` to calculate net revenue for each product category.
    - Join the `sales` table with the `product` table on `productkey` to access `categoryname`.
    - 🔔 Filter orders to include only dates in 2022 using `WHERE orderdate BETWEEN '2022-01-01' AND '2022-12-31'`.
    - Group data by `categoryname` to calculate revenue by category.
    - Sort results alphabetically by `categoryname`.

In [5]:
%%sql

SELECT
    p.categoryname AS category_name,
    SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM
    sales s
    LEFT JOIN product p ON s.productkey = p.productkey
WHERE
    s.orderdate::date BETWEEN '2022-01-01' AND '2022-12-31' -- Updated
GROUP BY
    p.categoryname
ORDER BY
    p.categoryname

| | category_name | net_revenue |
|---|---|---|
| 0 | Audio | 766938.21 |
| 1 | Cameras and camcorders | 2382532.56 |
| 2 | Cell phones | 8119665.07 |
| 3 | Computers | 17862213.49 |
| 4 | Games and Toys | 316127.3 |
| 5 | Home Appliances | 6612446.68 |
| 6 | Music, Movies and Audio Books | 2989297.28 |
| 7 | TV and Video | 5815336.61 |


---
## SUM with CASE WHEN

### 📝 Notes

`SUM(CASE WHEN)`

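- **Pivot with SUM (using `CASE WHEN` expressions)**, also called conditional aggregation, wraps a `CASE WHEN` expression inside `SUM` so that only rows meeting the condition are added up. Each condition becomes its own column, turning row values (such as regions or years) into columns.

- `ELSE 0` is optional: without it, non-matching rows produce NULL, which `SUM` ignores. The two forms differ only for a group with no matching rows, which returns NULL instead of 0.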

- Syntax:

  ```sql
  SUM(CASE WHEN condition THEN column ELSE 0 END) AS alias
  ```

- Example:

  ```sql
  SELECT 
    SUM(CASE WHEN region = 'North' THEN sales END) AS north_sales,
    SUM(CASE WHEN region = 'South' THEN sales END) AS south_sales
  FROM sales_data;
  ```
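
The sketch below runs both forms side by side on invented `sales_data` rows (the `store` column is a hypothetical grouping added for the example), with SQLite standing in for PostgreSQL. Store `B` has no `South` rows, which is exactly where the version without `ELSE` returns NULL and the `ELSE 0` version returns 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (store TEXT, region TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?)",
    [("A", "North", 100.0), ("A", "South", 40.0), ("A", "North", 60.0), ("B", "North", 30.0)],
)

rows = conn.execute("""
    SELECT
        store,
        SUM(CASE WHEN region = 'North' THEN sales END) AS north_sales,
        SUM(CASE WHEN region = 'South' THEN sales END) AS south_sales,          -- NULL if no South rows
        SUM(CASE WHEN region = 'South' THEN sales ELSE 0 END) AS south_sales_0  -- 0 if no South rows
    FROM sales_data
    GROUP BY store
    ORDER BY store
""").fetchall()

for row in rows:
    print(row)
```

Which form to prefer is a presentation choice: NULL signals "no data", while 0 is easier to feed into further arithmetic.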

### 💻 Final Result

- Compare total net revenue by product category for orders placed in 2022 and 2023, to see which categories contribute most to revenue in each year.
- Split each category's net revenue into low and high buckets based on the revenue of each order line. This shows how each category's revenue is distributed between lower- and higher-value sales.

#### Total Net Revenue by Category and Year (2022 vs 2023)

**`CASE WHEN` and `SUM`**

1. Pivot to get the total net revenue by category and compare 2023 with 2022.

    - Use `SUM` with `CASE WHEN` to calculate separate revenue totals for 2022 and 2023:
        - `CASE WHEN orderdate BETWEEN '2022-01-01' AND '2022-12-31'` for 2022 revenue.
        - `CASE WHEN orderdate BETWEEN '2023-01-01' AND '2023-12-31'` for 2023 revenue.
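    - Join `sales` to `product` with a `LEFT JOIN` to access `categoryname`.
    - No `WHERE` clause is required: rows outside 2022 and 2023 match neither `CASE` condition, so they contribute NULL, which `SUM` ignores.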
    - Group data by `categoryname` to provide a category-based comparison.
    - Sort results alphabetically by `categoryname`.
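
One pass with conditional aggregation should give the same numbers as running the 2022 and 2023 queries separately and lining them up. The sketch below checks that on a few invented rows (SQLite standing in for PostgreSQL, a simplified `sales` table with a precomputed `revenue` column, dates stored as ISO text so `BETWEEN` compares them correctly).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, orderdate TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Audio", "2022-03-01", 10.0),
        ("Audio", "2023-05-09", 7.0),
        ("Computers", "2022-11-30", 50.0),
        ("Computers", "2023-01-15", 30.0),
        ("Computers", "2021-06-01", 99.0),  # outside both years: counted in neither column
    ],
)

# Pivot: one scan of the table, one column per year
pivot = conn.execute("""
    SELECT
        category,
        SUM(CASE WHEN orderdate BETWEEN '2022-01-01' AND '2022-12-31' THEN revenue END) AS rev_2022,
        SUM(CASE WHEN orderdate BETWEEN '2023-01-01' AND '2023-12-31' THEN revenue END) AS rev_2023
    FROM sales
    GROUP BY category
    ORDER BY category
""").fetchall()

def yearly(start: str, end: str) -> dict:
    """Per-category totals for one date range, like the separate filtered queries."""
    return dict(conn.execute(
        "SELECT category, SUM(revenue) FROM sales "
        "WHERE orderdate BETWEEN ? AND ? GROUP BY category",
        (start, end),
    ).fetchall())

by_2022 = yearly("2022-01-01", "2022-12-31")
by_2023 = yearly("2023-01-01", "2023-12-31")

# Each pivoted cell matches the corresponding single-year query
for category, rev_2022, rev_2023 in pivot:
    assert rev_2022 == by_2022.get(category)
    assert rev_2023 == by_2023.get(category)

print(pivot)
```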

In [6]:
%%sql 

SELECT
    p.categoryname AS category,
    SUM(CASE WHEN s.orderdate::date BETWEEN '2022-01-01' AND '2022-12-31' THEN (s.quantity * s.netprice * s.exchangerate) END) AS total_net_revenue_2022,
    SUM(CASE WHEN s.orderdate::date BETWEEN '2023-01-01' AND '2023-12-31' THEN (s.quantity * s.netprice * s.exchangerate) END) AS total_net_revenue_2023
FROM
    sales s
    LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
    p.categoryname
ORDER BY
    p.categoryname;

| | category | total_net_revenue_2022 | total_net_revenue_2023 |
|---|---|---|---|
| 0 | Audio | 766938.21 | 688690.18 |
| 1 | Cameras and camcorders | 2382532.56 | 1983546.29 |
| 2 | Cell phones | 8119665.07 | 6002147.63 |
| 3 | Computers | 17862213.49 | 11650867.21 |
| 4 | Games and Toys | 316127.3 | 270374.96 |
| 5 | Home Appliances | 6612446.68 | 5919992.87 |
| 6 | Music, Movies and Audio Books | 2989297.28 | 2180768.13 |
| 7 | TV and Video | 5815336.61 | 4412178.23 |


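With both years in one row per category, the year-over-year change is a single subtraction per row. A plain-Python sketch using the values from the pivoted result table (already rounded to cents, so derived figures are approximate). Because the output contains no NULL-category row, summing each column also gives the overall yearly totals.

```python
# Category totals copied from the pivoted result: (2022, 2023)
revenue = {
    "Audio": (766938.21, 688690.18),
    "Cameras and camcorders": (2382532.56, 1983546.29),
    "Cell phones": (8119665.07, 6002147.63),
    "Computers": (17862213.49, 11650867.21),
    "Games and Toys": (316127.30, 270374.96),
    "Home Appliances": (6612446.68, 5919992.87),
    "Music, Movies and Audio Books": (2989297.28, 2180768.13),
    "TV and Video": (5815336.61, 4412178.23),
}

# Percent change from 2022 to 2023 for each category
pct_change = {
    category: (rev_2023 - rev_2022) / rev_2022 * 100
    for category, (rev_2022, rev_2023) in revenue.items()
}

# Overall totals: sum of each year's column
total_2022 = sum(r22 for r22, _ in revenue.values())
total_2023 = sum(r23 for _, r23 in revenue.values())
overall_pct = (total_2023 - total_2022) / total_2022 * 100

# Largest decline first
for category, pct in sorted(pct_change.items(), key=lambda item: item[1]):
    print(f"{category:<32}{pct:8.1f}%")
print(f"{'All categories':<32}{overall_pct:8.1f}%")
```

Every category declines from 2022 to 2023 in this data, so the comparison is about how steep each drop is rather than growth versus decline.
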
#### Categorize as Low and High for Total Net Revenue

**`SUM`**, **`CASE WHEN`**

1. Split net revenue into low and high buckets based on each order line's revenue, then total each bucket by category.
    - Use `SUM` with `CASE WHEN` to categorize net revenue:
        - Revenue less than 1,000 as "low" using `(quantity * netprice * exchangerate) < 1000`.
        - Revenue greater than or equal to 1,000 as "high" using `(quantity * netprice * exchangerate) >= 1000`.
    - Join `sales` to `product` with a `LEFT JOIN` to access `categoryname`.
    - Filter orders to include dates between 2022 and 2023 using `WHERE orderdate BETWEEN '2022-01-01' AND '2023-12-31'`.
    - Group data by `categoryname` to calculate total low and high revenues per category.
    - Sort results alphabetically by `categoryname`.
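
The revenue expression `quantity * netprice * exchangerate` appears four times in this step's query. One way to write it once is to compute each line's revenue in a subquery and bucket that column. Since `< 1000` and `>= 1000` are complementary, every non-NULL line lands in exactly one bucket, so low plus high equals the category total. A sketch with invented rows and SQLite standing in for PostgreSQL; the join to `product` is omitted (the toy `sales` table carries `category` directly), and the subquery alias `lines` and the `total_net_revenue` column are illustrative additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (category TEXT, quantity INTEGER, netprice REAL, exchangerate REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Audio", 1, 200.0, 1.0),       # 200  -> low
        ("Audio", 3, 400.0, 1.0),       # 1200 -> high
        ("Computers", 1, 999.0, 1.0),   # 999  -> low
        ("Computers", 1, 1000.0, 1.0),  # 1000 -> high (>= 1000)
    ],
)

rows = conn.execute("""
    SELECT
        category,
        SUM(CASE WHEN line_revenue < 1000 THEN line_revenue END) AS low_total_net_revenue,
        SUM(CASE WHEN line_revenue >= 1000 THEN line_revenue END) AS high_total_net_revenue,
        SUM(line_revenue) AS total_net_revenue
    FROM (
        -- Compute each order line's revenue once
        SELECT category, quantity * netprice * exchangerate AS line_revenue
        FROM sales
    ) AS lines
    GROUP BY category
    ORDER BY category
""").fetchall()

for category, low, high, total in rows:
    # Both buckets are non-empty in this toy data; an empty bucket would be None
    assert low + high == total
    print(category, low, high, total)
```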

In [7]:
%%sql 

SELECT
    p.categoryname AS category,
    SUM(CASE WHEN (s.quantity * s.netprice * s.exchangerate) < 1000 THEN (s.quantity * s.netprice * s.exchangerate) END) AS low_total_net_revenue,
    SUM(CASE WHEN (s.quantity * s.netprice * s.exchangerate) >= 1000 THEN (s.quantity * s.netprice * s.exchangerate) END) AS high_total_net_revenue
FROM
    sales s
    LEFT JOIN product p ON s.productkey = p.productkey
WHERE
    s.orderdate::date BETWEEN '2022-01-01' AND '2023-12-31'
GROUP BY
    p.categoryname
ORDER BY
    p.categoryname;

| | category | low_total_net_revenue | high_total_net_revenue |
|---|---|---|---|
| 0 | Audio | 970542.98 | 485085.41 |
| 1 | Cameras and camcorders | 884178.45 | 3481900.4 |
| 2 | Cell phones | 5173880.4 | 8947932.31 |
| 3 | Computers | 4937765.59 | 24575315.1 |
| 4 | Games and Toys | 547757.88 | 38744.39 |
| 5 | Home Appliances | 1581307.97 | 10951131.58 |
| 6 | Music, Movies and Audio Books | 2973461.1 | 2196604.3 |
| 7 | TV and Video | 1704582.92 | 8522931.91 |

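Raw bucket totals are hard to compare across categories of very different size. The share of each category's revenue that comes from high-value lines normalizes that. A plain-Python sketch using the low/high values from the result table (2022 and 2023 combined):

```python
# Low/high totals copied from the result table: (low, high)
buckets = {
    "Audio": (970542.98, 485085.41),
    "Cameras and camcorders": (884178.45, 3481900.40),
    "Cell phones": (5173880.40, 8947932.31),
    "Computers": (4937765.59, 24575315.10),
    "Games and Toys": (547757.88, 38744.39),
    "Home Appliances": (1581307.97, 10951131.58),
    "Music, Movies and Audio Books": (2973461.10, 2196604.30),
    "TV and Video": (1704582.92, 8522931.91),
}

# Percentage of each category's revenue from order lines of 1,000 or more
high_share = {
    category: high / (low + high) * 100
    for category, (low, high) in buckets.items()
}

# Most high-value-driven category first
for category, share in sorted(high_share.items(), key=lambda item: item[1], reverse=True):
    print(f"{category:<32}{share:6.1f}%")
```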