# Conditional Aggregation

**Product focused**

## Overview

### 🥅 Analysis Goals

- Use the `sales` table to do an exploratory data analysis (EDA) of the products ordered and their categories:
    - Total sales by product category in 2023 and in 2022
    - A side-by-side comparison of 2023 and 2022 sales by category
- The end goal is to see how sales in each product category changed between 2022 and 2023.

### 📘 Concepts Covered

General concepts we’re going to cover

- Aggregation Review
- `SUM` with `CASE WHEN`
- Pivoting results into one column per year

---

```python
import sys
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Switch the SQL magic library from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the SQL magic extension (provides %sql and %%sql)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k
```

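```python
# Turn off jupysql's `:name` parameter substitution so that SQL such as
# `orderdate::date` is passed to PostgreSQL unchanged
%config SqlMagic.named_parameters = "disabled"
```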

---
## Aggregation Review

### 📝 Notes

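- Sales amounts are not stored as a single column; they are computed per order line as `quantity * price * exchangerate`. Because `price` lives in the `product` table, `sales` must be joined to `product` on `productkey`.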

### 💻 Final Result

- Return the total sales for each product category, first for 2023 and then for 2022.

#### Total Sales by Category

**`JOIN`, `WHERE`, and `SUM` / Aggregation Review**

1. Calculate the sale amount for each order line by multiplying `quantity` (from the `sales` table) by `price` (from the `product` table) and by `exchangerate` (from the `sales` table), since not all sales are made in USD.

```sql
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    s.storekey,
    s.productkey,
    s.quantity,
    p.price,
    s.quantity * p.price * s.exchangerate AS total_sale_amount
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
ORDER BY
    orderkey
```

| orderkey | orderdate | customerkey | storekey | productkey | quantity | price | total_sale_amount |
|---|---|---|---|---|---|---|---|
| 1000 | 2015-01-01 | 947009 | 400 | 48 | 1 | 149.95 | 149.95 |
| 1000 | 2015-01-01 | 947009 | 400 | 460 | 1 | 299.9 | 299.9 |
| 1001 | 2015-01-01 | 1772036 | 430 | 1730 | 2 | 77.68 | 155.36 |
| 1002 | 2015-01-01 | 1518349 | 660 | 955 | 4 | 196.9 | 787.6 |
| 1002 | 2015-01-01 | 1518349 | 660 | 62 | 7 | 181.0 | 1267.0 |
| 1002 | 2015-01-01 | 1518349 | 660 | 1050 | 3 | 312.0 | 936.0 |
| 1002 | 2015-01-01 | 1518349 | 660 | 1608 | 1 | 109.99 | 109.99 |
| 1003 | 2015-01-01 | 1317097 | 510 | 85 | 3 | 99.99 | 299.97 |
| 1004 | 2015-01-01 | 254117 | 80 | 128 | 2 | 143.4 | 286.8 |
| 1004 | 2015-01-01 | 254117 | 80 | 2079 | 1 | 665.94 | 665.94 |
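The row-level calculation can be tried in isolation. Below is a minimal sketch using Python's built-in `sqlite3` module; the two tables are trimmed-down, hypothetical stand-ins for `sales` and `product` (only the columns the formula needs), and the rows are made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical, trimmed-down versions of the two tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (productkey INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE sales (
        orderkey INTEGER,
        productkey INTEGER,
        quantity INTEGER,
        exchangerate REAL
    );
    INSERT INTO product VALUES (1, 100.0), (2, 20.0);
    -- Hypothetical rows: one sale at rate 1.0, one at rate 0.5
    INSERT INTO sales VALUES (1000, 1, 2, 1.0), (1001, 2, 5, 0.5);
""")

# Same formula as the query above: quantity * price * exchangerate per order line
rows = conn.execute("""
    SELECT
        s.orderkey,
        s.quantity * p.price * s.exchangerate AS total_sale_amount
    FROM sales s
    JOIN product p ON s.productkey = p.productkey
    ORDER BY s.orderkey
""").fetchall()

for orderkey, amount in rows:
    print(orderkey, amount)  # 1000 200.0, then 1001 50.0
```

Each output row is one order line, so the same `orderkey` can appear more than once, as orders `1000` and `1002` do in the real output above.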


2. Filter the data to return only orders placed in 2023, and add `categoryname` from the `product` table.

```sql
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    s.storekey,
    s.productkey,
    p.categoryname, -- Added
    s.quantity,
    p.price,
    s.quantity * p.price * s.exchangerate AS total_sale_amount
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
WHERE -- Added
    orderdate::date BETWEEN '2023-01-01' AND '2023-12-31'
ORDER BY
    orderkey
```

| orderkey | orderdate | customerkey | storekey | productkey | categoryname | quantity | price | total_sale_amount |
|---|---|---|---|---|---|---|---|---|
| 2923003 | 2023-01-01 | 1889683 | 470 | 371 | Computers | 3 | 599.0 | 1797.0 |
| 2923003 | 2023-01-01 | 1889683 | 470 | 1605 | Music, Movies and Audio Books | 6 | 289.99 | 1739.94 |
| 2923003 | 2023-01-01 | 1889683 | 470 | 1258 | Cameras and camcorders | 1 | 39.99 | 39.99 |
| 2923003 | 2023-01-01 | 1889683 | 470 | 1976 | Home Appliances | 3 | 899.0 | 2697.0 |
| 2923005 | 2023-01-01 | 1831111 | 650 | 151 | TV and Video | 1 | 1184.97 | 1184.97 |
| 2923005 | 2023-01-01 | 1831111 | 650 | 724 | Computers | 1 | 163.0 | 163.0 |
| 2923005 | 2023-01-01 | 1831111 | 650 | 502 | Computers | 2 | 90.0 | 180.0 |
| 2923005 | 2023-01-01 | 1831111 | 650 | 1397 | Cell phones | 1 | 26.99 | 26.99 |
| 2923005 | 2023-01-01 | 1831111 | 650 | 1123 | Cameras and camcorders | 2 | 328.0 | 656.0 |
| 2923007 | 2023-01-01 | 1272876 | 999999 | 2121 | Home Appliances | 6 | 129.9 | 779.4000000000001 |
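The `779.4000000000001` in the last row is binary floating-point rounding, not a data error: `129.9` has no exact binary representation, so `6 * 129.9` rounds to a double just above the one closest to `779.4`. The Python sketch below reproduces the effect, assuming `price` is stored as a double-precision float (which this output suggests). In PostgreSQL, one common fix is to round for display, e.g. `ROUND((s.quantity * p.price * s.exchangerate)::numeric, 2)`; the cast is needed because the two-argument `ROUND` is defined for `numeric`, not `double precision`.

```python
from decimal import Decimal

# Same arithmetic as the last row above: quantity 6, price 129.9, exchange rate 1.0
amount = 6 * 129.9 * 1.0

print(amount)          # 779.4000000000001
print(amount == 779.4) # False

# 129.9 is stored as the nearest binary double, which is slightly above 129.9
print(Decimal(129.9) > Decimal("129.9"))  # True

# Rounding to cents recovers the expected value
print(round(amount, 2))  # 779.4
```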


3. Aggregate the data to get the total sales by category:
    - Remove every column except the category.
    - Wrap the sale amount in `SUM` and `GROUP BY` the category.

```sql
%%sql

SELECT
    p.categoryname AS category_name,
    SUM(s.quantity * p.price * s.exchangerate) AS total_sale_amount
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
WHERE
    orderdate::date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY
    category_name
ORDER BY
    category_name
```

| category_name | total_sale_amount |
|---|---|
| Audio | 338083.15 |
| Cameras and camcorders | 943092.09 |
| Cell phones | 3006014.269999982 |
| Computers | 5818915.889999995 |
| Games and Toys | 133469.7799999999 |
| Home Appliances | 2851503.569999989 |
| Music, Movies and Audio Books | 1109997.0199999928 |
| TV and Video | 2189181.6599999964 |
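What `WHERE`, `GROUP BY`, and `SUM` do here can be spelled out in plain Python: keep only 2023 rows, bucket them by category, and add up each bucket. A minimal sketch with hypothetical rows standing in for the joined `sales`/`product` data:

```python
from collections import defaultdict

# Hypothetical (category, order_date, total_sale_amount) rows after the JOIN
rows = [
    ("Audio", "2023-02-10", 100.0),
    ("Computers", "2023-03-05", 250.0),
    ("Audio", "2023-07-21", 40.0),
    ("Computers", "2022-11-30", 999.0),  # filtered out: not in 2023
]

totals = defaultdict(float)
for category, order_date, amount in rows:
    # WHERE: keep only 2023 orders (ISO date strings compare correctly as text)
    if "2023-01-01" <= order_date <= "2023-12-31":
        # GROUP BY category + SUM(amount)
        totals[category] += amount

# ORDER BY category_name
for category in sorted(totals):
    print(category, totals[category])  # Audio 140.0, then Computers 250.0
```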


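4. For 2022, run the same query with the date range changed to 2022. This works, but it takes one query per year, and the yearly results come back as separate tables instead of side by side.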

```sql
%%sql

SELECT
    p.categoryname AS category_name,
    SUM(s.quantity * p.price * s.exchangerate) AS total_sale_amount
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
WHERE
    orderdate::date BETWEEN '2022-01-01' AND '2022-12-31'
GROUP BY
    category_name
ORDER BY
    category_name
```

| category_name | total_sale_amount |
|---|---|
| Audio | 454971.17000000016 |
| Cameras and camcorders | 1287755.719999999 |
| Cell phones | 3771883.869999973 |
| Computers | 8237459.589999989 |
| Games and Toys | 185274.31799999985 |
| Home Appliances | 3901537.33999998 |
| Music, Movies and Audio Books | 1471575.4399999848 |
| TV and Video | 3412958.909999986 |
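Rather than editing the dates by hand for each year, the year can be passed in as a query parameter. The sketch below uses Python's `sqlite3` with `?` placeholders on a hypothetical, pre-joined `line_items` table (an assumption standing in for `sales` joined to `product`). It still issues one query per year, which is exactly the limitation conditional aggregation removes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, pre-joined line items: (category, order date, sale amount)
conn.execute("CREATE TABLE line_items (category TEXT, orderdate TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO line_items VALUES (?, ?, ?)",
    [
        ("Audio", "2022-05-01", 30.0),
        ("Audio", "2023-05-01", 10.0),
        ("Computers", "2022-08-15", 500.0),
        ("Computers", "2023-01-09", 200.0),
    ],
)

QUERY = """
    SELECT category, SUM(amount) AS total_sale_amount
    FROM line_items
    WHERE orderdate BETWEEN ? AND ?
    GROUP BY category
    ORDER BY category
"""

def totals_for_year(year):
    """Per-category totals for one calendar year (one query per call)."""
    return conn.execute(QUERY, (f"{year}-01-01", f"{year}-12-31")).fetchall()

for year in (2023, 2022):
    print(year, totals_for_year(year))
```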


---
## `SUM` with `CASE WHEN`

#### Total Sales by Category and Year

**`CASE WHEN` and `SUM`**

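1. Create a pivot table using `CASE WHEN` and `SUM` to total the sales by category *and* by year, with one column per year. In each year's column, the `CASE` expression returns the sale amount only for orders in that year and `NULL` otherwise; because `SUM` ignores `NULL`s, each column totals only that year's sales.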

```sql
%%sql

SELECT
    p.categoryname AS category,
    SUM(CASE WHEN orderdate::date BETWEEN '2023-01-01' AND '2023-12-31' THEN (s.quantity * p.price * s.exchangerate) END) AS y2023_total_sales,
    SUM(CASE WHEN orderdate::date BETWEEN '2022-01-01' AND '2022-12-31' THEN (s.quantity * p.price * s.exchangerate) END) AS y2022_total_sales
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
GROUP BY
    category
ORDER BY
    category;
```

| category | y2023_total_sales | y2022_total_sales |
|---|---|---|
| Audio | 338083.15 | 454971.17000000016 |
| Cameras and camcorders | 943092.09 | 1287755.719999999 |
| Cell phones | 3006014.269999982 | 3771883.8699999736 |
| Computers | 5818915.889999995 | 8237459.589999989 |
| Games and Toys | 133469.7799999999 | 185274.3179999998 |
| Home Appliances | 2851503.569999989 | 3901537.3399999808 |
| Music, Movies and Audio Books | 1109997.019999992 | 1471575.4399999846 |
| TV and Video | 2189181.659999996 | 3412958.909999987 |
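These totals match the two single-year queries above apart from the last few digits (e.g. `3771883.869999973` vs `3771883.8699999736` for Cell phones), most likely because floating-point sums depend on the order in which rows are added. To see how the `CASE WHEN` columns behave, including for a category with no sales in one of the years, here is a minimal sketch with Python's `sqlite3` on a hypothetical, pre-joined `line_items` table (text dates stand in for PostgreSQL's `orderdate::date`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, pre-joined line items: (category, order date, sale amount)
conn.execute("CREATE TABLE line_items (category TEXT, orderdate TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO line_items VALUES (?, ?, ?)",
    [
        ("Audio", "2022-05-01", 30.0),
        ("Audio", "2023-05-01", 10.0),
        ("Audio", "2023-06-12", 5.0),
        ("Computers", "2022-08-15", 500.0),      # no 2023 sales
        ("Games and Toys", "2023-03-03", 20.0),  # no 2022 sales
    ],
)

rows = conn.execute("""
    SELECT
        category,
        -- CASE yields NULL for rows outside the year; SUM skips NULLs
        SUM(CASE WHEN orderdate BETWEEN '2023-01-01' AND '2023-12-31' THEN amount END) AS y2023_total_sales,
        SUM(CASE WHEN orderdate BETWEEN '2022-01-01' AND '2022-12-31' THEN amount END) AS y2022_total_sales
    FROM line_items
    GROUP BY category
    ORDER BY category
""").fetchall()

for row in rows:
    print(row)
# ('Audio', 15.0, 30.0)
# ('Computers', None, 500.0)
# ('Games and Toys', 20.0, None)
```

A category with no sales in a year gets `NULL` (Python `None`) rather than `0`; wrap each sum in `COALESCE(..., 0)` if zeros are preferred. PostgreSQL (and SQLite 3.30+) also accept the equivalent `SUM(amount) FILTER (WHERE ...)` aggregate syntax.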



**Advanced Query**