# Conditional Aggregation

**Product focused**

## Overview

### 🥅 Analysis Goals

- Run an exploratory data analysis (EDA) of the products and product categories ordered in the `sales` table.
    - Compare total sales by product category for 2023 and 2022.
    - Get the total sales for 2023 and 2022.
- The end goal is to see how each category's sales in 2023 compare with 2022.

### 📘 Concepts Covered

General concepts we’re going to cover

- Aggregation Review
- `SUM` with `CASE WHEN`
- `UNION ALL`, `ROUND()`, and `CAST()`

---

In [1]:
import sys
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the SQL magic extension (provided by jupysql on Colab)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

---
## Aggregation Review

### 📝 Notes

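- `quantity` is stored in the `sales` table and `price` in the `product` table, so the queries below join the two tables on `productkey`.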

### 💻 Final Result

- Return the total USD sales for each product category, once for 2023 and once for 2022.

#### Problem Description

**`SUM` with `GROUP BY`**

1. Calculate the sale amount for each order line, then filter to a single year and currency and aggregate by category.

**Basic Query**


In [2]:
%config SqlMagic.named_parameters = "disabled"

How would you get the total sale amount for each order line?

Multiply `quantity` by `price`.

`quantity` is in the `sales` table while `price` is in the `product` table, so the two tables need to be joined on `productkey`.

In [18]:
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    s.storekey,
    s.productkey,
    s.quantity,
    p.price,
    s.quantity * p.price AS total_sale_amount
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
ORDER BY
    orderkey

orderkey,orderdate,customerkey,storekey,productkey,quantity,price,total_sale_amount
1000,2015-01-01,947009,400,48,1,149.95,149.95
1000,2015-01-01,947009,400,460,1,299.9,299.9
1001,2015-01-01,1772036,430,1730,2,77.68,155.36
1002,2015-01-01,1518349,660,955,4,196.9,787.6
1002,2015-01-01,1518349,660,62,7,181.0,1267.0
1002,2015-01-01,1518349,660,1050,3,312.0,936.0
1002,2015-01-01,1518349,660,1608,1,109.99,109.99
1003,2015-01-01,1317097,510,85,3,99.99,299.97
1004,2015-01-01,254117,80,128,2,143.4,286.8
1004,2015-01-01,254117,80,2079,1,665.94,665.94


Next, limit the results to 2023. Notice that the sales are recorded in different currencies; for now, keep only the ones in `USD`. We'll also add the product category.

In [20]:
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    s.storekey,
    s.productkey,
    p.categoryname, -- Added
    s.quantity,
    p.price,
    s.quantity * p.price AS total_sale_amount
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
WHERE -- Added
    orderdate::date BETWEEN '2023-01-01' AND '2023-12-31'
    AND s.currencycode = 'USD'
ORDER BY
    orderkey

orderkey,orderdate,customerkey,storekey,productkey,categoryname,quantity,price,total_sale_amount
2923003,2023-01-01,1889683,470,371,Computers,3,599.0,1797.0
2923003,2023-01-01,1889683,470,1605,"Music, Movies and Audio Books",6,289.99,1739.94
2923003,2023-01-01,1889683,470,1258,Cameras and camcorders,1,39.99,39.99
2923003,2023-01-01,1889683,470,1976,Home Appliances,3,899.0,2697.0
2923005,2023-01-01,1831111,650,151,TV and Video,1,1184.97,1184.97
2923005,2023-01-01,1831111,650,724,Computers,1,163.0,163.0
2923005,2023-01-01,1831111,650,502,Computers,2,90.0,180.0
2923005,2023-01-01,1831111,650,1397,Cell phones,1,26.99,26.99
2923005,2023-01-01,1831111,650,1123,Cameras and camcorders,2,328.0,656.0
2923007,2023-01-01,1272876,999999,2121,Home Appliances,6,129.9,779.4000000000001


Get the total sales by category:

- Remove every column except the category.
- Aggregate by category.

In [23]:
%%sql

SELECT
    p.categoryname AS category_name,
    SUM(s.quantity * p.price) AS total_sale_amount
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
WHERE
    orderdate::date BETWEEN '2023-01-01' AND '2023-12-31'
    AND s.currencycode = 'USD'
GROUP BY
    category_name
ORDER BY
    category_name

category_name,total_sale_amount
Audio,338083.15
Cameras and camcorders,943092.09
Cell phones,3006014.269999982
Computers,5818915.889999995
Games and Toys,133469.7799999999
Home Appliances,2851503.569999989
"Music, Movies and Audio Books",1109997.0199999928
TV and Video,2189181.6599999964


Run the same query for 2022 sales.

In [24]:
%%sql

SELECT
    p.categoryname AS category_name,
    SUM(s.quantity * p.price) AS total_sale_amount
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
WHERE
    orderdate::date BETWEEN '2022-01-01' AND '2022-12-31'
    AND s.currencycode = 'USD'
GROUP BY
    category_name
ORDER BY
    category_name

category_name,total_sale_amount
Audio,454971.17000000016
Cameras and camcorders,1287755.719999999
Cell phones,3771883.869999973
Computers,8237459.589999989
Games and Toys,185274.31799999985
Home Appliances,3901537.33999998
"Music, Movies and Audio Books",1471575.4399999848
TV and Video,3412958.909999986


---
## `SUM` with `CASE WHEN`

#### Problem Description

**`SUM` with `CASE WHEN`**

1. Put a `CASE WHEN` expression inside `SUM` so that each output column adds up only the rows matching its condition.

Create a pivot table using `CASE WHEN` inside `SUM`. Validate the result against the tables from the previous two queries. Do the totals match?

In [29]:
%%sql 

SELECT
    p.categoryname AS category,
    SUM(CASE WHEN orderdate::date BETWEEN '2023-01-01' AND '2023-12-31' THEN (s.quantity * p.price) END) AS y2023_total_sales,
    SUM(CASE WHEN orderdate::date BETWEEN '2022-01-01' AND '2022-12-31' THEN (s.quantity * p.price) END) AS y2022_total_sales
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
WHERE
    s.currencycode = 'USD'
GROUP BY
    category
ORDER BY
    category;

category,y2023_total_sales,y2022_total_sales
Audio,338083.15,454971.17000000016
Cameras and camcorders,943092.09,1287755.719999999
Cell phones,3006014.269999982,3771883.8699999736
Computers,5818915.889999995,8237459.589999989
Games and Toys,133469.7799999999,185274.3179999998
Home Appliances,2851503.569999989,3901537.3399999808
"Music, Movies and Audio Books",1109997.019999992,1471575.4399999846
TV and Video,2189181.659999996,3412958.909999987
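
The validation step above can be sketched on a tiny dataset. This is a minimal illustration, assuming an in-memory SQLite database as a stand-in for PostgreSQL; the rows are invented for the example and are not taken from `contoso_100k`, and `totals_for_year` is a hypothetical helper. It checks that the `SUM(CASE WHEN ...)` pivot returns the same totals as one filtered query per year.

```python
import sqlite3

# Hypothetical miniature tables; the rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (productkey INTEGER PRIMARY KEY, categoryname TEXT, price REAL);
    CREATE TABLE sales (orderdate TEXT, productkey INTEGER, quantity INTEGER);
    INSERT INTO product VALUES (1, 'Audio', 10.0), (2, 'Computers', 100.0);
    INSERT INTO sales VALUES
        ('2022-03-01', 1, 2),
        ('2022-07-15', 2, 1),
        ('2023-02-10', 1, 3),
        ('2023-11-30', 2, 4);
""")


def totals_for_year(year):
    """Filtered aggregation: one query per year, as in the earlier cells."""
    rows = conn.execute(
        """
        SELECT p.categoryname, SUM(s.quantity * p.price)
        FROM sales s
        JOIN product p ON s.productkey = p.productkey
        WHERE s.orderdate BETWEEN ? AND ?
        GROUP BY p.categoryname
        """,
        (f"{year}-01-01", f"{year}-12-31"),
    ).fetchall()
    return dict(rows)


# Conditional aggregation: both years come back from a single query.
pivot = conn.execute(
    """
    SELECT
        p.categoryname AS category,
        SUM(CASE WHEN s.orderdate BETWEEN '2023-01-01' AND '2023-12-31'
                 THEN s.quantity * p.price END) AS y2023_total_sales,
        SUM(CASE WHEN s.orderdate BETWEEN '2022-01-01' AND '2022-12-31'
                 THEN s.quantity * p.price END) AS y2022_total_sales
    FROM sales s
    JOIN product p ON s.productkey = p.productkey
    GROUP BY p.categoryname
    ORDER BY p.categoryname
    """
).fetchall()

by_2023 = totals_for_year(2023)
by_2022 = totals_for_year(2022)

# Each pivot column should match the corresponding filtered query.
for category, y2023, y2022 in pivot:
    assert y2023 == by_2023[category]
    assert y2022 == by_2022[category]

print(pivot)
```

On the real tables, the same comparison can be done by eye: each column of the pivot should line up with the matching single-year result, apart from small floating-point differences in the last digits.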


#### Problem Description

**`CASE WHEN` with multiple conditions**

1. Add one `SUM(CASE WHEN ...)` column per currency, then multiply each non-USD amount by `exchangerate`.

**Advanced Query**

That's a simple case, but we can add more conditions if necessary. Look at the total sales by category and currency, then convert each currency to USD.

Start with 2023.

In [30]:
%%sql 

SELECT
    p.categoryname AS category,
    SUM(CASE 
        WHEN s.currencycode = 'USD'
        THEN (s.quantity * p.price) 
    END) AS usd_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'CAD'
        THEN (s.quantity * p.price) 
    END) AS cad_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'EUR'
        THEN (s.quantity * p.price) 
    END) AS eur_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'GBP'
        THEN (s.quantity * p.price) 
    END) AS gbp_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'AUD'
        THEN (s.quantity * p.price) 
    END) AS aud_total_sales
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
WHERE
    orderdate::date BETWEEN '2023-01-01' AND '2023-12-31' 
GROUP BY
    category
ORDER BY
    category;

category,usd_total_sales,cad_total_sales,eur_total_sales,gbp_total_sales,aud_total_sales
Audio,338083.15,93966.71000000002,164987.84000000003,55820.98,45196.900000000016
Cameras and camcorders,943092.09,195588.60000000003,550869.4400000001,169967.75,168523.72999999998
Cell phones,3006014.269999981,738963.8799999995,1470688.119999995,519571.91,399104.81000000006
Computers,5818915.889999997,1384265.6399999997,3018397.869999995,957755.79,742292.76
Games and Toys,133469.77999999994,30083.94999999999,72040.04199999999,24516.709999999992,17130.02799999999
Home Appliances,2851503.569999989,623705.2399999999,1701579.369999997,501812.73,429066.43
"Music, Movies and Audio Books",1109997.0199999926,260598.83000000013,526772.3900000013,180061.85000000012,150640.9500000001
TV and Video,2189181.659999996,594055.7699999998,1135506.5899999996,330341.0,258503.6900000001


Now convert each non-USD total to USD by multiplying the amount by the `exchangerate` column.

In [31]:
%%sql 

SELECT
    p.categoryname AS category,
    SUM(CASE 
        WHEN s.currencycode = 'USD'
        THEN (s.quantity * p.price) 
    END) AS usd_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'CAD'
        THEN (s.quantity * p.price) * exchangerate 
    END) AS cad_to_usd_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'EUR'
        THEN (s.quantity * p.price) * exchangerate  
    END) AS eur_to_usd_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'GBP'
        THEN (s.quantity * p.price) * exchangerate  
    END) AS gbp_to_usd_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'AUD'
        THEN (s.quantity * p.price) * exchangerate  
    END) AS aud_to_usd_total_sales
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
WHERE
    orderdate::date BETWEEN '2023-01-01' AND '2023-12-31' 
GROUP BY
    category
ORDER BY
    category;

category,usd_total_sales,cad_to_usd_total_sales,eur_to_usd_total_sales,gbp_to_usd_total_sales,aud_to_usd_total_sales
Audio,338083.15,126589.85376459996,152721.57465409994,45038.65697449999,68214.6370891
Cameras and camcorders,943092.09,264861.0850357,510115.0287325,136672.16738660002,253225.2615664001
Cell phones,3006014.269999981,997174.0833737002,1362555.2720488005,418694.89928350016,598659.2379617998
Computers,5818915.889999995,1869281.0500120996,2797231.8803697983,771951.5454527003,1116387.3692956998
Games and Toys,133469.77999999985,40711.20255670001,66823.59260026002,19725.4025941,25751.71763642
Home Appliances,2851503.569999989,842336.2823175001,1578051.4285214995,403679.1346686,642268.7681927
"Music, Movies and Audio Books",1109997.0199999923,351906.75199170003,488220.6107101,145099.3401948,226443.51659940003
TV and Video,2189181.6599999955,803487.3112743,1052077.489857901,268095.68998250004,386292.6455603
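
One detail of this pattern is worth seeing on a small example: a `CASE` with no `ELSE` returns `NULL` for non-matching rows, `SUM` skips those, and a category with no matching rows at all ends up as `NULL` rather than `0`. The sketch below assumes an in-memory SQLite database as a stand-in for PostgreSQL, with invented rows and invented exchange rates. It mirrors the multiplication by `exchangerate` used above, and the `cad_or_zero` column is a hypothetical addition that shows the effect of `ELSE 0.0`.

```python
import sqlite3

# Hypothetical miniature tables; rows and exchange rates are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (productkey INTEGER PRIMARY KEY, categoryname TEXT, price REAL);
    CREATE TABLE sales (productkey INTEGER, quantity INTEGER,
                        currencycode TEXT, exchangerate REAL);
    INSERT INTO product VALUES (1, 'Audio', 10.0), (2, 'Computers', 100.0);
    INSERT INTO sales VALUES
        (1, 2, 'USD', 1.0),
        (1, 4, 'CAD', 1.5),
        (2, 1, 'USD', 1.0),
        (2, 2, 'USD', 1.0);
""")

currency_totals = conn.execute(
    """
    SELECT
        p.categoryname AS category,
        -- USD rows are summed as they are
        SUM(CASE WHEN s.currencycode = 'USD'
                 THEN s.quantity * p.price END) AS usd_total_sales,
        -- CAD rows are multiplied by exchangerate; no ELSE, so other rows are NULL
        SUM(CASE WHEN s.currencycode = 'CAD'
                 THEN s.quantity * p.price * s.exchangerate END) AS cad_converted,
        -- Same condition with ELSE 0.0, so an empty category yields 0.0
        SUM(CASE WHEN s.currencycode = 'CAD'
                 THEN s.quantity * p.price * s.exchangerate
                 ELSE 0.0 END) AS cad_or_zero
    FROM sales s
    JOIN product p ON s.productkey = p.productkey
    GROUP BY p.categoryname
    ORDER BY p.categoryname
    """
).fetchall()

for row in currency_totals:
    print(row)
```

Here `Computers` has no CAD sales, so `cad_converted` comes back as `None` in Python while `cad_or_zero` is `0.0`. Which of the two is preferable depends on whether "no sales" and "zero sales" should look different in the report.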


Compare 2022 and 2023 by stacking the two yearly queries with `UNION ALL`. Each query adds a `year_string` column to label its rows.

In [39]:
%%sql 

SELECT
    p.categoryname AS category,
    SUM(CASE 
        WHEN s.currencycode = 'USD'
        THEN (s.quantity * p.price) 
    END) AS usd_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'CAD'
        THEN (s.quantity * p.price) * exchangerate 
    END) AS cad_to_usd_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'EUR'
        THEN (s.quantity * p.price) * exchangerate  
    END) AS eur_to_usd_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'GBP'
        THEN (s.quantity * p.price) * exchangerate  
    END) AS gbp_to_usd_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'AUD'
        THEN (s.quantity * p.price) * exchangerate  
    END) AS aud_to_usd_total_sales,
    '2023' AS year_string
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
WHERE
    orderdate::date BETWEEN '2023-01-01' AND '2023-12-31' 
GROUP BY
    category
UNION ALL    
SELECT
    p.categoryname AS category,
    SUM(CASE 
        WHEN s.currencycode = 'USD'
        THEN (s.quantity * p.price) 
    END) AS usd_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'CAD'
        THEN (s.quantity * p.price) * exchangerate 
    END) AS cad_to_usd_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'EUR'
        THEN (s.quantity * p.price) * exchangerate  
    END) AS eur_to_usd_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'GBP'
        THEN (s.quantity * p.price) * exchangerate  
    END) AS gbp_to_usd_total_sales,
    SUM(CASE 
        WHEN s.currencycode = 'AUD'
        THEN (s.quantity * p.price) * exchangerate  
    END) AS aud_to_usd_total_sales,
    '2022' AS year_string
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
WHERE
    orderdate::date BETWEEN '2022-01-01' AND '2022-12-31' 
GROUP BY
    category
ORDER BY
    category, year_string
;

category,usd_total_sales,cad_to_usd_total_sales,eur_to_usd_total_sales,gbp_to_usd_total_sales,aud_to_usd_total_sales,year_string
Audio,454971.1699999996,111821.40935489998,157106.08403899998,53728.33089809997,76500.33795210002,2022
Audio,338083.15,126589.8537646,152721.57465409997,45038.6569745,68214.63708910001,2023
Cameras and camcorders,1287755.7199999983,285510.5870957001,480298.1469745998,144259.24284670007,231378.04302080005,2022
Cameras and camcorders,943092.0899999992,264861.08503569994,510115.0287325,136672.16738660002,253225.26156639992,2023
Cell phones,3771883.8700000537,989939.7606521996,1386397.2765206962,500676.9019484004,693965.6630237,2022
Cell phones,3006014.2700000205,997174.0833736996,1362555.272048801,418694.8992834996,598659.2379618001,2023
Computers,8237459.590000069,1896283.5594323012,2952361.5061829058,1135455.0232646,1326502.4510902008,2022
Computers,5818915.890000027,1869281.050012102,2797231.8803698,771951.5454527009,1116387.3692957005,2023
Games and Toys,185274.31800000093,43059.80099560002,66485.02820381998,24120.27572648001,32525.207539899988,2022
Games and Toys,133469.78000000006,40711.20255670002,66823.59260026,19725.4025941,25751.717636419995,2023
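
The stacking behaviour of `UNION ALL` can be sketched on a small dataset. This is an illustration only, assuming an in-memory SQLite database as a stand-in for PostgreSQL and invented rows. Each `SELECT` labels its rows with a year, and the single `ORDER BY` at the end sorts the combined result.

```python
import sqlite3

# Hypothetical miniature tables; the rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (productkey INTEGER PRIMARY KEY, categoryname TEXT, price REAL);
    CREATE TABLE sales (orderdate TEXT, productkey INTEGER, quantity INTEGER);
    INSERT INTO product VALUES (1, 'Audio', 10.0), (2, 'Computers', 100.0);
    INSERT INTO sales VALUES
        ('2022-03-01', 1, 2),
        ('2022-07-15', 2, 1),
        ('2023-02-10', 1, 3),
        ('2023-11-30', 2, 4);
""")

stacked = conn.execute(
    """
    SELECT
        p.categoryname AS category,
        SUM(s.quantity * p.price) AS total_sales,
        '2023' AS year_string
    FROM sales s
    JOIN product p ON s.productkey = p.productkey
    WHERE s.orderdate BETWEEN '2023-01-01' AND '2023-12-31'
    GROUP BY p.categoryname
    UNION ALL
    SELECT
        p.categoryname AS category,
        SUM(s.quantity * p.price) AS total_sales,
        '2022' AS year_string
    FROM sales s
    JOIN product p ON s.productkey = p.productkey
    WHERE s.orderdate BETWEEN '2022-01-01' AND '2022-12-31'
    GROUP BY p.categoryname
    -- One ORDER BY at the end applies to the combined rows
    ORDER BY category, year_string
    """
).fetchall()

for row in stacked:
    print(row)
```

The result is in long format: one row per category and year, rather than one column per year as in the `CASE WHEN` pivot.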


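📝 Wrap each total in `CAST(... AS NUMERIC)` and `ROUND(..., 2)` to make the output more readable. The cast is needed because PostgreSQL's two-argument `ROUND(value, digits)` is defined for `NUMERIC` but not for `double precision`.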

In [34]:
%%sql 

SELECT
    p.categoryname AS category,
    ROUND(CAST(SUM(CASE 
        WHEN s.currencycode = 'USD'
        THEN (s.quantity * p.price) 
    END) AS NUMERIC), 2) AS usd_total_sales,
    ROUND(CAST(SUM(CASE 
        WHEN s.currencycode = 'CAD'
        THEN (s.quantity * p.price) * exchangerate 
    END) AS NUMERIC), 2) AS cad_to_usd_total_sales,
    ROUND(CAST(SUM(CASE 
        WHEN s.currencycode = 'EUR'
        THEN (s.quantity * p.price) * exchangerate  
    END) AS NUMERIC), 2) AS eur_to_usd_total_sales,
    ROUND(CAST(SUM(CASE 
        WHEN s.currencycode = 'GBP'
        THEN (s.quantity * p.price) * exchangerate  
    END) AS NUMERIC), 2) AS gbp_to_usd_total_sales,
    ROUND(CAST(SUM(CASE 
        WHEN s.currencycode = 'AUD'
        THEN (s.quantity * p.price) * exchangerate  
    END) AS NUMERIC), 2) AS aud_to_usd_total_sales
FROM
    sales s
JOIN
    product p ON s.productkey = p.productkey
WHERE
    orderdate::date BETWEEN '2023-01-01' AND '2023-12-31' 
GROUP BY
    category
ORDER BY
    category;

category,usd_total_sales,cad_to_usd_total_sales,eur_to_usd_total_sales,gbp_to_usd_total_sales,aud_to_usd_total_sales
Audio,338083.15,126589.85,152721.57,45038.66,68214.64
Cameras and camcorders,943092.09,264861.09,510115.03,136672.17,253225.26
Cell phones,3006014.27,997174.08,1362555.27,418694.9,598659.24
Computers,5818915.89,1869281.05,2797231.88,771951.55,1116387.37
Games and Toys,133469.78,40711.2,66823.59,19725.4,25751.72
Home Appliances,2851503.57,842336.28,1578051.43,403679.13,642268.77
"Music, Movies and Audio Books",1109997.02,351906.75,488220.61,145099.34,226443.52
TV and Video,2189181.66,803487.31,1052077.49,268095.69,386292.65
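
The long tails of digits in the earlier outputs come from binary floating-point arithmetic, and the cast-then-round step removes them. The sketch below shows the same idea in plain Python, as a rough analogy rather than a reproduction of PostgreSQL's behaviour. The helper `to_money` is hypothetical, and the inputs are the quantity and price from one of the 2023 order lines shown earlier.

```python
from decimal import Decimal, ROUND_HALF_UP

# quantity * price for one order line; 129.9 has no exact binary representation,
# so the product carries a tiny floating-point error.
line_total = 6 * 129.9
print(line_total)


def to_money(value):
    """Convert a float to a decimal value, then round to two places.

    Roughly analogous to ROUND(CAST(value AS NUMERIC), 2) in the query above.
    """
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


rounded = to_money(line_total)
print(rounded)  # 779.40
```

Rounding only at the final step, after `SUM`, keeps the intermediate arithmetic unchanged and affects only how the totals are displayed.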
