# Count Aggregation

## Overview

### 🥅 Analysis Goals

- Use the `contoso_100k` sales dataset to explore how many customers place orders over time
    - Calculate a total sale amount for each order line
    - Count the distinct customers per day
    - Pivot the daily customer counts by continent
- The end goal is to identify how many distinct customers ordered on each day of 2023, broken down by continent.

### 📘 Concepts Covered

General concepts covered in this notebook:

- `COUNT` and `COUNT(DISTINCT ...)`
- `LEFT JOIN` and `GROUP BY`
- Pivoting with `CASE WHEN` inside `COUNT`

---

In [1]:
import sys
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Switch from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the SQL magic extension (provided by jupysql in Colab)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

---
## COUNT

### 📝 Notes

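- `COUNT(*)` counts rows, `COUNT(column)` counts non-NULL values, and `COUNT(DISTINCT column)` counts unique non-NULL values.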

### 💻 Final Result

- Return the number of distinct customers who placed an order on each day of 2023.

#### Problem Description

**`COUNT` / `COUNT(DISTINCT ...)`**

**Basic Query**

1. Find the total sale amount for each order line by multiplying `quantity` (from the `sales` table) by `price` (from the `product` table) and `exchangerate` (since not all sales are made in `USD`).

In [2]:
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    s.storekey,
    s.productkey,
    s.quantity,
    p.price,
    s.quantity * p.price * s.exchangerate AS total_sale_amount
FROM
    sales s
    LEFT JOIN product p ON s.productkey = p.productkey
ORDER BY
    s.orderkey

orderkey,orderdate,customerkey,storekey,productkey,quantity,price,total_sale_amount
1000,2015-01-01,947009,400,48,1,149.95,96.2004225
1000,2015-01-01,947009,400,460,1,299.9,192.400845
1001,2015-01-01,1772036,430,1730,2,77.68,155.36
1002,2015-01-01,1518349,660,955,4,196.9,787.6
1002,2015-01-01,1518349,660,62,7,181.0,1267.0
1002,2015-01-01,1518349,660,1050,3,312.0,936.0
1002,2015-01-01,1518349,660,1608,1,109.99,109.99
1003,2015-01-01,1317097,510,85,3,99.99,299.97
1004,2015-01-01,254117,80,128,2,143.4,332.203308
1004,2015-01-01,254117,80,2079,1,665.94,771.3649614000001
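
The calculation above can be checked on a couple of toy rows. This is a minimal sketch that uses Python's built-in `sqlite3` module with an in-memory database rather than the PostgreSQL `contoso_100k` database; the table definitions are simplified stand-ins, and the exchange rate `0.64155` is an assumption inferred from the first output row (`96.2004225 / 149.95`).

```python
import sqlite3

# In-memory SQLite database with simplified, hypothetical versions of the tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (productkey INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE sales (
        orderkey INTEGER,
        productkey INTEGER,
        quantity INTEGER,
        exchangerate REAL
    );
    INSERT INTO product VALUES (48, 149.95), (1730, 77.68);
    -- exchangerate values are assumed for illustration
    INSERT INTO sales VALUES (1000, 48, 1, 0.64155), (1001, 1730, 2, 1.0);
""")

# Same arithmetic as the notebook query, rounded to 2 decimals for readability
rows = conn.execute("""
    SELECT
        s.orderkey,
        ROUND(s.quantity * p.price * s.exchangerate, 2) AS total_sale_amount
    FROM sales s
    LEFT JOIN product p ON s.productkey = p.productkey
    ORDER BY s.orderkey
""").fetchall()

for orderkey, total in rows:
    print(orderkey, total)
```

Rounding is applied only to keep the printed values tidy; the notebook query keeps full floating-point precision, which is why values such as `771.3649614000001` appear in its output.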


2. Join the `customer` table to add customer details such as continent and gender.

In [None]:
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    c.continent, -- Added
    c.gender, -- Added
    s.storekey,
    s.productkey,
    s.quantity,
    p.price,
    s.quantity * p.price * s.exchangerate AS total_sale_amount
FROM
    sales s
    LEFT JOIN product p ON s.productkey = p.productkey
    LEFT JOIN customer c ON s.customerkey = c.customerkey
ORDER BY
    s.orderkey

orderkey,orderdate,customerkey,continent,gender,storekey,productkey,quantity,price,total_sale_amount
1000,2015-01-01,947009,Europe,male,400,48,1,149.95,96.2004225
1000,2015-01-01,947009,Europe,male,400,460,1,299.9,192.400845
1001,2015-01-01,1772036,North America,female,430,1730,2,77.68,155.36
1002,2015-01-01,1518349,North America,female,660,955,4,196.9,787.6
1002,2015-01-01,1518349,North America,female,660,62,7,181.0,1267.0
1002,2015-01-01,1518349,North America,female,660,1050,3,312.0,936.0
1002,2015-01-01,1518349,North America,female,660,1608,1,109.99,109.99
1003,2015-01-01,1317097,North America,male,510,85,3,99.99,299.97
1004,2015-01-01,254117,North America,male,80,128,2,143.4,332.203308
1004,2015-01-01,254117,North America,male,80,2079,1,665.94,771.3649614000001
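
A `LEFT JOIN` keeps every row from `sales` even when no matching `customer` row exists, filling the customer columns with `NULL`. The sketch below illustrates that behavior on toy data with Python's built-in `sqlite3` module; the rows and the unmatched `customerkey` of `99` are made up for illustration and do not come from `contoso_100k`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customerkey INTEGER PRIMARY KEY, continent TEXT, gender TEXT);
    CREATE TABLE sales (orderkey INTEGER, customerkey INTEGER);
    INSERT INTO customer VALUES (1, 'Europe', 'male');
    -- customerkey 99 deliberately has no row in customer
    INSERT INTO sales VALUES (1000, 1), (1001, 99);
""")

# LEFT JOIN: unmatched sales rows survive with NULL (None) customer columns
left_rows = conn.execute("""
    SELECT s.orderkey, c.continent
    FROM sales s
    LEFT JOIN customer c ON s.customerkey = c.customerkey
    ORDER BY s.orderkey
""").fetchall()

# INNER JOIN: unmatched sales rows are dropped
inner_rows = conn.execute("""
    SELECT s.orderkey, c.continent
    FROM sales s
    INNER JOIN customer c ON s.customerkey = c.customerkey
    ORDER BY s.orderkey
""").fetchall()

print(left_rows)
print(inner_rows)
```

With a `LEFT JOIN`, no sales are lost if a customer record is missing, at the cost of possible `NULL` values in `continent` and `gender`.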


3. Count the distinct customers who placed an order on each day of 2023.

In [6]:
%%sql

SELECT
    s.orderdate,
    COUNT(DISTINCT s.customerkey) AS customer
FROM
    sales s
    LEFT JOIN customer c ON s.customerkey = c.customerkey
WHERE  
    s.orderdate::date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY
    s.orderdate
ORDER BY
    s.orderdate

orderdate,customer
2023-01-01,12
2023-01-02,49
2023-01-03,64
2023-01-04,78
2023-01-05,87
2023-01-06,57
2023-01-07,99
2023-01-08,10
2023-01-09,43
2023-01-10,49
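
To see why `DISTINCT` matters here, the following sketch compares `COUNT(*)` with `COUNT(DISTINCT customerkey)` on a few made-up rows, again using an in-memory `sqlite3` database as a stand-in for PostgreSQL. A customer who buys several items on the same day produces several `sales` rows but should be counted once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (orderdate TEXT, customerkey INTEGER);
    -- customer 1 has two order lines on the same day
    INSERT INTO sales VALUES
        ('2023-01-01', 1),
        ('2023-01-01', 1),
        ('2023-01-01', 2),
        ('2023-01-02', 3);
""")

daily_counts = conn.execute("""
    SELECT
        orderdate,
        COUNT(*) AS order_lines,
        COUNT(DISTINCT customerkey) AS customers
    FROM sales
    GROUP BY orderdate
    ORDER BY orderdate
""").fetchall()

for orderdate, order_lines, customers in daily_counts:
    print(orderdate, order_lines, customers)
```

On `2023-01-01` the toy data has three order lines but only two distinct customers, which is the difference the notebook query relies on.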


---
## Pivot with COUNT

### 📝 Notes

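- A `CASE WHEN` expression without an `ELSE` returns `NULL` for non-matching rows, and `COUNT` ignores `NULL`, so `COUNT(DISTINCT CASE WHEN ... THEN column END)` counts only the rows that meet the condition.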

### 💻 Final Result

- Return the number of distinct customers per day in 2023, with one column per continent.

#### Problem Description

**`COUNT` with `CASE WHEN`**

1. List the distinct continents in the `customer` table to see which pivot columns are needed.

In [9]:
%%sql 

SELECT DISTINCT continent
FROM customer

continent
Europe
North America
Australia

2. Count the distinct customers per day for each continent by wrapping a `CASE WHEN` expression inside `COUNT(DISTINCT ...)`.


In [8]:
%%sql

SELECT
    s.orderdate,
    COUNT(DISTINCT CASE WHEN c.continent = 'Europe' THEN s.customerkey END) AS eu_customer,
    COUNT(DISTINCT CASE WHEN c.continent = 'North America' THEN s.customerkey END) AS na_customer,
    COUNT(DISTINCT CASE WHEN c.continent = 'Australia' THEN s.customerkey END) AS au_customer
FROM
    sales s
    LEFT JOIN customer c ON s.customerkey = c.customerkey
WHERE  
    s.orderdate::date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY
    s.orderdate
ORDER BY
    s.orderdate

orderdate,eu_customer,na_customer,au_customer
2023-01-01,6,5,1
2023-01-02,15,31,3
2023-01-03,17,44,3
2023-01-04,28,46,4
2023-01-05,22,57,8
2023-01-06,18,34,5
2023-01-07,26,66,7
2023-01-08,4,5,1
2023-01-09,10,30,3
2023-01-10,11,33,5
