<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/0_Intro/3_Table_Overview.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Table Overview

```python
import sys
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Replace ipython-sql with jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the SQL magic extension (provided by ipython-sql or jupysql)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"
```



## Dataset Overview

- Get familiar with the dataset
- Review all of the tables
- Investigate the `sales` table further

### Note

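Quick note: we aren't using `SELECT *` because:
- It costs **money** if you're running queries in the cloud (e.g., BigQuery's on-demand pricing is based on the bytes a query processes).
- It costs **time**, since the results take longer to load.

If you do want to use `SELECT *`, for example when you're first looking at a table, add `LIMIT` to cap the number of rows returned. Keep in mind that in BigQuery, `LIMIT` shortens the result but doesn't reduce the bytes scanned on a non-clustered table, so it saves time rather than money. For most of our queries, we'll select specific columns instead.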
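Here is a minimal, self-contained illustration of selecting specific columns and capping the result with `LIMIT`. It uses Python's built-in `sqlite3` module with an in-memory stand-in table, not the course's PostgreSQL database. The table name `demo_sales` and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_sales (orderkey INTEGER, quantity INTEGER, currencycode TEXT)")

# Hypothetical rows: 20 made-up orders
conn.executemany(
    "INSERT INTO demo_sales VALUES (?, ?, ?)",
    [(1000 + i, i % 7 + 1, "USD") for i in range(20)],
)

# Select only the columns we need and cap the result at 5 rows
rows = conn.execute("SELECT orderkey, quantity FROM demo_sales LIMIT 5").fetchall()
print(rows)
```

Without an `ORDER BY`, the database doesn't guarantee which 5 rows come back, so treat a `LIMIT` preview as a sample rather than the "first" rows.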

### ERD

Below is the Entity Relationship Diagram (ERD) showing how the tables relate to each other.

**Note**: This is a simplified ERD and it doesn't show every column.

**📊Insert ERD📊**

## Tables

There are 6 tables in total:

1. `sales` - Records each sale as a line item (several rows can belong to the same order and share an `orderkey`).
2. `customer` - Information on each customer.
3. `product` - Information on each product.
4. `currencyexchange` - Exchange rates between pairs of currencies (`fromcurrency` to `tocurrency`) for each date.
5. `date` - A date dimension table (a table of dates and related descriptive attributes), which we won't use much since we'll mostly rely on date functions.
6. `store` - Information on the store each item was sold from.

We will mainly be working with these tables:
- `sales`
- `customer`
- `product`
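In `sales`, each row is one line item, and several rows can belong to the same order. This small sketch counts line items per `orderkey`. It loads the first five `sales` rows shown later in this notebook into an in-memory SQLite table, which stands in for PostgreSQL here (`GROUP BY` behaves the same way for this query).

```python
import sqlite3

# In-memory SQLite stand-in for PostgreSQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderkey INTEGER, productkey INTEGER, quantity INTEGER)")

# First five rows of the sales output: (orderkey, productkey, quantity)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1000, 48, 1), (1000, 460, 1), (1001, 1730, 2), (1002, 955, 4), (1002, 62, 7)],
)

# Count how many line items (rows) each order has
line_items = conn.execute(
    "SELECT orderkey, COUNT(*) AS line_items FROM sales GROUP BY orderkey ORDER BY orderkey"
).fetchall()
print(line_items)  # [(1000, 2), (1001, 1), (1002, 2)]
```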

### Currency Exchange

```sql
%%sql

SELECT *
FROM currencyexchange
LIMIT 5
```

| | date | fromcurrency | tocurrency | exchange |
|---|---|---|---|---|
| 0 | 2015-01-01 | AUD | AUD | 1.0 |
| 1 | 2015-01-01 | AUD | CAD | 0.94834 |
| 2 | 2015-01-01 | AUD | EUR | 0.67435 |
| 3 | 2015-01-01 | AUD | GBP | 0.52525 |
| 4 | 2015-01-01 | AUD | USD | 0.81873 |
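This sketch shows how a rate could be looked up and applied. It uses the five rows returned by the query above and an in-memory SQLite stand-in. It assumes that `exchange` means one unit of `fromcurrency` equals `exchange` units of `tocurrency`, which is consistent with the AUD to AUD row being 1.0. The `convert` helper is hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for PostgreSQL
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE currencyexchange (date TEXT, fromcurrency TEXT, tocurrency TEXT, exchange REAL)"
)

# The five rows returned by the currencyexchange query
conn.executemany(
    "INSERT INTO currencyexchange VALUES (?, ?, ?, ?)",
    [
        ("2015-01-01", "AUD", "AUD", 1.0),
        ("2015-01-01", "AUD", "CAD", 0.94834),
        ("2015-01-01", "AUD", "EUR", 0.67435),
        ("2015-01-01", "AUD", "GBP", 0.52525),
        ("2015-01-01", "AUD", "USD", 0.81873),
    ],
)


def convert(amount, date, from_ccy, to_ccy):
    """Hypothetical helper: multiply amount by the stored rate for (date, from_ccy, to_ccy)."""
    row = conn.execute(
        "SELECT exchange FROM currencyexchange "
        "WHERE date = ? AND fromcurrency = ? AND tocurrency = ?",
        (date, from_ccy, to_ccy),
    ).fetchone()
    if row is None:
        raise KeyError(f"No rate for {from_ccy}->{to_ccy} on {date}")
    return amount * row[0]


print(round(convert(100, "2015-01-01", "AUD", "USD"), 2))  # 81.87
```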


### Date

```sql
%%sql

SELECT *
FROM date
LIMIT 5
```

| | date | datekey | year | yearquarter | yearquarternumber | quarter | yearmonth | yearmonthshort | yearmonthnumber | month | monthshort | monthnumber | dayofweek | dayofweekshort | dayofweeknumber | workingday | workingdaynumber |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-01 | 20150101 | 2015 | Q1-2015 | 8061 | Q1 | January 2015 | Jan 2015 | 24181 | January | Jan | 1 | Thursday | Thu | 5 | 0 | 0 |
| 1 | 2015-01-02 | 20150102 | 2015 | Q1-2015 | 8061 | Q1 | January 2015 | Jan 2015 | 24181 | January | Jan | 1 | Friday | Fri | 6 | 1 | 1 |
| 2 | 2015-01-03 | 20150103 | 2015 | Q1-2015 | 8061 | Q1 | January 2015 | Jan 2015 | 24181 | January | Jan | 1 | Saturday | Sat | 7 | 0 | 1 |
| 3 | 2015-01-04 | 20150104 | 2015 | Q1-2015 | 8061 | Q1 | January 2015 | Jan 2015 | 24181 | January | Jan | 1 | Sunday | Sun | 1 | 0 | 1 |
| 4 | 2015-01-05 | 20150105 | 2015 | Q1-2015 | 8061 | Q1 | January 2015 | Jan 2015 | 24181 | January | Jan | 1 | Monday | Mon | 2 | 1 | 2 |
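Most `date` table columns are values that date functions can compute on the fly. The Python sketch below derives several of them. The formulas for `yearquarternumber` and `yearmonthnumber` are inferred from the sample rows (2015 × 4 + 1 = 8061 and 2015 × 12 + 1 = 24181), as is the Sunday = 1 numbering of `dayofweeknumber`. They are not taken from the dataset's documentation. `workingday` is left out because 2015-01-01 is a Thursday marked 0, which suggests that column also accounts for holidays. In PostgreSQL, functions such as `EXTRACT(YEAR FROM orderdate)` and `EXTRACT(QUARTER FROM orderdate)` cover similar ground.

```python
from datetime import date


def date_attributes(d: date) -> dict:
    """Derive a few date-table columns from a date (formulas inferred from sample rows)."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "datekey": d.year * 10000 + d.month * 100 + d.day,
        "year": d.year,
        "yearquarter": f"Q{quarter}-{d.year}",
        "yearquarternumber": d.year * 4 + quarter,  # 2015 * 4 + 1 = 8061
        "yearmonthnumber": d.year * 12 + d.month,   # 2015 * 12 + 1 = 24181
        "dayofweek": d.strftime("%A"),              # locale-dependent name
        # weekday() is Monday = 0; remap so Sunday = 1 ... Saturday = 7
        "dayofweeknumber": (d.weekday() + 1) % 7 + 1,
    }


print(date_attributes(date(2015, 1, 1)))
```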


### Store

```sql
%%sql

SELECT *
FROM store
LIMIT 5
```

| | storekey | storecode | geoareakey | countrycode | countryname | state | opendate | closedate | description | squaremeters | status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 1 | 1 | AU | Australia | Australian Capital Territory | 2008-01-01 | | Contoso Store Australian Capital Territory | 595.0 | |
| 1 | 20 | 2 | 3 | AU | Australia | Northern Territory | 2008-01-12 | 2016-07-07 | Contoso Store Northern Territory | 665.0 | Closed |
| 2 | 30 | 3 | 5 | AU | Australia | South Australia | 2012-01-07 | 2015-08-08 | Contoso Store South Australia | 2000.0 | Restructured |
| 3 | 35 | 3 | 5 | AU | Australia | South Australia | 2015-12-08 | | Contoso Store South Australia | 3000.0 | |
| 4 | 40 | 4 | 6 | AU | Australia | Tasmania | 2010-01-01 | | Contoso Store Tasmania | 2000.0 | |


### Customers

```sql
%%sql

SELECT
    customerkey,
    continent,
    gender,
    givenname,
    surname,
    countryfull,
    birthday,
    company
FROM
    customer
```

| | customerkey | continent | gender | givenname | surname | countryfull | birthday | company |
|---|---|---|---|---|---|---|---|---|
| 0 | 15 | Australia | male | Julian | McGuigan | Australia | 1965-03-24 | Cut Rite Lawn Care |
| 1 | 23 | Australia | female | Rose | Dash | Australia | 1990-05-10 | Rack N Sack |
| 2 | 36 | Australia | female | Annabelle | Townsend | Australia | 1964-07-16 | id Boutiques |
| 3 | 120 | Australia | male | Jamie | Hetherington | Australia | 1946-12-11 | Showbiz Pizza Place |
| 4 | 180 | Australia | male | Gabriel | Bosanquet | Australia | 1955-04-24 | Dubrow's Cafeteria |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 104985 | 2099639 | North America | male | Miroslav | Slach | United States | 1945-04-30 | Strength Gurus |
| 104986 | 2099656 | North America | male | Wilfredo | Lozada | United States | 1945-08-24 | Williams Bros. |
| 104987 | 2099697 | North America | male | Phillipp | Maier | United States | 1966-12-08 | Excella |
| 104988 | 2099711 | North America | female | Katerina | Pavlícková | United States | 1941-01-01 | Lawnscape Garden Maintenance |


### Product

Overview of the product table, which contains information on the different products Contoso sells. There are more columns, but we'll only look at a few.

```sql
%%sql

SELECT
    productkey,
    productcode,
    productname,
    cost,
    price,
    categoryname,
    subcategoryname
FROM
    product
ORDER BY
    productkey
```

| | productkey | productcode | productname | cost | price | categoryname | subcategoryname |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 101001 | Contoso 512MB MP3 Player E51 Silver | 6.62 | 12.99 | Audio | MP4&MP3 |
| 1 | 2 | 101002 | Contoso 512MB MP3 Player E51 Blue | 6.62 | 12.99 | Audio | MP4&MP3 |
| 2 | 3 | 101003 | Contoso 1G MP3 Player E100 White | 7.40 | 14.52 | Audio | MP4&MP3 |
| 3 | 4 | 101004 | Contoso 2G MP3 Player E200 Silver | 11.00 | 21.57 | Audio | MP4&MP3 |
| 4 | 5 | 101005 | Contoso 2G MP3 Player E200 Red | 11.00 | 21.57 | Audio | MP4&MP3 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2512 | 2513 | 505026 | Contoso Bluetooth Active Headphones L15 Red | 43.07 | 129.99 | Cell phones | Cell phones Accessories |
| 2513 | 2514 | 505027 | Contoso Bluetooth Active Headphones L15 White | 43.07 | 129.99 | Cell phones | Cell phones Accessories |
| 2514 | 2515 | 505028 | Contoso In-Line Coupler E180 White | 1.71 | 3.35 | Cell phones | Cell phones Accessories |
| 2515 | 2516 | 505029 | Contoso In-Line Coupler E180 Black | 1.71 | 3.35 | Cell phones | Cell phones Accessories |


### Sales

Overview of the sales table and the columns we'll use the most.

```sql
%%sql

SELECT
    orderkey,
    orderdate,
    customerkey,
    storekey,
    productkey,
    quantity,
    unitprice,
    currencycode,
    exchangerate
FROM
    sales
```

| | orderkey | orderdate | customerkey | storekey | productkey | quantity | unitprice | currencycode | exchangerate |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000 | 2015-01-01 | 947009 | 400 | 48 | 1 | 112.4625 | GBP | 0.64155 |
| 1 | 1000 | 2015-01-01 | 947009 | 400 | 460 | 1 | 749.7500 | GBP | 0.64155 |
| 2 | 1001 | 2015-01-01 | 1772036 | 430 | 1730 | 2 | 54.3760 | USD | 1.00000 |
| 3 | 1002 | 2015-01-01 | 1518349 | 660 | 955 | 4 | 315.0400 | USD | 1.00000 |
| 4 | 1002 | 2015-01-01 | 1518349 | 660 | 62 | 7 | 135.7500 | USD | 1.00000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 199868 | 3398034 | 2024-04-20 | 664396 | 999999 | 1651 | 7 | 159.9900 | EUR | 0.93870 |
| 199869 | 3398034 | 2024-04-20 | 664396 | 999999 | 1646 | 1 | 159.9900 | EUR | 0.93870 |
| 199870 | 3398035 | 2024-04-20 | 267690 | 999999 | 1575 | 2 | 60.9900 | CAD | 1.37670 |
| 199871 | 3398035 | 2024-04-20 | 267690 | 999999 | 415 | 5 | 326.0000 | CAD | 1.37670 |


### Further Investigation

We'll be using the sales table the most, so let's explore it a bit more.

Add a calculated `net_revenue` column:

> `quantity * netprice * exchangerate`

Why?

- Get **Net Revenue**
    - Definition: The total revenue after accounting for discounts, promotions, and adjustments; it reflects the price customers actually paid. That's why the formula uses `netprice` (the price after discounts) rather than `unitprice` (the price before discounts).
    - Formula: `netprice * quantity`
- Multiply by `exchangerate` because not every sale is in USD; the `currencycode` column shows which currency each sale is in.
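Here is a minimal Python sketch of the formula, checked against one USD row from the output of the next query (order 1002, product 955). The `sales` query doesn't select `netprice`, so its value here (286.6864) is back-calculated as `net_revenue / quantity`. It comes out about 9% below the `unitprice` of 315.04, which illustrates the discount that `netprice` accounts for. The `net_revenue` function name is hypothetical.

```python
def net_revenue(quantity: int, netprice: float, exchangerate: float) -> float:
    """Hypothetical helper mirroring quantity * netprice * exchangerate."""
    return quantity * netprice * exchangerate


# Order 1002, product 955 is a USD sale, so exchangerate = 1.0.
# netprice is back-calculated from the output: 1146.7456 / 4 = 286.6864
unitprice = 315.04
netprice = 286.6864

revenue = net_revenue(4, netprice, 1.0)
discount = 1 - netprice / unitprice  # share taken off the list price

print(round(revenue, 4))   # 1146.7456
print(round(discount, 2))  # 0.09
```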

```sql
%%sql

SELECT
    orderkey,
    orderdate,
    customerkey,
    storekey,
    productkey,
    quantity,
    unitprice,
    currencycode,
    exchangerate,
    quantity * netprice * exchangerate AS net_revenue -- Added
FROM
    sales
ORDER BY -- Added
    orderkey
```

| | orderkey | orderdate | customerkey | storekey | productkey | quantity | unitprice | currencycode | exchangerate | net_revenue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000 | 2015-01-01 | 947009 | 400 | 48 | 1 | 112.4625 | GBP | 0.64155 | 63.492279 |
| 1 | 1000 | 2015-01-01 | 947009 | 400 | 460 | 1 | 749.7500 | GBP | 0.64155 | 423.281859 |
| 2 | 1001 | 2015-01-01 | 1772036 | 430 | 1730 | 2 | 54.3760 | USD | 1.00000 | 108.752000 |
| 3 | 1002 | 2015-01-01 | 1518349 | 660 | 955 | 4 | 315.0400 | USD | 1.00000 | 1146.745600 |
| 4 | 1002 | 2015-01-01 | 1518349 | 660 | 62 | 7 | 135.7500 | USD | 1.00000 | 950.250000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 199868 | 3398034 | 2024-04-20 | 664396 | 999999 | 1651 | 7 | 159.9900 | EUR | 0.93870 | 914.612113 |
| 199869 | 3398034 | 2024-04-20 | 664396 | 999999 | 1646 | 1 | 159.9900 | EUR | 0.93870 | 150.182613 |
| 199870 | 3398035 | 2024-04-20 | 267690 | 999999 | 1575 | 2 | 60.9900 | CAD | 1.37670 | 147.778282 |
| 199871 | 3398035 | 2024-04-20 | 267690 | 999999 | 415 | 5 | 326.0000 | CAD | 1.37670 | 2019.618900 |


Look at sales from 2020 onward by adding a `WHERE` clause that only returns sales on or after 2020-01-01 (YYYY-MM-DD format).
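The difference between `>=` and `>` only shows up on the boundary date. This sketch uses an in-memory SQLite table as a stand-in for PostgreSQL. The `::date` cast is PostgreSQL syntax, so it is dropped here, and dates are stored as ISO `YYYY-MM-DD` text, which compares correctly as strings. The three rows reuse order keys and dates from the outputs in this notebook. The `orders` helper is hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for PostgreSQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderkey INTEGER, orderdate TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1000, "2015-01-01"), (1827000, "2020-01-01"), (1828000, "2020-01-02")],
)


def orders(op: str) -> list:
    """Return orderkeys whose orderdate satisfies `orderdate <op> '2020-01-01'`."""
    if op not in (">=", ">"):
        raise ValueError("op must be '>=' or '>'")
    query = f"SELECT orderkey FROM sales WHERE orderdate {op} '2020-01-01' ORDER BY orderkey"
    return [row[0] for row in conn.execute(query)]


print(orders(">="))  # [1827000, 1828000]
print(orders(">"))   # [1828000]
```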

```sql
%%sql

SELECT
    orderkey,
    orderdate,
    customerkey,
    storekey,
    productkey,
    quantity,
    unitprice,
    currencycode,
    exchangerate,
    quantity * netprice * exchangerate AS net_revenue
FROM
    sales
WHERE -- Added
    orderdate::date >= '2020-01-01'
ORDER BY
    orderkey
```

| | orderkey | orderdate | customerkey | storekey | productkey | quantity | unitprice | currencycode | exchangerate | net_revenue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1827000 | 2020-01-01 | 593517 | 999999 | 1810 | 4 | 28.800 | EUR | 0.89015 | 99.468922 |
| 1 | 1827000 | 2020-01-01 | 593517 | 999999 | 1809 | 6 | 28.800 | EUR | 0.89015 | 139.974307 |
| 2 | 1827000 | 2020-01-01 | 593517 | 999999 | 698 | 2 | 376.000 | EUR | 0.89015 | 669.392800 |
| 3 | 1827000 | 2020-01-01 | 593517 | 999999 | 364 | 6 | 765.900 | EUR | 0.89015 | 4090.595310 |
| 4 | 1827001 | 2020-01-01 | 307502 | 80 | 1288 | 2 | 101.387 | CAD | 1.29945 | 237.145207 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 124446 | 3398034 | 2024-04-20 | 664396 | 999999 | 1651 | 7 | 159.990 | EUR | 0.93870 | 914.612113 |
| 124447 | 3398034 | 2024-04-20 | 664396 | 999999 | 1646 | 1 | 159.990 | EUR | 0.93870 | 150.182613 |
| 124448 | 3398035 | 2024-04-20 | 267690 | 999999 | 1575 | 2 | 60.990 | CAD | 1.37670 | 147.778282 |
| 124449 | 3398035 | 2024-04-20 | 267690 | 999999 | 415 | 5 | 326.000 | CAD | 1.37670 | 2019.618900 |


Next, get the customer information for each sale. Use a `LEFT JOIN` to join the `sales` table with the `customer` table, assigning the alias `s` to the `sales` table and `c` to the `customer` table.

`LEFT JOIN` returns all records from the left table (Table A), and the matching rows from the right table (Table B). 

![sql_left_join.png](/Resources/images/sql_left_join.png)
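To see what "all records from the left table" means in practice, this sketch joins a tiny `sales` table to a `customer` table in which one sale has no matching customer. It uses an in-memory SQLite stand-in, and all rows (including the names `Ann` and `Ben`) are made up. For comparison, an inner `JOIN` drops the unmatched sale.

```python
import sqlite3

# In-memory SQLite stand-in for PostgreSQL; all rows below are made up
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderkey INTEGER, customerkey INTEGER)")
conn.execute("CREATE TABLE customer (customerkey INTEGER PRIMARY KEY, givenname TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 20), (3, 99)])
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(10, "Ann"), (20, "Ben")])

# LEFT JOIN keeps every sales row; unmatched customer columns come back as NULL (None)
left_rows = conn.execute(
    """
    SELECT s.orderkey, c.givenname
    FROM sales s
    LEFT JOIN customer c ON s.customerkey = c.customerkey
    ORDER BY s.orderkey
    """
).fetchall()

# Inner JOIN keeps only sales rows that found a matching customer
inner_rows = conn.execute(
    """
    SELECT s.orderkey, c.givenname
    FROM sales s
    JOIN customer c ON s.customerkey = c.customerkey
    ORDER BY s.orderkey
    """
).fetchall()

print(left_rows)   # [(1, 'Ann'), (2, 'Ben'), (3, None)]
print(inner_rows)  # [(1, 'Ann'), (2, 'Ben')]
```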

```sql
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    c.givenname, -- Added
    c.surname, -- Added
    c.continent, -- Added
    c.countryfull, -- Added
    s.storekey,
    s.productkey,
    s.quantity,
    s.unitprice,
    s.currencycode,
    s.exchangerate,
    s.quantity * s.netprice * s.exchangerate AS net_revenue
FROM
    sales s
    LEFT JOIN customer c ON s.customerkey = c.customerkey -- Added
WHERE
    s.orderdate::date > '2020-01-01' -- Strictly after: excludes 2020-01-01 (use >= to include it)
ORDER BY
    s.orderkey
```

| | orderkey | orderdate | customerkey | givenname | surname | continent | countryfull | storekey | productkey | quantity | unitprice | currencycode | exchangerate | net_revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1828000 | 2020-01-02 | 1419367 | Chuck | Cecil | North America | United States | 650 | 2494 | 2 | 4.116 | USD | 1.00000 | 8.232000 |
| 1 | 1828001 | 2020-01-02 | 451241 | Swen | Thalberg | Europe | Germany | 190 | 2505 | 2 | 13.986 | EUR | 0.89342 | 23.741207 |
| 2 | 1828001 | 2020-01-02 | 451241 | Swen | Thalberg | Europe | Germany | 190 | 1683 | 6 | 4.491 | EUR | 0.89342 | 24.074095 |
| 3 | 1828001 | 2020-01-02 | 451241 | Swen | Thalberg | Europe | Germany | 190 | 413 | 4 | 898.500 | EUR | 0.89342 | 2986.184876 |
| 4 | 1828001 | 2020-01-02 | 451241 | Swen | Thalberg | Europe | Germany | 190 | 1008 | 1 | 202.950 | EUR | 0.89342 | 181.319589 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 124333 | 3398034 | 2024-04-20 | 664396 | Karlotta | Rivière | Europe | France | 999999 | 1651 | 7 | 159.990 | EUR | 0.93870 | 914.612113 |
| 124334 | 3398034 | 2024-04-20 | 664396 | Karlotta | Rivière | Europe | France | 999999 | 1646 | 1 | 159.990 | EUR | 0.93870 | 150.182613 |
| 124335 | 3398035 | 2024-04-20 | 267690 | Michael | Wilson | North America | Canada | 999999 | 1575 | 2 | 60.990 | CAD | 1.37670 | 147.778282 |
| 124336 | 3398035 | 2024-04-20 | 267690 | Michael | Wilson | North America | Canada | 999999 | 415 | 5 | 326.000 | CAD | 1.37670 | 2019.618900 |


Add product information, such as the product name and category name. Use a `LEFT JOIN` to join the `product` table to the `sales` table, assigning the alias `p` to the `product` table.
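Chaining a second `LEFT JOIN` still returns one row per `sales` row, provided each `productkey` appears only once in `product`. This sketch checks that in an in-memory SQLite stand-in. The two product rows come from the product output above, while the sales and customer rows are made up. The final step inserts a hypothetical duplicate product row to show how a non-unique key would multiply the matching sales rows.

```python
import sqlite3

# In-memory SQLite stand-in; sales/customer rows are made up
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderkey INTEGER, customerkey INTEGER, productkey INTEGER)")
conn.execute("CREATE TABLE customer (customerkey INTEGER, givenname TEXT)")
conn.execute("CREATE TABLE product (productkey INTEGER, productname TEXT, categoryname TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [(1, 10, 1), (1, 10, 2), (2, 20, 1)])
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(10, "Ann"), (20, "Ben")])
# Product rows from the product output
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Contoso 512MB MP3 Player E51 Silver", "Audio"),
     (2, "Contoso 512MB MP3 Player E51 Blue", "Audio")],
)

QUERY = """
    SELECT s.orderkey, c.givenname, p.productname, p.categoryname
    FROM sales s
    LEFT JOIN customer c ON s.customerkey = c.customerkey
    LEFT JOIN product p ON s.productkey = p.productkey
    ORDER BY s.orderkey, s.productkey
"""

rows = conn.execute(QUERY).fetchall()
print(len(rows))  # 3: one output row per sales row

# Hypothetical duplicate productkey: both sales rows for product 1 now match twice
conn.execute("INSERT INTO product VALUES (1, 'Duplicate entry', 'Audio')")
rows_with_duplicate = conn.execute(QUERY).fetchall()
print(len(rows_with_duplicate))  # 5
```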

```sql
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    c.givenname,
    c.surname,
    c.continent,
    c.countryfull,
    s.storekey,
    s.productkey,
    p.productname, -- Added
    p.categoryname, -- Added
    s.quantity,
    s.unitprice,
    s.currencycode,
    s.exchangerate,
    s.quantity * s.netprice * s.exchangerate AS net_revenue
FROM
    sales s
    LEFT JOIN customer c ON s.customerkey = c.customerkey
    LEFT JOIN product p ON s.productkey = p.productkey -- Added
WHERE
    s.orderdate::date > '2020-01-01'
ORDER BY
    s.orderkey
```

| | orderkey | orderdate | customerkey | givenname | surname | continent | countryfull | storekey | productkey | productname | categoryname | quantity | unitprice | currencycode | exchangerate | net_revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1828000 | 2020-01-02 | 1419367 | Chuck | Cecil | North America | United States | 650 | 2494 | Reusable Phone Screen Protector E120 | Cell phones | 2 | 4.116 | USD | 1.00000 | 8.232000 |
| 1 | 1828001 | 2020-01-02 | 451241 | Swen | Thalberg | Europe | Germany | 190 | 2505 | Contoso Touch Stylus Pen E150 Red | Cell phones | 2 | 13.986 | EUR | 0.89342 | 23.741207 |
| 2 | 1828001 | 2020-01-02 | 451241 | Swen | Thalberg | Europe | Germany | 190 | 1683 | MGS Hand Games for 12-16 boys E600 Silver | Games and Toys | 6 | 4.491 | EUR | 0.89342 | 24.074095 |
| 3 | 1828001 | 2020-01-02 | 451241 | Swen | Thalberg | Europe | Germany | 190 | 413 | Proseware Laptop16 M610 White | Computers | 4 | 898.500 | EUR | 0.89342 | 2986.184876 |
| 4 | 1828001 | 2020-01-02 | 451241 | Swen | Thalberg | Europe | Germany | 190 | 1008 | A. Datum Consumer Digital Camera M300 Orange | Cameras and camcorders | 1 | 202.950 | EUR | 0.89342 | 181.319589 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 124333 | 3398034 | 2024-04-20 | 664396 | Karlotta | Rivière | Europe | France | 999999 | 1651 | Contoso DVD 9-Inch Player Portable M300 Silver | Music, Movies and Audio Books | 7 | 159.990 | EUR | 0.93870 | 914.612113 |
| 124334 | 3398034 | 2024-04-20 | 664396 | Karlotta | Rivière | Europe | France | 999999 | 1646 | Contoso DVD 9-Inch Player Portable M300 Black | Music, Movies and Audio Books | 1 | 159.990 | EUR | 0.93870 | 150.182613 |
| 124335 | 3398035 | 2024-04-20 | 267690 | Michael | Wilson | North America | Canada | 999999 | 1575 | SV DVD Player M140 Gold | Music, Movies and Audio Books | 2 | 60.990 | CAD | 1.37670 | 147.778282 |
| 124336 | 3398035 | 2024-04-20 | 267690 | Michael | Wilson | North America | Canada | 999999 | 415 | Proseware Laptop8.9 E089 White | Computers | 5 | 326.000 | CAD | 1.37670 | 2019.618900 |


Add a condition that flags whether the quantity ordered is above 5: if it is, assign "High"; if not, assign "Low". This lets us quickly filter for orders with a higher quantity.
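Because the condition is `quantity > 5`, a quantity of exactly 5 is labeled "Low". The last row of the output below shows this. The sketch checks the boundary with quantities 4 through 7 in an in-memory SQLite stand-in, using hypothetical order keys.

```python
import sqlite3

# In-memory SQLite stand-in for PostgreSQL; orderkeys are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderkey INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 4), (2, 5), (3, 6), (4, 7)])

levels = conn.execute(
    """
    SELECT
        quantity,
        CASE WHEN quantity > 5 THEN 'High' ELSE 'Low' END AS quantity_level
    FROM sales
    ORDER BY orderkey
    """
).fetchall()
print(levels)  # [(4, 'Low'), (5, 'Low'), (6, 'High'), (7, 'High')]
```

A `NULL` quantity would also fall through to `ELSE` and be labeled "Low", since `NULL > 5` is not true. To treat missing values separately, add an explicit `WHEN quantity IS NULL` branch.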

```sql
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    c.givenname,
    c.surname,
    c.continent,
    c.countryfull,
    s.storekey,
    s.productkey,
    p.productname,
    p.categoryname,
    s.quantity,
    s.unitprice,
    s.currencycode,
    s.exchangerate,
    s.quantity * s.netprice * s.exchangerate AS net_revenue,
    CASE WHEN s.quantity > 5 THEN 'High' ELSE 'Low' END AS quantity_level -- Added
FROM
    sales s
    LEFT JOIN customer c ON s.customerkey = c.customerkey
    LEFT JOIN product p ON s.productkey = p.productkey
WHERE
    s.orderdate::date > '2020-01-01'
ORDER BY
    s.orderkey
```

| | orderkey | orderdate | customerkey | givenname | surname | continent | countryfull | storekey | productkey | productname | categoryname | quantity | unitprice | currencycode | exchangerate | net_revenue | quantity_level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1828000 | 2020-01-02 | 1419367 | Chuck | Cecil | North America | United States | 650 | 2494 | Reusable Phone Screen Protector E120 | Cell phones | 2 | 4.116 | USD | 1.00000 | 8.232000 | Low |
| 1 | 1828001 | 2020-01-02 | 451241 | Swen | Thalberg | Europe | Germany | 190 | 2505 | Contoso Touch Stylus Pen E150 Red | Cell phones | 2 | 13.986 | EUR | 0.89342 | 23.741207 | Low |
| 2 | 1828001 | 2020-01-02 | 451241 | Swen | Thalberg | Europe | Germany | 190 | 1683 | MGS Hand Games for 12-16 boys E600 Silver | Games and Toys | 6 | 4.491 | EUR | 0.89342 | 24.074095 | High |
| 3 | 1828001 | 2020-01-02 | 451241 | Swen | Thalberg | Europe | Germany | 190 | 413 | Proseware Laptop16 M610 White | Computers | 4 | 898.500 | EUR | 0.89342 | 2986.184876 | Low |
| 4 | 1828001 | 2020-01-02 | 451241 | Swen | Thalberg | Europe | Germany | 190 | 1008 | A. Datum Consumer Digital Camera M300 Orange | Cameras and camcorders | 1 | 202.950 | EUR | 0.89342 | 181.319589 | Low |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 124333 | 3398034 | 2024-04-20 | 664396 | Karlotta | Rivière | Europe | France | 999999 | 1651 | Contoso DVD 9-Inch Player Portable M300 Silver | Music, Movies and Audio Books | 7 | 159.990 | EUR | 0.93870 | 914.612113 | High |
| 124334 | 3398034 | 2024-04-20 | 664396 | Karlotta | Rivière | Europe | France | 999999 | 1646 | Contoso DVD 9-Inch Player Portable M300 Black | Music, Movies and Audio Books | 1 | 159.990 | EUR | 0.93870 | 150.182613 | Low |
| 124335 | 3398035 | 2024-04-20 | 267690 | Michael | Wilson | North America | Canada | 999999 | 1575 | SV DVD Player M140 Gold | Music, Movies and Audio Books | 2 | 60.990 | CAD | 1.37670 | 147.778282 | Low |
| 124336 | 3398035 | 2024-04-20 | 267690 | Michael | Wilson | North America | Canada | 999999 | 415 | Proseware Laptop8.9 E089 White | Computers | 5 | 326.000 | CAD | 1.37670 | 2019.618900 | Low |
