# Table Overview

In [18]:
import sys
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the SQL magic extension (provided by jupysql)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


## Dataset Overview

Goals:
- Get familiar with the dataset
- Look at the main tables
- Explain why we use `exchangerate`

## Tables

There are X tables in total but we'll be mainly working with the following (and going into more detail on each later):

1. sales
2. customer
3. product

There are an additional X tables: 
1. currencyexchange - The currency exchange rates for various types of currency with the date of the exchange.
2. date - A date dimension table (a table of dates and related descriptive attributes). We won't use it much, since we'll mostly rely on date functions.
3. store - Information on the store the items were sold from.

Below is the entity relationship diagram (ERD).

**Insert ERD 📊**

### Customers

In [19]:
%%sql

SELECT
    c.customerkey,
    c.continent,
    c.gender,
    c.givenname,
    c.surname,
    c.countryfull,
    c.birthday,
    c.company
FROM
    customer c

Unnamed: 0,customerkey,continent,gender,givenname,surname,countryfull,birthday,company
0,15,Australia,male,Julian,McGuigan,Australia,1965-03-24,Cut Rite Lawn Care
1,23,Australia,female,Rose,Dash,Australia,1990-05-10,Rack N Sack
2,36,Australia,female,Annabelle,Townsend,Australia,1964-07-16,id Boutiques
3,120,Australia,male,Jamie,Hetherington,Australia,1946-12-11,Showbiz Pizza Place
4,180,Australia,male,Gabriel,Bosanquet,Australia,1955-04-24,Dubrow's Cafeteria
...,...,...,...,...,...,...,...,...
104985,2099639,North America,male,Miroslav,Slach,United States,1945-04-30,Strength Gurus
104986,2099656,North America,male,Wilfredo,Lozada,United States,1945-08-24,Williams Bros.
104987,2099697,North America,male,Phillipp,Maier,United States,1966-12-08,Excella
104988,2099711,North America,female,Katerina,Pavlícková,United States,1941-01-01,Lawnscape Garden Maintenance


### Product

Overview of the `product` table, which contains information on the different products Contoso sells. There are more columns, but we'll only look at a few.

In [20]:
%%sql

SELECT
    p.productkey,
    p.productcode,
    p.productname,
    p.cost,
    p.price,
    p.categoryname,
    p.subcategoryname
FROM
    product p
ORDER BY
    p.productkey

Unnamed: 0,productkey,productcode,productname,cost,price,categoryname,subcategoryname
0,1,101001,Contoso 512MB MP3 Player E51 Silver,6.62,12.99,Audio,MP4&MP3
1,2,101002,Contoso 512MB MP3 Player E51 Blue,6.62,12.99,Audio,MP4&MP3
2,3,101003,Contoso 1G MP3 Player E100 White,7.40,14.52,Audio,MP4&MP3
3,4,101004,Contoso 2G MP3 Player E200 Silver,11.00,21.57,Audio,MP4&MP3
4,5,101005,Contoso 2G MP3 Player E200 Red,11.00,21.57,Audio,MP4&MP3
...,...,...,...,...,...,...,...
2512,2513,505026,Contoso Bluetooth Active Headphones L15 Red,43.07,129.99,Cell phones,Cell phones Accessories
2513,2514,505027,Contoso Bluetooth Active Headphones L15 White,43.07,129.99,Cell phones,Cell phones Accessories
2514,2515,505028,Contoso In-Line Coupler E180 White,1.71,3.35,Cell phones,Cell phones Accessories
2515,2516,505029,Contoso In-Line Coupler E180 Black,1.71,3.35,Cell phones,Cell phones Accessories


### Sales

Overview of the sales table and the columns we'll use the most.

In [21]:
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    s.storekey,
    s.productkey,
    s.quantity,
    s.netprice,
    s.currencycode,
    s.exchangerate
FROM
    sales s
ORDER BY
    s.orderkey

Unnamed: 0,orderkey,orderdate,customerkey,storekey,productkey,quantity,netprice,currencycode,exchangerate
0,1000,2015-01-01,947009,400,48,1,98.9670,GBP,0.64155
1,1000,2015-01-01,947009,400,460,1,659.7800,GBP,0.64155
2,1001,2015-01-01,1772036,430,1730,2,54.3760,USD,1.00000
3,1002,2015-01-01,1518349,660,955,4,286.6864,USD,1.00000
4,1002,2015-01-01,1518349,660,62,7,135.7500,USD,1.00000
...,...,...,...,...,...,...,...,...,...
199868,3398034,2024-04-20,664396,999999,1651,7,139.1913,EUR,0.93870
199869,3398034,2024-04-20,664396,999999,1646,1,159.9900,EUR,0.93870
199870,3398035,2024-04-20,267690,999999,1575,2,53.6712,CAD,1.37670
199871,3398035,2024-04-20,267690,999999,415,5,293.4000,CAD,1.37670


Add the calculation for `net_revenue`.

We're multiplying `quantity * netprice * exchangerate`.

Why?

- Get **Net Revenue**
    - Definition: The total revenue after accounting for discounts, promotions, and adjustments. It's the actual amount customers paid.
    - Formula: `netprice` * `quantity`
- We also multiply by `exchangerate` because not every sale is in USD; the `currencycode` column shows each sale's currency.

Note that the query below displays `unitprice`, while `net_revenue` is calculated from `netprice`.
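To make the arithmetic concrete, here is a minimal Python sketch that applies the same expression to three rows copied from the sales output above. The helper name `net_revenue` is only illustrative; it mirrors the SQL expression and is not part of the dataset or the notebook setup.

```python
# (quantity, netprice, exchangerate) copied from the first three sales rows above
sample_rows = [
    (1, 98.9670, 0.64155),   # orderkey 1000, GBP
    (1, 659.7800, 0.64155),  # orderkey 1000, GBP
    (2, 54.3760, 1.00000),   # orderkey 1001, USD
]


def net_revenue(quantity, netprice, exchangerate):
    """Mirror the SQL expression quantity * netprice * exchangerate."""
    return quantity * netprice * exchangerate


# Round to six decimals, the precision shown in the DataFrame output
revenues = [round(net_revenue(*row), 6) for row in sample_rows]
print(revenues)
```

The rounded results should line up with the `net_revenue` values for the same rows in the query output that follows.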

In [22]:
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    s.storekey,
    s.productkey,
    s.quantity,
    s.unitprice,
    s.currencycode,
    s.exchangerate,
    s.quantity * s.netprice * s.exchangerate AS net_revenue -- Added
FROM
    sales s
ORDER BY
    s.orderkey

Unnamed: 0,orderkey,orderdate,customerkey,storekey,productkey,quantity,unitprice,currencycode,exchangerate,net_revenue
0,1000,2015-01-01,947009,400,48,1,112.4625,GBP,0.64155,63.492279
1,1000,2015-01-01,947009,400,460,1,749.7500,GBP,0.64155,423.281859
2,1001,2015-01-01,1772036,430,1730,2,54.3760,USD,1.00000,108.752000
3,1002,2015-01-01,1518349,660,955,4,315.0400,USD,1.00000,1146.745600
4,1002,2015-01-01,1518349,660,62,7,135.7500,USD,1.00000,950.250000
...,...,...,...,...,...,...,...,...,...,...
199868,3398034,2024-04-20,664396,999999,1651,7,159.9900,EUR,0.93870,914.612113
199869,3398034,2024-04-20,664396,999999,1646,1,159.9900,EUR,0.93870,150.182613
199870,3398035,2024-04-20,267690,999999,1575,2,60.9900,CAD,1.37670,147.778282
199871,3398035,2024-04-20,267690,999999,415,5,326.0000,CAD,1.37670,2019.618900


Filter to orders placed after January 1, 2020.

In [23]:
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    s.storekey,
    s.productkey,
    s.quantity,
    s.unitprice,
    s.currencycode,
    s.exchangerate,
    s.quantity * s.netprice * s.exchangerate AS net_revenue
FROM
    sales s
WHERE -- Added
    s.orderdate::date > '2020-01-01'
ORDER BY
    s.orderkey

Unnamed: 0,orderkey,orderdate,customerkey,storekey,productkey,quantity,unitprice,currencycode,exchangerate,net_revenue
0,1828000,2020-01-02,1419367,650,2494,2,4.116,USD,1.00000,8.232000
1,1828001,2020-01-02,451241,190,2505,2,13.986,EUR,0.89342,23.741207
2,1828001,2020-01-02,451241,190,1683,6,4.491,EUR,0.89342,24.074095
3,1828001,2020-01-02,451241,190,413,4,898.500,EUR,0.89342,2986.184876
4,1828001,2020-01-02,451241,190,1008,1,202.950,EUR,0.89342,181.319589
...,...,...,...,...,...,...,...,...,...,...
124333,3398034,2024-04-20,664396,999999,1651,7,159.990,EUR,0.93870,914.612113
124334,3398034,2024-04-20,664396,999999,1646,1,159.990,EUR,0.93870,150.182613
124335,3398035,2024-04-20,267690,999999,1575,2,60.990,CAD,1.37670,147.778282
124336,3398035,2024-04-20,267690,999999,415,5,326.000,CAD,1.37670,2019.618900
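The filter `s.orderdate::date > '2020-01-01'` is a strict comparison, so an order dated exactly 2020-01-01 would be excluded. Here is a minimal Python sketch of the difference between `>` and `>=`, using made-up dates rather than rows from the dataset:

```python
from datetime import date

cutoff = date(2020, 1, 1)

# Hypothetical order dates around the cutoff (not taken from the sales table)
order_dates = [date(2019, 12, 31), date(2020, 1, 1), date(2020, 1, 2)]

# Same logic as `orderdate > '2020-01-01'`: the cutoff day itself is dropped
strictly_after = [d for d in order_dates if d > cutoff]

# Same logic as `orderdate >= '2020-01-01'`: the cutoff day is kept
on_or_after = [d for d in order_dates if d >= cutoff]

print(strictly_after)
print(on_or_after)
```

If the intent is "2020 onward", `>=` would be the safer operator. In the output above, the first row is dated 2020-01-02.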


Join with the `customer` table to get information on the customer who placed each order.

In [26]:
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    c.givenname, -- Added
    c.surname, -- Added
    c.continent, -- Added
    c.countryfull, -- Added
    c.startdt, -- Added
    s.storekey,
    s.productkey,
    s.quantity,
    s.unitprice,
    s.currencycode,
    s.exchangerate,
    s.quantity * s.netprice * s.exchangerate AS net_revenue 
FROM
    sales s
    LEFT JOIN customer c ON s.customerkey = c.customerkey -- Added
WHERE
    s.orderdate::date > '2020-01-01'
ORDER BY
    s.orderkey

Unnamed: 0,orderkey,orderdate,customerkey,givenname,surname,continent,countryfull,startdt,storekey,productkey,quantity,unitprice,currencycode,exchangerate,net_revenue
0,1828000,2020-01-02,1419367,Chuck,Cecil,North America,United States,1992-12-11,650,2494,2,4.116,USD,1.00000,8.232000
1,1828001,2020-01-02,451241,Swen,Thalberg,Europe,Germany,1985-12-04,190,2505,2,13.986,EUR,0.89342,23.741207
2,1828001,2020-01-02,451241,Swen,Thalberg,Europe,Germany,1985-12-04,190,1683,6,4.491,EUR,0.89342,24.074095
3,1828001,2020-01-02,451241,Swen,Thalberg,Europe,Germany,1985-12-04,190,413,4,898.500,EUR,0.89342,2986.184876
4,1828001,2020-01-02,451241,Swen,Thalberg,Europe,Germany,1985-12-04,190,1008,1,202.950,EUR,0.89342,181.319589
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
124333,3398034,2024-04-20,664396,Karlotta,Rivière,Europe,France,1984-03-24,999999,1651,7,159.990,EUR,0.93870,914.612113
124334,3398034,2024-04-20,664396,Karlotta,Rivière,Europe,France,1984-03-24,999999,1646,1,159.990,EUR,0.93870,150.182613
124335,3398035,2024-04-20,267690,Michael,Wilson,North America,Canada,1982-01-08,999999,1575,2,60.990,CAD,1.37670,147.778282
124336,3398035,2024-04-20,267690,Michael,Wilson,North America,Canada,1982-01-08,999999,415,5,326.000,CAD,1.37670,2019.618900
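A `LEFT JOIN` keeps every row from `sales`, even when no matching customer exists. Here is a small sketch of that behavior, assuming an in-memory SQLite database as a stand-in for the PostgreSQL one. The tables are cut down to a few columns, and the last sales row (with `customerkey` -1) is hypothetical, added only to show an unmatched key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
conn.executescript("""
    CREATE TABLE customer (customerkey INTEGER PRIMARY KEY, givenname TEXT, surname TEXT);
    CREATE TABLE sales (orderkey INTEGER, customerkey INTEGER, quantity INTEGER);
    INSERT INTO customer VALUES (1419367, 'Chuck', 'Cecil'), (451241, 'Swen', 'Thalberg');
    -- First two rows come from the output above; the third is hypothetical
    INSERT INTO sales VALUES (1828000, 1419367, 2), (1828001, 451241, 2), (9999999, -1, 1);
""")

# LEFT JOIN: every sales row survives; unmatched customers come back as NULL (None)
left_rows = conn.execute("""
    SELECT s.orderkey, c.givenname, c.surname
    FROM sales s
    LEFT JOIN customer c ON s.customerkey = c.customerkey
    ORDER BY s.orderkey
""").fetchall()

# INNER JOIN for comparison: the unmatched sales row is dropped
inner_rows = conn.execute("""
    SELECT s.orderkey, c.givenname, c.surname
    FROM sales s
    INNER JOIN customer c ON s.customerkey = c.customerkey
    ORDER BY s.orderkey
""").fetchall()
conn.close()

print(left_rows)
print(inner_rows)
```

Using `LEFT JOIN` here means a sale is never silently lost from the results because of a missing customer record.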


Also join with the product table to get the product category.

In [27]:
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    c.givenname,
    c.surname,
    c.continent,
    c.countryfull,
    c.birthday,
    s.storekey,
    s.productkey,
    p.productname, -- Added
    p.categoryname, -- Added
    p.subcategoryname, -- Added
    s.quantity,
    s.unitprice,
    s.currencycode,
    s.exchangerate,
    s.quantity * s.netprice * s.exchangerate AS net_revenue 
FROM
    sales s
    LEFT JOIN customer c ON s.customerkey = c.customerkey 
    LEFT JOIN product p ON s.productkey = p.productkey -- Added
WHERE
    s.orderdate::date > '2020-01-01'
ORDER BY
    s.orderkey

Unnamed: 0,orderkey,orderdate,customerkey,givenname,surname,continent,countryfull,birthday,storekey,productkey,productname,categoryname,subcategoryname,quantity,unitprice,currencycode,exchangerate,net_revenue
0,1828000,2020-01-02,1419367,Chuck,Cecil,North America,United States,1980-04-22,650,2494,Reusable Phone Screen Protector E120,Cell phones,Cell phones Accessories,2,4.116,USD,1.00000,8.232000
1,1828001,2020-01-02,451241,Swen,Thalberg,Europe,Germany,1966-01-12,190,2505,Contoso Touch Stylus Pen E150 Red,Cell phones,Cell phones Accessories,2,13.986,EUR,0.89342,23.741207
2,1828001,2020-01-02,451241,Swen,Thalberg,Europe,Germany,1966-01-12,190,1683,MGS Hand Games for 12-16 boys E600 Silver,Games and Toys,Boxed Games,6,4.491,EUR,0.89342,24.074095
3,1828001,2020-01-02,451241,Swen,Thalberg,Europe,Germany,1966-01-12,190,413,Proseware Laptop16 M610 White,Computers,Laptops,4,898.500,EUR,0.89342,2986.184876
4,1828001,2020-01-02,451241,Swen,Thalberg,Europe,Germany,1966-01-12,190,1008,A. Datum Consumer Digital Camera M300 Orange,Cameras and camcorders,Digital Cameras,1,202.950,EUR,0.89342,181.319589
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
124333,3398034,2024-04-20,664396,Karlotta,Rivière,Europe,France,1982-02-02,999999,1651,Contoso DVD 9-Inch Player Portable M300 Silver,"Music, Movies and Audio Books",Movie DVD,7,159.990,EUR,0.93870,914.612113
124334,3398034,2024-04-20,664396,Karlotta,Rivière,Europe,France,1982-02-02,999999,1646,Contoso DVD 9-Inch Player Portable M300 Black,"Music, Movies and Audio Books",Movie DVD,1,159.990,EUR,0.93870,150.182613
124335,3398035,2024-04-20,267690,Michael,Wilson,North America,Canada,1937-11-18,999999,1575,SV DVD Player M140 Gold,"Music, Movies and Audio Books",Movie DVD,2,60.990,CAD,1.37670,147.778282
124336,3398035,2024-04-20,267690,Michael,Wilson,North America,Canada,1937-11-18,999999,415,Proseware Laptop8.9 E089 White,Computers,Laptops,5,326.000,CAD,1.37670,2019.618900
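Note that the last index in the outputs above is 124336 both before and after the joins, so the joins added columns without adding rows. That is what you would expect if `customerkey` and `productkey` are unique in their tables, which is assumed here rather than checked. Below is a minimal sketch of the same check, again using an in-memory SQLite stand-in with a few values copied from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
conn.executescript("""
    -- PRIMARY KEY encodes the assumption that each key appears once per table
    CREATE TABLE customer (customerkey INTEGER PRIMARY KEY, surname TEXT);
    CREATE TABLE product (productkey INTEGER PRIMARY KEY, categoryname TEXT);
    CREATE TABLE sales (orderkey INTEGER, customerkey INTEGER, productkey INTEGER);
    INSERT INTO customer VALUES (1419367, 'Cecil'), (451241, 'Thalberg');
    INSERT INTO product VALUES
        (2494, 'Cell phones'), (2505, 'Cell phones'), (1683, 'Games and Toys');
    INSERT INTO sales VALUES
        (1828000, 1419367, 2494), (1828001, 451241, 2505), (1828001, 451241, 1683);
""")

# Row count of the base table before any join
sales_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

# Both joins applied, as in the query above (reduced to a few columns)
joined = conn.execute("""
    SELECT s.orderkey, c.surname, p.categoryname
    FROM sales s
    LEFT JOIN customer c ON s.customerkey = c.customerkey
    LEFT JOIN product p ON s.productkey = p.productkey
    ORDER BY s.orderkey, s.productkey
""").fetchall()
conn.close()

print(sales_count, len(joined))
print(joined)
```

If the joined count were ever larger than the `sales` count, that would suggest duplicate keys in `customer` or `product`.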
