# Table Overview

In [2]:
import sys
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Switch from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the SQL magic extension (ipython-sql, or jupysql in Colab)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

## Dataset Overview

Goals:
- Get familiar with the dataset
- Look at the main tables
- Explain why we use `exchangerate`

## Tables

There are X tables in total, but we'll mainly be working with the following:

1. Sales
2. Customers
3. Product

Below is the ERD (entity relationship diagram):

**Insert ERD 📊**

### Sales

Overview of the sales table and the columns we'll use the most.

Net Revenue
- Definition: The total revenue after accounting for discounts, promotions, and adjustments. It's the actual amount paid by customers.
- Formula: `netprice` * `quantity`

Gross Revenue
- Definition: The total revenue at full price, before any discounts.
- Formula: `unitprice` * `quantity`
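
As a quick sanity check of the two formulas, here is a minimal Python sketch. The helper names `net_revenue` and `gross_revenue` are hypothetical (they are not part of the dataset or the notebook), and the sample values come from a single USD row of the `sales` table, so no currency conversion is involved.

```python
def net_revenue(netprice: float, quantity: int) -> float:
    """Revenue after discounts: netprice * quantity."""
    return netprice * quantity


def gross_revenue(unitprice: float, quantity: int) -> float:
    """Revenue at full price: unitprice * quantity."""
    return unitprice * quantity


# Sample line item from the sales table (orderkey 1002, productkey 955, USD)
quantity = 4
netprice = 286.6864
unitprice = 315.04

net = net_revenue(netprice, quantity)
gross = gross_revenue(unitprice, quantity)

print(f"net revenue:   {net:.4f}")
print(f"gross revenue: {gross:.4f}")
print(f"discount:      {gross - net:.4f}")
```

The gap between the two numbers is the total discount on that line item, which is why net revenue is the better measure of what customers actually paid.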

In [3]:
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    s.storekey,
    s.productkey,
    s.quantity,
    s.netprice,
    s.currencycode,
    s.exchangerate
FROM
    sales s
ORDER BY
    s.orderkey

Unnamed: 0,orderkey,orderdate,customerkey,storekey,productkey,quantity,netprice,currencycode,exchangerate
0,1000,2015-01-01,947009,400,48,1,98.9670,GBP,0.64155
1,1000,2015-01-01,947009,400,460,1,659.7800,GBP,0.64155
2,1001,2015-01-01,1772036,430,1730,2,54.3760,USD,1.00000
3,1002,2015-01-01,1518349,660,955,4,286.6864,USD,1.00000
4,1002,2015-01-01,1518349,660,62,7,135.7500,USD,1.00000
...,...,...,...,...,...,...,...,...,...
199868,3398034,2024-04-20,664396,999999,1651,7,139.1913,EUR,0.93870
199869,3398034,2024-04-20,664396,999999,1646,1,159.9900,EUR,0.93870
199870,3398035,2024-04-20,267690,999999,1575,2,53.6712,CAD,1.37670
199871,3398035,2024-04-20,267690,999999,415,5,293.4000,CAD,1.37670


Next, add the `net_revenue` calculation.

We multiply `quantity * netprice * exchangerate`.

The `exchangerate` factor is needed because not every sale is in USD. The `currencycode` column shows the currency of each sale.
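
To see the effect of the `exchangerate` factor outside of SQL, the sketch below repeats the same arithmetic in plain Python for three rows copied from the sales output above. The function name `converted_net_revenue` is hypothetical and used only for illustration.

```python
# (currencycode, quantity, netprice, exchangerate) copied from the sales output
rows = [
    ("GBP", 1, 98.9670, 0.64155),
    ("USD", 2, 54.3760, 1.00000),
    ("CAD", 2, 53.6712, 1.37670),
]


def converted_net_revenue(quantity: int, netprice: float, exchangerate: float) -> float:
    """Same expression as the SQL column: quantity * netprice * exchangerate."""
    return quantity * netprice * exchangerate


results = {
    code: converted_net_revenue(quantity, netprice, rate)
    for code, quantity, netprice, rate in rows
}

for code, value in results.items():
    print(f"{code}: {value:.6f}")
```

For the USD row the rate is 1, so the result is simply `quantity * netprice`; for the other currencies the rate scales the amount.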

In [4]:
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    s.storekey,
    s.productkey,
    s.quantity,
    s.unitprice,
    s.currencycode,
    s.exchangerate,
    s.quantity * s.netprice * s.exchangerate AS net_revenue -- Added
FROM
    sales s
ORDER BY
    s.orderkey

Unnamed: 0,orderkey,orderdate,customerkey,storekey,productkey,quantity,unitprice,currencycode,exchangerate,net_revenue
0,1000,2015-01-01,947009,400,48,1,112.4625,GBP,0.64155,63.492279
1,1000,2015-01-01,947009,400,460,1,749.7500,GBP,0.64155,423.281859
2,1001,2015-01-01,1772036,430,1730,2,54.3760,USD,1.00000,108.752000
3,1002,2015-01-01,1518349,660,955,4,315.0400,USD,1.00000,1146.745600
4,1002,2015-01-01,1518349,660,62,7,135.7500,USD,1.00000,950.250000
...,...,...,...,...,...,...,...,...,...,...
199868,3398034,2024-04-20,664396,999999,1651,7,159.9900,EUR,0.93870,914.612113
199869,3398034,2024-04-20,664396,999999,1646,1,159.9900,EUR,0.93870,150.182613
199870,3398035,2024-04-20,267690,999999,1575,2,60.9900,CAD,1.37670,147.778282
199871,3398035,2024-04-20,267690,999999,415,5,326.0000,CAD,1.37670,2019.618900


### Customers

In [5]:
%%sql

SELECT
    c.customerkey,
    c.continent,
    c.gender,
    c.givenname,
    c.surname,
    c.countryfull,
    c.birthday,
    c.company
FROM
    customer c
ORDER BY
    c.customerkey

Unnamed: 0,customerkey,continent,gender,givenname,surname,countryfull,birthday,company
0,15,Australia,male,Julian,McGuigan,Australia,1965-03-24,Cut Rite Lawn Care
1,23,Australia,female,Rose,Dash,Australia,1990-05-10,Rack N Sack
2,36,Australia,female,Annabelle,Townsend,Australia,1964-07-16,id Boutiques
3,120,Australia,male,Jamie,Hetherington,Australia,1946-12-11,Showbiz Pizza Place
4,180,Australia,male,Gabriel,Bosanquet,Australia,1955-04-24,Dubrow's Cafeteria
...,...,...,...,...,...,...,...,...
104985,2099639,North America,male,Miroslav,Slach,United States,1945-04-30,Strength Gurus
104986,2099656,North America,male,Wilfredo,Lozada,United States,1945-08-24,Williams Bros.
104987,2099697,North America,male,Phillipp,Maier,United States,1966-12-08,Excella
104988,2099711,North America,female,Katerina,Pavlícková,United States,1941-01-01,Lawnscape Garden Maintenance


### Product

In [6]:
%%sql

SELECT
    s.orderkey,
    s.orderdate,
    s.customerkey,
    s.storekey,
    s.productkey,
    s.quantity,
    s.unitprice,
    s.quantity * s.unitprice * s.exchangerate AS total_sale_amount
FROM
    sales s
ORDER BY
    s.orderkey

Unnamed: 0,orderkey,orderdate,customerkey,storekey,productkey,quantity,unitprice,total_sale_amount
0,1000,2015-01-01,947009,400,48,1,112.4625,72.150317
1,1000,2015-01-01,947009,400,460,1,749.7500,481.002112
2,1001,2015-01-01,1772036,430,1730,2,54.3760,108.752000
3,1002,2015-01-01,1518349,660,955,4,315.0400,1260.160000
4,1002,2015-01-01,1518349,660,62,7,135.7500,950.250000
...,...,...,...,...,...,...,...,...
199868,3398034,2024-04-20,664396,999999,1651,7,159.9900,1051.278291
199869,3398034,2024-04-20,664396,999999,1646,1,159.9900,150.182613
199870,3398035,2024-04-20,267690,999999,1575,2,60.9900,167.929866
199871,3398035,2024-04-20,267690,999999,415,5,326.0000,2244.021000
