# Date Calculations

## Overview

### 🥅 Analysis Goals

- What we're going to use for this dataset to do X e.g. Use the following in order to explore a dataset on experience and salaries
    - Major topic 1
    - Major topic 2
    - Major topic 3
- The end goal of this is e.g. Identify which jobs meet our expectations of years experience and total salary.

### 📘 Concepts Covered

Date Calculations: 
- `DATE_PART()`
- `INTERVAL`
- `AGE()`
- `CURRENT_DATE`

---

In [1]:
import sys
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the SQL magic extension (jupysql or ipython-sql)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

In [2]:
%config SqlMagic.named_parameters = "disabled"

### 💡 Note

You may notice that this database has a **date dimension** table: a static table with one row per day, plus other date attributes such as day of the week and month name. You could join to it to get the month or year.

We **won't** be using it, because not every database you'll work with has one. It's also important to understand how to calculate dates yourself for different types of analysis (as you'll see).

---
## DATE_PART

### 📝 Notes

`DATE_PART`
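- `DATE_PART` extracts a specific part of a date or timestamp (e.g., year, month, day, hour, minute). The result is a `double precision` number, which is why the outputs below show values like `2015.0`.
- Syntax: `DATE_PART('part', source)` (e.g., `DATE_PART('year', TIMESTAMP '2024-12-04 10:15:30')` returns `2024`). A bare string literal generally needs an explicit type such as `TIMESTAMP` or `DATE` so PostgreSQL can pick the right overload.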
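
As a rough analogy outside the database, the sketch below extracts the year from each date and totals amounts per year in plain Python, mirroring the "extract a part, then group by it" pattern used in the queries that follow. The rows are hypothetical stand-ins for the `sales` table, and `.year` only plays the role of `DATE_PART('year', ...)`; it is not how PostgreSQL implements it.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (order_date, sale_amount) rows standing in for the sales table.
rows = [
    (date(2015, 1, 1), 100.0),
    (date(2015, 1, 2), 50.0),
    (date(2016, 3, 9), 75.0),
]

# Sum the amounts per extracted year, like GROUP BY DATE_PART('year', orderdate).
totals_by_year = defaultdict(float)
for order_date, amount in rows:
    totals_by_year[order_date.year] += amount  # .year is the extracted "part"

print(dict(totals_by_year))
```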

### 💻 Final Result

- Describe what the final result should be e.g. return the retention by X cohort.

#### Problem Description

**`FUNCTION` / Concept Covered**

1. Go into specific step / what we're going to do. E.g. Use the `=` operator to set a new column to be equal to Experience

**Basic Query**

In [3]:
%%sql

SELECT
    DATE_PART('year', s.orderdate) AS sales_year,
    DATE_PART('month', s.orderdate) AS sales_month,
    DATE_PART('day', s.orderdate) AS sales_day,
    SUM(s.quantity * p.price * s.exchangerate) AS total_sale_amount
FROM
    sales s
    LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
    sales_year, sales_month, sales_day
ORDER BY
    sales_year, sales_month, sales_day;

sales_year,sales_month,sales_day,total_sale_amount
2015.0,1.0,1.0,9783.814592299996
2015.0,1.0,2.0,6325.610072799998
2015.0,1.0,3.0,16054.5641264
2015.0,1.0,5.0,15808.9952614
2015.0,1.0,6.0,9247.1701588
2015.0,1.0,7.0,8046.3929002999985
2015.0,1.0,8.0,10152.908884699998
2015.0,1.0,9.0,9090.357786
2015.0,1.0,10.0,32381.971493900008
2015.0,1.0,12.0,11425.50091


**Advanced Query**

In [4]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
	SUM(s.quantity * p.price * s.exchangerate) AS total_sale_amount
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_year
ORDER BY
	order_year 

order_year,total_sale_amount
2015.0,6474557.759217918
2016.0,8446942.005429374
2017.0,10156792.19404032
2018.0,18684554.280121475
2019.0,22960348.68668175
2020.0,9467853.572505478
2021.0,18005319.122038182
2022.0,43053017.75389998
2023.0,35220601.918261275
2024.0,8930345.807872465


**📊[Insert chart]📊**

In [5]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    p.categoryname,
	SUM(s.quantity * p.price * s.exchangerate) AS total_sale_amount
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_year,
    p.categoryname
ORDER BY
	order_year,
    p.categoryname

order_year,categoryname,total_sale_amount
2015.0,Audio,242134.78374230015
2015.0,Cameras and camcorders,1213522.097557999
2015.0,Cell phones,350021.01758069964
2015.0,Computers,914596.0108028998
2015.0,Games and Toys,69030.82675066004
2015.0,Home Appliances,1965292.5475498936
2015.0,"Music, Movies and Audio Books",423904.3542101002
2015.0,TV and Video,1296056.1210232975
2016.0,Audio,474556.3549016001
2016.0,Cameras and camcorders,1022384.0719222992


**📊[Insert chart]📊**

Check unique `categoryname`.

In [11]:
%%sql

SELECT DISTINCT categoryname
FROM product
ORDER BY categoryname

categoryname
Audio
Cameras and camcorders
Cell phones
Computers
Games and Toys
Home Appliances
"Music, Movies and Audio Books"
TV and Video


Pivot the table so that each category becomes its own column, using one `SUM(CASE WHEN ...)` per category.

In [6]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * p.price * s.exchangerate) END) AS audio_sales,
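	-- The literal must match the stored value exactly ('Cameras and camcorders'); string comparison is case-sensitive.
	-- The recorded output below came from the miscased 'Cameras and Camcorders', hence its empty cameras_sales column.
	SUM(CASE WHEN p.categoryname = 'Cameras and camcorders' THEN (s.quantity * p.price * s.exchangerate) END) AS cameras_sales,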
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * p.price * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * p.price * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * p.price * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * p.price * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * p.price * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * p.price * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_year
ORDER BY
	order_year

order_year,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
2015.0,242134.78374230023,,350021.01758069976,914596.0108029,69030.82675066002,1965292.547549894,423904.3542101003,1296056.1210232975
2016.0,474556.35490159993,,639555.7652443002,1824382.5732650997,67743.01287371997,2657752.7710425965,477677.8342107006,1282889.6219689995
2017.0,647303.3587497999,,956347.9370349988,3264137.416167098,81925.15551914001,2522514.8343053954,509706.0527850004,1336545.0930919996
2018.0,1259160.619173698,,2272078.3249412947,6676094.601614192,251042.38349451963,3451429.036185488,1007644.6585953984,1706851.1077122982
2019.0,1204577.054108197,,2968422.6959140897,9266049.028753595,395377.63769760064,2739905.2759979926,1389483.5181648943,1913169.736860199
2020.0,437153.7938611,,1428953.551096797,3609346.787558196,164823.8908353998,888229.4841838994,556880.4521838005,1123875.0577427994
2021.0,464063.8424287003,,2942260.0356134893,7009486.872581305,183412.04189759985,2471605.9795224937,1010204.6870789966,2519519.6674054945
2022.0,854127.3322440992,,7342863.472145045,15548062.129970027,351464.63046580134,7374114.8490392305,2814693.739286461,6338489.860811011
2023.0,730647.8724822997,,6383097.762667838,12373767.73513032,286481.6953874803,6317839.183700321,2321667.239495983,4699134.796674995
2024.0,221823.9750647,,1791853.0996532962,3138911.6810837984,91062.65195280002,1405236.5520047997,628772.8820492004,978566.6537935
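
The pivot pattern above, and the empty `cameras_sales` column in its output, can be reproduced on a tiny scale. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, with a hypothetical four-row table (the table layout and amounts are illustrative, not taken from the Contoso data). It assumes only that `=` on text is case-sensitive and that `SUM` over no matching rows yields NULL, which holds in both engines by default.

```python
import sqlite3

# Hypothetical miniature of the joined sales/product data (illustrative values only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (order_year INTEGER, categoryname TEXT, amount REAL);
    INSERT INTO sales VALUES
        (2015, 'Audio', 10.0),
        (2015, 'Cameras and camcorders', 20.0),
        (2016, 'Audio', 5.0),
        (2016, 'Cameras and camcorders', 7.0);
""")

# One SUM(CASE WHEN ...) per category turns rows into columns.
# The last column uses a miscased literal on purpose: nothing matches, so it is NULL.
pivot = conn.execute("""
    SELECT
        order_year,
        SUM(CASE WHEN categoryname = 'Audio' THEN amount END) AS audio_sales,
        SUM(CASE WHEN categoryname = 'Cameras and camcorders' THEN amount END) AS cameras_sales,
        SUM(CASE WHEN categoryname = 'Cameras and Camcorders' THEN amount END) AS cameras_miscased
    FROM sales
    GROUP BY order_year
    ORDER BY order_year
""").fetchall()
conn.close()

for row in pivot:
    print(row)
```

Checking the literals against `SELECT DISTINCT categoryname` before writing the `CASE` branches is a cheap way to catch this kind of mismatch.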


**📊[Insert chart]📊**

---
## AGE, CURRENT_DATE

### 📝 Notes

`CURRENT_DATE`

- **CURRENT_DATE** returns the current date in the session's time zone. It is written without parentheses.
- Returns a **DATE** type with no time component (e.g., `2024-12-04`).

`AGE()`

- **AGE()** calculates the interval between two dates or timestamps.
- Given two arguments, returns a calendar-style interval (e.g., `1 year 2 mons 3 days`); given one argument, returns the difference from `CURRENT_DATE` (at midnight).
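
Because `AGE()` splits its result into years, months, and days, the day field alone is not the total number of elapsed days. For example, `AGE(DATE '2024-03-05', DATE '2024-01-01')` is `2 mons 4 days`, so its day field is 4, while 64 days have actually elapsed. The Python sketch below illustrates that difference with the same two (hypothetical) dates. The field-wise subtraction is a simplified model that works here because no borrowing between fields is needed; it is not PostgreSQL's full algorithm.

```python
from datetime import date

# Hypothetical order and ship dates.
order_date = date(2024, 1, 1)
ship_date = date(2024, 3, 5)

# Total elapsed days, like subtracting one DATE from another.
elapsed_days = (ship_date - order_date).days

# Field-wise view, like the months and days reported by AGE().
# Simplified: assumes ship_date.day >= order_date.day, so no borrowing is required.
month_field = (ship_date.year - order_date.year) * 12 + (ship_date.month - order_date.month)
day_field = ship_date.day - order_date.day

print(elapsed_days)            # total days between the two dates
print(month_field, day_field)  # month and day fields of the calendar-style difference
```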

### 💻 Final Result

- Describe what the final result should be e.g. return the retention by X cohort.

#### Problem Description

**`FUNCTION` / Concept Covered**

1. Go into specific step / what we're going to do. E.g. Use the `=` operator to set a new column to be equal to Experience

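Using the last query, return only orders placed within the last 5 years of the current date. Because the cutoff falls partway through a year, the earliest year in the result (2019 in the output below) covers only part of that year.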

In [7]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * p.price * s.exchangerate) END) AS audio_sales,
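	-- The literal must match the stored value exactly ('Cameras and camcorders'); string comparison is case-sensitive.
	-- The recorded output below came from the miscased 'Cameras and Camcorders', hence its empty cameras_sales column.
	SUM(CASE WHEN p.categoryname = 'Cameras and camcorders' THEN (s.quantity * p.price * s.exchangerate) END) AS cameras_sales,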
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * p.price * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * p.price * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * p.price * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * p.price * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * p.price * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * p.price * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE -- Added
    s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY
	order_year
ORDER BY
	order_year

order_year,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
2019.0,96098.65534230004,,242658.00395900017,730752.5927482005,36135.471403299984,159022.1506698,132509.33583789997,148988.8941115
2020.0,437153.7938611,,1428953.5510967977,3609346.787558196,164823.89083539974,888229.4841838992,556880.4521838006,1123875.0577427992
2021.0,464063.8424287003,,2942260.0356134893,7009486.872581303,183412.0418975999,2471605.9795224946,1010204.687078996,2519519.6674054945
2022.0,854127.3322440994,,7342863.472145045,15548062.129970036,351464.63046580134,7374114.849039231,2814693.73928646,6338489.86081101
2023.0,730647.8724822998,,6383097.762667842,12373767.735130329,286481.69538748043,6317839.183700319,2321667.2394959824,4699134.796674994
2024.0,221823.9750647,,1791853.0996532962,3138911.6810837984,91062.65195280004,1405236.5520047992,628772.8820492001,978566.6537935


---
## INTERVAL

### 📝 Notes

`INTERVAL`

- **INTERVAL** represents a span of time, such as days, months, hours, or seconds.
- Used in date calculations (e.g., `CURRENT_DATE + INTERVAL '1 month'` adds one month to the current date).
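
Interval arithmetic with months or years works on calendar fields rather than a fixed number of days, and PostgreSQL clamps to the last day of the month when the target month is shorter (for example, February 29 minus one year lands on February 28). The sketch below models a `CURRENT_DATE - INTERVAL '5 years'` cutoff in plain Python. The helper `minus_years`, the fixed `today` value, and the sample dates are all hypothetical and only illustrate the behavior.

```python
import calendar
from datetime import date


def minus_years(d: date, years: int) -> date:
    """Subtract whole calendar years, clamping the day to the target month's length."""
    target_year = d.year - years
    last_day = calendar.monthrange(target_year, d.month)[1]
    return date(target_year, d.month, min(d.day, last_day))


today = date(2024, 12, 4)        # fixed stand-in for CURRENT_DATE
cutoff = minus_years(today, 5)   # plays the role of CURRENT_DATE - INTERVAL '5 years'

# Hypothetical order dates; keep only those on or after the cutoff.
order_dates = [date(2019, 6, 1), date(2019, 12, 4), date(2023, 1, 15)]
recent = [d for d in order_dates if d >= cutoff]

print(cutoff)
print(recent)
print(minus_years(date(2024, 2, 29), 5))  # leap day clamps to Feb 28
```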

### 💻 Final Result

- Describe what the final result should be e.g. return the retention by X cohort.

#### Problem Description

**`FUNCTION` / Concept Covered**

1. Go into specific step / what we're going to do. E.g. Use the `=` operator to set a new column to be equal to Experience

In [None]:
%%sql

SELECT 
    -- 'YYYY-MM' sorts chronologically as text and matches the GROUP BY expression
    TO_CHAR(s.orderdate, 'YYYY-MM') AS order_month,
    -- AGE() returns years/months/days, so EXTRACT(DAY ...) reads only the day field
    AVG(EXTRACT(DAY FROM AGE(s.shipdate, s.orderdate))) AS avg_processing_time,
    SUM(s.quantity * p.price * s.exchangerate) AS total_sales
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.shipdate IS NOT NULL
GROUP BY 
    TO_CHAR(s.orderdate, 'YYYY-MM')
ORDER BY 
    order_month;