<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/2_Date_Calculations/2_Date_Calculations.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Date Calculations

## Overview

### 🥅 Analysis Goals

Explore sales data using various PostgreSQL functions to derive insights about sales trends, categories, and processing times.

- Summarize sales data by time dimensions (e.g., year, month, day).
- Analyze sales by product categories.
- Understand order processing times and their trends over time.

### 📘 Concepts Covered

Date Calculations: 
- `DATE_PART()`
- `INTERVAL`
- `AGE()`
- `CURRENT_DATE`

---

In [1]:
import sys
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic (jupysql provides it after the swap above)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

In [2]:
%config SqlMagic.named_parameters = "disabled"

### 💡 Note

You may notice this specific database actually has a **date dimension** table: a static table with one row per day and extra date attributes such as day of the week and month name. You could join to it to get the month or year of an order.

We **won't** use it here, because not every database you work with will have one. It's also important to understand how to calculate dates yourself for different types of analysis (as you'll see).
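
To make the idea of a date dimension concrete, here is a minimal Python sketch that builds one row per day. The function name `build_date_dimension` and the column names are illustrative assumptions, not the schema of the table in this database.

```python
from datetime import date, timedelta

def build_date_dimension(start: date, end: date) -> list[dict]:
    """Build one row per calendar day between start and end (inclusive)."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date": current,
            "year": current.year,
            "month": current.month,
            "month_name": current.strftime("%B"),   # e.g. "January"
            "day_of_week": current.strftime("%A"),  # e.g. "Thursday"
        })
        current += timedelta(days=1)
    return rows

dim = build_date_dimension(date(2015, 1, 1), date(2015, 1, 7))
print(len(dim), dim[0]["month_name"], dim[0]["day_of_week"])
```

Joining sales to a table like this replaces a function call with a lookup, which is convenient but only possible when such a table exists.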

---
## DATE_PART

### 📝 Notes

`DATE_PART`
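- `DATE_PART()` extracts a specific component (e.g., year, month, day) from a date or timestamp. It returns a `double precision` value, which is why the results below display as `2015.0` rather than `2015`.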
- Syntax: `DATE_PART('unit', source)` where `unit` can be `'year'`, `'month'`, `'day'`, etc.
- Example: `DATE_PART('year', orderdate)` extracts the year from the `orderdate`.
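
As a rough mental model, `DATE_PART` behaves like reading one attribute off a date object. The Python helper below, `date_part`, is a hypothetical stand-in that covers only three units; it is not how PostgreSQL implements the function.

```python
from datetime import date

def date_part(unit: str, source: date) -> float:
    """Hypothetical stand-in for DATE_PART('unit', source), limited to three units."""
    parts = {"year": source.year, "month": source.month, "day": source.day}
    # DATE_PART returns double precision in PostgreSQL, so mirror that with a float.
    return float(parts[unit])

order_date = date(2015, 1, 10)
print(date_part("year", order_date), date_part("month", order_date), date_part("day", order_date))
# prints: 2015.0 1.0 10.0
```

If whole numbers are preferred in the result set, casting the result (for example `DATE_PART('year', orderdate)::int`) is one option.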

### 💻 Final Result

- The queries return aggregated sales amounts grouped by specific time components, such as year, month, and day.

#### Extract Date Components and Aggregate Sales

**`DATE_PART`**

1. Use `DATE_PART` to get year, month, and day of the sales and also return the total sales amount.
    - Extract the `year`, `month`, and `day` from `orderdate` using `DATE_PART`.
    - Calculate the total sales amount using `SUM(quantity * price * exchangerate)`.
    - Group the data by the extracted components and order by `year`, `month`, and `day`.
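
The extract, group, and sum steps above can be sketched in plain Python. The three sample rows are made up for illustration and do not come from the `sales` table.

```python
from collections import defaultdict
from datetime import date

# Made-up sample rows: (orderdate, quantity, price, exchangerate).
sales = [
    (date(2015, 1, 1), 2, 10.0, 1.0),
    (date(2015, 1, 1), 1, 5.0, 2.0),
    (date(2015, 1, 2), 3, 4.0, 1.0),
]

totals = defaultdict(float)
for orderdate, quantity, price, exchangerate in sales:
    key = (orderdate.year, orderdate.month, orderdate.day)  # the GROUP BY key
    totals[key] += quantity * price * exchangerate          # SUM(quantity * price * exchangerate)

daily_totals = sorted(totals.items())                       # ORDER BY year, month, day
print(daily_totals)
# prints: [((2015, 1, 1), 30.0), ((2015, 1, 2), 12.0)]
```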

In [3]:
%%sql

SELECT
    DATE_PART('year', s.orderdate) AS sales_year,
    DATE_PART('month', s.orderdate) AS sales_month,
    DATE_PART('day', s.orderdate) AS sales_day,
    SUM(s.quantity * p.price * s.exchangerate) AS total_sale_amount
FROM
    sales s
    LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
    sales_year, sales_month, sales_day
ORDER BY
    sales_year, sales_month, sales_day;

sales_year,sales_month,sales_day,total_sale_amount
2015.0,1.0,1.0,9783.814592299996
2015.0,1.0,2.0,6325.610072799998
2015.0,1.0,3.0,16054.5641264
2015.0,1.0,5.0,15808.9952614
2015.0,1.0,6.0,9247.1701588
2015.0,1.0,7.0,8046.3929002999985
2015.0,1.0,8.0,10152.908884699998
2015.0,1.0,9.0,9090.357786
2015.0,1.0,10.0,32381.971493900008
2015.0,1.0,12.0,11425.50091


2. Summarize total sales by year:
    - Apply `DATE_PART('year', orderdate)` to extract the year.
    - Use `SUM(quantity * price * exchangerate)` to compute the total sales amount.
    - Group the data by `order_year` and order the results.

In [4]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
	SUM(s.quantity * p.price * s.exchangerate) AS total_sale_amount
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_year
ORDER BY
	order_year 

order_year,total_sale_amount
2015.0,6474557.759217918
2016.0,8446942.005429365
2017.0,10156792.19404032
2018.0,18684554.280121468
2019.0,22960348.68668177
2020.0,9467853.57250547
2021.0,18005319.122038186
2022.0,43053017.75389994
2023.0,35220601.91826126
2024.0,8930345.807872463


**📊[Insert chart]📊**

3. Add category-level granularity to the yearly sales summary:

    - Include `categoryname` in the `SELECT` clause.
    - Aggregate total sales by `order_year` and `categoryname`.
    - Group the data by these two columns and order by both.

In [5]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    p.categoryname,
	SUM(s.quantity * p.price * s.exchangerate) AS total_sale_amount
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_year,
    p.categoryname
ORDER BY
	order_year,
    p.categoryname

order_year,categoryname,total_sale_amount
2015.0,Audio,242134.78374230015
2015.0,Cameras and camcorders,1213522.097557999
2015.0,Cell phones,350021.01758069964
2015.0,Computers,914596.0108028998
2015.0,Games and Toys,69030.82675066004
2015.0,Home Appliances,1965292.5475498936
2015.0,"Music, Movies and Audio Books",423904.3542101002
2015.0,TV and Video,1296056.1210232975
2016.0,Audio,474556.3549016001
2016.0,Cameras and camcorders,1022384.0719222992


**📊[Insert chart]📊**

Data Validation: Check unique `categoryname`.

In [6]:
%%sql

SELECT DISTINCT categoryname
FROM product
ORDER BY categoryname

categoryname
Audio
Cameras and camcorders
Cell phones
Computers
Games and Toys
Home Appliances
"Music, Movies and Audio Books"
TV and Video


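4. Pivot the table using `CASE WHEN`:

    - Use `CASE WHEN` to create a pivoted table with sales aggregated by `categoryname` for each `order_year`.
    - Aggregate sales for each category using `SUM` and conditional logic in `CASE WHEN`.
    - Group by `order_year` and order the results.
    - String comparisons are case-sensitive, so each literal must match the stored value exactly. The query below compares against `'Cameras and Camcorders'`, which does not match the stored `'Cameras and camcorders'` from the validation query above, so `cameras_sales` comes back empty. Using the exact spelling populates the column.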
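
The pivot pattern, and its sensitivity to exact spelling, can be tried on a tiny in-memory table. This sketch uses Python's built-in SQLite as a stand-in for PostgreSQL; the `toy_sales` table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_sales (order_year INTEGER, categoryname TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO toy_sales VALUES (?, ?, ?)",
    [
        (2015, "Audio", 100.0),
        (2015, "Cameras and camcorders", 250.0),
        (2016, "Audio", 80.0),
        (2016, "Cameras and camcorders", 300.0),
    ],
)

pivot_rows = conn.execute(
    """
    SELECT
        order_year,
        SUM(CASE WHEN categoryname = 'Audio' THEN amount END) AS audio_sales,
        -- Exact spelling of the stored value: rows match.
        SUM(CASE WHEN categoryname = 'Cameras and camcorders' THEN amount END) AS cameras_ok,
        -- Capital "C" in "Camcorders": no row matches, so SUM returns NULL.
        SUM(CASE WHEN categoryname = 'Cameras and Camcorders' THEN amount END) AS cameras_mismatch
    FROM toy_sales
    GROUP BY order_year
    ORDER BY order_year
    """
).fetchall()
conn.close()

print(pivot_rows)
# prints: [(2015, 100.0, 250.0, None), (2016, 80.0, 300.0, None)]
```

A `CASE WHEN` without an `ELSE` yields `NULL` for non-matching rows, and `SUM` skips `NULL`s, which is what turns each category into its own column.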

In [7]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * p.price * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * p.price * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * p.price * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * p.price * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * p.price * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * p.price * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * p.price * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * p.price * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_year
ORDER BY
	order_year

order_year,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
2015.0,242134.7837423001,,350021.0175806998,914596.0108029,69030.82675066002,1965292.547549895,423904.3542101005,1296056.1210232973
2016.0,474556.35490160016,,639555.7652443002,1824382.573265099,67743.01287371997,2657752.771042596,477677.8342107007,1282889.621969
2017.0,647303.3587497997,,956347.9370349993,3264137.416167098,81925.15551913998,2522514.8343053963,509706.0527850004,1336545.0930919996
2018.0,1259160.6191736977,,2272078.3249412943,6676094.60161419,251042.38349451963,3451429.036185488,1007644.6585953988,1706851.1077122986
2019.0,1204577.054108197,,2968422.6959140887,9266049.028753594,395377.63769760064,2739905.275997993,1389483.5181648948,1913169.736860199
2020.0,437153.7938611001,,1428953.551096798,3609346.787558197,164823.89083539986,888229.4841838991,556880.4521838007,1123875.0577427992
2021.0,464063.8424287003,,2942260.0356134884,7009486.872581305,183412.04189759985,2471605.9795224946,1010204.6870789962,2519519.667405494
2022.0,854127.3322440995,,7342863.472145046,15548062.12997004,351464.63046580134,7374114.849039229,2814693.7392864604,6338489.86081101
2023.0,730647.8724822998,,6383097.762667845,12373767.735130323,286481.6953874805,6317839.183700321,2321667.2394959824,4699134.796674995
2024.0,221823.9750647,,1791853.099653296,3138911.681083798,91062.65195280002,1405236.5520047988,628772.8820492001,978566.6537935


**📊[Insert chart]📊**

---
## CURRENT_DATE, INTERVAL

### 📝 Notes

`CURRENT_DATE`

- **CURRENT_DATE** returns the current date in the session's time zone. It is written without parentheses.
- Returns a **DATE** type with no time component (e.g., `2024-12-04`).

`INTERVAL`

- **INTERVAL** represents a span of time, such as days, months, hours, or seconds.
- Used in date calculations (e.g., `CURRENT_DATE + INTERVAL '1 month'` adds one month to the current date).

**Note:** Similar to `CURRENT_DATE`, there's also `NOW()`, which returns the current date *and* time.
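
To see what subtracting an interval of whole years does to a date, here is a small Python sketch. The helper `minus_years` is hypothetical, and the fixed `today` value stands in for `CURRENT_DATE`. Note that in PostgreSQL, `date - interval` returns a timestamp rather than a date; this sketch only models the calendar arithmetic.

```python
import calendar
from datetime import date

def minus_years(d: date, years: int) -> date:
    """Subtract whole years, clamping the day when the target month is shorter (Feb 29 -> Feb 28)."""
    target_year = d.year - years
    last_day = calendar.monthrange(target_year, d.month)[1]
    return date(target_year, d.month, min(d.day, last_day))

today = date(2024, 12, 4)        # stand-in for CURRENT_DATE
cutoff = minus_years(today, 5)   # like CURRENT_DATE - INTERVAL '5 years'
print(cutoff)                    # 2019-12-04
print(minus_years(date(2024, 2, 29), 5))  # 2019-02-28
```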

### 💻 Final Result

- Restrict results to the last 5 years of sales, excluding the current year.

#### Filter Data by Time Intervals

**`INTERVAL`** and **`CURRENT_DATE`**

1. Reuse the previous query, but only return orders from the last 5 years relative to the current date.
    - Add `CURRENT_DATE - INTERVAL '5 years'` in the `WHERE` clause to filter records.
    - Use `CASE WHEN` for category-based aggregation in the `SELECT` clause.
    - Group data by `order_year` and order the results.

In [8]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * p.price * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * p.price * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * p.price * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * p.price * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * p.price * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * p.price * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * p.price * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * p.price * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE -- Added
    s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY
	order_year
ORDER BY
	order_year

order_year,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
2019.0,93023.67417300002,,232616.8260824001,694041.1291942004,35778.440980099986,148004.61860339998,124683.0201444,139509.1341115
2020.0,437153.7938611001,,1428953.551096797,3609346.7875581975,164823.89083539983,888229.4841838992,556880.4521838003,1123875.0577427996
2021.0,464063.8424287003,,2942260.035613489,7009486.872581306,183412.04189759988,2471605.9795224955,1010204.6870789966,2519519.6674054936
2022.0,854127.3322440994,,7342863.472145041,15548062.12997003,351464.63046580134,7374114.8490392305,2814693.73928646,6338489.860811012
2023.0,730647.8724822998,,6383097.762667844,12373767.735130323,286481.6953874804,6317839.183700318,2321667.239495983,4699134.796674995
2024.0,221823.9750647,,1791853.0996532957,3138911.6810837984,91062.65195280002,1405236.552004799,628772.8820492002,978566.6537935


2. Validate data by replacing `order_year` with `orderdate`:

    - Replace `DATE_PART('year', orderdate)` with `orderdate` in the `SELECT` clause.
    - Use the same `WHERE` clause and group the data by `orderdate`.

In [None]:
%%sql

SELECT 
	s.orderdate,
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * p.price * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * p.price * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * p.price * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * p.price * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * p.price * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * p.price * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * p.price * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * p.price * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE -- Added
    s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY
	s.orderdate
ORDER BY
	s.orderdate

orderdate,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
2019-12-05,3064.38431,,8522.664348,22139.83404,1176.643633,6608.186075,5285.8283886,1711.85396
2019-12-06,4745.4510793,,3876.1264305,20968.79738,693.748325,12973.888283199998,4254.536214,2313.01931
2019-12-07,4855.4344715,,4646.755657199999,35124.720049,1630.1393156,6736.274874999999,8631.9514826,7271.5053722
2019-12-08,59.17664249999999,,702.0,1106.1873,86.0,,109.99,
2019-12-09,3180.4738376,,3107.8247288,11045.321078,848.3569279999999,6019.8837535,2264.381977,2661.28521
2019-12-10,953.310101,,3082.39599,10695.399984999998,1069.88,5480.6541223,2273.1742969,714.0
2019-12-11,2274.46724,,7380.3870251,37623.076852,2483.8510184,7945.889999999999,1933.9863712,2716.74
2019-12-12,3972.5982184,,9805.2914926,21954.455461,982.2908470999998,5283.1559187,5395.1303297,2199.67399
2019-12-13,2744.2643678000004,,7418.913762099999,34376.7007505,692.6983769,6580.4868521,2582.8730888,9059.85
2019-12-14,10898.9820784,,10660.0680552,42015.5809585,1407.5394067000002,2422.8064921000005,5060.0429626000005,3669.762


3. Use `DATE_TRUNC` to calculate `last_5_year` and `current_date_year`:

    - Add `DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years'` to find the start date.
    - Subtract `INTERVAL '1 day'` from `DATE_TRUNC('year', CURRENT_DATE)` to find the end date.
    - Include these calculated dates in the `SELECT` clause for validation.
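
The two boundary dates can be sketched in Python as follows. The function name `last_complete_years` is an assumption for illustration, and the fixed date stands in for `CURRENT_DATE`.

```python
from datetime import date, timedelta

def last_complete_years(today: date, years: int = 5) -> tuple[date, date]:
    """Return the first and last day of the `years` complete calendar years before today's year."""
    start_of_this_year = date(today.year, 1, 1)    # DATE_TRUNC('year', CURRENT_DATE)
    start = date(today.year - years, 1, 1)         # ... - INTERVAL '5 years'
    end = start_of_this_year - timedelta(days=1)   # ... - INTERVAL '1 day'
    return start, end

start, end = last_complete_years(date(2024, 12, 4))
print(start, end)
# prints: 2019-01-01 2023-12-31
```

Truncating to the start of the year first is what makes the window cover whole calendar years instead of a rolling five-year span.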


### 💡 Note

You could hard-code the range in the `WHERE` clause:
```sql
s.orderdate::date BETWEEN '2019-01-01' AND '2023-12-31'
```
But a hard-coded range doesn't update dynamically, so you'd have to remember to change it every year. It's better to compute the dates automatically.

In [None]:
%%sql

SELECT 
	s.orderdate,
    DATE_TRUNC('year', s.orderdate) AS order_year,
	DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AS last_5_year,
	DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day' AS current_date_year,
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * p.price * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * p.price * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * p.price * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * p.price * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * p.price * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * p.price * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * p.price * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * p.price * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE -- Added
    s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY
	s.orderdate
ORDER BY
	s.orderdate

orderdate,order_year,last_5_year,current_date_year,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
2019-12-05,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,3064.38431,,8522.664348,22139.83404,1176.643633,6608.186075,5285.8283886,1711.85396
2019-12-06,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,4745.4510793,,3876.1264305,20968.79738,693.748325,12973.888283199998,4254.536214,2313.01931
2019-12-07,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,4855.4344715,,4646.755657199999,35124.720049,1630.1393156,6736.274874999999,8631.9514826,7271.5053722
2019-12-08,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,59.17664249999999,,702.0,1106.1873,86.0,,109.99,
2019-12-09,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,3180.4738376,,3107.8247288,11045.321078,848.3569279999999,6019.8837535,2264.381977,2661.28521
2019-12-10,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,953.310101,,3082.39599,10695.399984999998,1069.88,5480.6541223,2273.1742969,714.0
2019-12-11,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,2274.46724,,7380.3870251,37623.076852,2483.8510184,7945.89,1933.9863712,2716.74
2019-12-12,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,3972.5982184,,9805.2914926,21954.455461,982.2908471,5283.1559187,5395.130329700001,2199.67399
2019-12-13,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,2744.2643678000004,,7418.913762100001,34376.7007505,692.6983769000001,6580.4868521,2582.8730888,9059.85
2019-12-14,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,10898.9820784,,10660.0680552,42015.5809585,1407.5394067000002,2422.8064921000005,5060.0429626000005,3669.762


4. Refine the `WHERE` clause to exclude partial years:

    - Replace `orderdate` with `order_year` in the `SELECT` clause.
    - Use the `last_5_year` and `current_date_year` expressions in the `WHERE` clause to keep only complete years. The expressions are repeated in full because column aliases cannot be referenced in `WHERE`.
    - Group by `order_year` and order the results.

In [40]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * p.price * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * p.price * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * p.price * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * p.price * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * p.price * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * p.price * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * p.price * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * p.price * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE -- Added
    s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
GROUP BY
	order_year
ORDER BY
	order_year

order_year,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
2019.0,1204577.0541081978,,2968422.69591409,9266049.028753605,395377.6376976005,2739905.275997993,1389483.518164894,1913169.736860199
2020.0,437153.7938611,,1428953.5510967972,3609346.787558196,164823.8908353998,888229.4841838992,556880.4521838006,1123875.0577427994
2021.0,464063.8424287003,,2942260.035613489,7009486.872581306,183412.04189759976,2471605.979522495,1010204.687078997,2519519.6674054936
2022.0,854127.3322440994,,7342863.47214504,15548062.129970038,351464.63046580146,7374114.8490392305,2814693.739286459,6338489.86081101
2023.0,730647.8724822998,,6383097.762667842,12373767.735130323,286481.69538748043,6317839.183700319,2321667.239495984,4699134.796674993


---
## AGE

### 📝 Notes

`AGE()`

- **AGE()** calculates the interval between two dates or timestamps.
- With two arguments, it returns a human-readable interval (e.g., `1 year 2 mons 3 days`). With one argument, it returns the difference between the current date (at midnight) and that argument.
- Example: `AGE(deliverydate, orderdate)` gives the processing time.

### 💻 Final Result

- Compute average processing times and total sales, aggregated by time periods.

#### Calculate Processing Time

**`AGE`**

1. Calculate the difference in time between the delivery date and order date using `AGE`:
    - Use `AGE(deliverydate, orderdate)` to compute the processing time for each order.
    - Exclude rows with `NULL` delivery dates in the `WHERE` clause.
    - Return the order date, processing time, and total sale amount for each transaction.

In [None]:
%%sql

SELECT 
    s.orderdate,
    AGE(s.deliverydate, s.orderdate) AS processing_time,
    s.quantity * p.price * s.exchangerate AS total_sale
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
ORDER BY 
    s.orderdate;

orderdate,processing_time,total_sale
2019-12-05,"3 days, 0:00:00",155.98
2019-12-05,"3 days, 0:00:00",44.95
2019-12-05,"3 days, 0:00:00",439.9
2019-12-05,"3 days, 0:00:00",8.88
2019-12-05,"3 days, 0:00:00",4661.58
2019-12-05,0:00:00,800.0
2019-12-05,0:00:00,2214.0
2019-12-05,0:00:00,208.0
2019-12-05,0:00:00,792.0
2019-12-05,"6 days, 0:00:00",339.76794


2. Extract the DAY from the difference between delivery date and order date:

    - Use `EXTRACT(DAY FROM AGE(deliverydate, orderdate))` to extract the day component.
    - Label each row with `TO_CHAR(orderdate, 'MM-YYYY')` so the rows can be aggregated by month in the next step.
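
One caveat: `EXTRACT(DAY FROM ...)` returns only the days field of the interval, not the total number of days. For a delivery that takes longer than a month, `AGE` reports something like `1 mon 4 days` and the extracted value is `4`. The Python sketch below illustrates the difference with a simplified months-and-days breakdown; `calendar_diff` and `add_months` are hypothetical helpers, not PostgreSQL's exact algorithm, and whether such long deliveries occur in this dataset is not checked here.

```python
import calendar
from datetime import date

def add_months(d: date, n: int) -> date:
    """Add n months to d, clamping the day to the end of the target month."""
    year, month_index = divmod(d.year * 12 + d.month - 1 + n, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))

def calendar_diff(start: date, end: date) -> tuple[int, int]:
    """Split end - start (with start <= end) into whole months plus leftover days."""
    months = 0
    while add_months(start, months + 1) <= end:
        months += 1
    days = (end - add_months(start, months)).days
    return months, days

order, delivery = date(2023, 2, 1), date(2023, 3, 5)
print(calendar_diff(order, delivery))   # (1, 4): the "days" field alone is 4
print((delivery - order).days)          # 32: the total elapsed days
```

If both columns are of type `date`, `deliverydate - orderdate` in PostgreSQL returns the total number of days as an integer, which avoids the issue.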

In [41]:
%%sql

SELECT 
    TO_CHAR(s.orderdate, 'MM-YYYY') AS order_month,
    EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate)) AS processing_time,
    s.quantity * p.price * s.exchangerate AS total_sale
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
ORDER BY 
    order_month;

order_month,processing_time,total_sale
01-2019,0,199.0
01-2019,2,3339.7859000000003
01-2019,0,11.1561186
01-2019,0,1189.5541911
01-2019,0,224.70855
01-2019,3,579.98
01-2019,3,57.98
01-2019,3,113.85
01-2019,3,186.9
01-2019,0,8.82


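3. Aggregate data by month to get total sales and average processing time:

    - Calculate the average processing time using `AVG(EXTRACT(DAY FROM AGE(...)))`.
    - Compute the total sales using `SUM(quantity * price * exchangerate)`.
    - Group by `TO_CHAR(orderdate, 'MM-YYYY')` and order the results.
    - Note: the query below groups by `s.orderdate` instead, so it returns one row per day and `01-2019` repeats in the output. Grouping by `order_month`, as in the next step, returns one row per month.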
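
A side effect of the `'MM-YYYY'` label is that it is text, so `ORDER BY order_month` sorts by month first and year second. The short Python sketch below shows the same behavior with made-up dates; a `'YYYY-MM'` pattern is one way to get chronological order.

```python
from datetime import date

order_dates = [date(2019, 1, 15), date(2019, 2, 15), date(2020, 1, 15)]

mm_yyyy = sorted(d.strftime("%m-%Y") for d in order_dates)  # like TO_CHAR(orderdate, 'MM-YYYY')
yyyy_mm = sorted(d.strftime("%Y-%m") for d in order_dates)  # like TO_CHAR(orderdate, 'YYYY-MM')

print(mm_yyyy)  # ['01-2019', '01-2020', '02-2019']
print(yyyy_mm)  # ['2019-01', '2019-02', '2020-01']
```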

In [42]:
%%sql

SELECT 
    TO_CHAR(s.orderdate, 'MM-YYYY') AS order_month,
    AVG(EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate))) AS avg_processing_time,
    SUM(s.quantity * p.price * s.exchangerate) AS total_sales
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
GROUP BY 
    s.orderdate
ORDER BY 
    order_month;

order_month,avg_processing_time,total_sales
01-2019,0.2447552447552447,155774.2361941
01-2019,0.6391752577319587,80340.77782979998
01-2019,1.3786407766990292,67340.8297225
01-2019,0.5984848484848484,112417.18750730006
01-2019,1.2150537634408602,78671.70241649999
01-2019,0.8983050847457626,79338.89994819999
01-2019,0.1333333333333333,45924.9663329
01-2019,1.2876712328767124,59178.346943199984
01-2019,2.5454545454545454,4870.3621568
01-2019,1.1395348837209303,64669.2075083


4. Reformat results:

    - Use `ROUND()` to format the average processing time and total sales to two decimal places.
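
The queries cast to `NUMERIC` before rounding because PostgreSQL's two-argument `ROUND(value, places)` is defined for `numeric`, not for `double precision`. A rough Python analogy is converting a float to an exact decimal before rounding; `round_numeric` below is a hypothetical helper for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_numeric(value: float, places: int = 2) -> Decimal:
    """Convert a float to an exact decimal, then round ties away from zero."""
    quantum = Decimal(10) ** -places   # Decimal('0.01') for places=2
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_numeric(0.2447552447552447))  # 0.24
print(round_numeric(155774.2361941))      # 155774.24
```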

In [43]:
%%sql

SELECT 
    TO_CHAR(s.orderdate, 'MM-YYYY') AS order_month,
    ROUND(CAST(AVG(EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate))) AS NUMERIC), 2) AS avg_processing_time,
    ROUND(CAST(SUM(s.quantity * p.price * s.exchangerate) AS NUMERIC), 2) AS total_sales
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
GROUP BY 
    order_month
ORDER BY 
    order_month;

order_month,avg_processing_time,total_sales
01-2019,0.78,2230970.39
01-2020,1.01,1816558.17
01-2021,0.97,550452.33
01-2022,1.46,3145677.01
01-2023,1.69,3904637.1
02-2019,0.73,2803181.64
02-2020,0.8,2282961.66
02-2021,1.12,930737.51
02-2022,1.53,4170207.27
02-2023,1.73,4746764.05


5. Look at the yearly data:

    - Replace monthly grouping with yearly grouping by changing `TO_CHAR(orderdate, 'MM-YYYY')` to `DATE_PART('year', orderdate)`.
    - Group data by `order_year` and order the results.

In [44]:
%%sql

SELECT 
    DATE_PART('year', s.orderdate) AS order_year,
    ROUND(CAST(AVG(EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate))) AS NUMERIC), 2) AS avg_processing_time,
    ROUND(CAST(SUM(s.quantity * p.price * s.exchangerate) AS NUMERIC), 2) AS total_sales
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
GROUP BY 
    order_year
ORDER BY 
    order_year;

order_year,avg_processing_time,total_sales
2019.0,0.81,22960348.69
2020.0,0.93,9467853.57
2021.0,1.36,18005319.12
2022.0,1.62,43053017.75
2023.0,1.75,35220601.92
