<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/2_Date_Calculations/2_Date_Calculations.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Date Calculations

## Overview

### 🥅 Analysis Goals

Explore sales data using various PostgreSQL functions to derive insights about sales trends, categories, and processing times.

- Summarize sales data by time dimensions (e.g., year, month, day).
- Analyze sales by product categories.
- Understand order processing times and their trends over time.

### 📘 Concepts Covered

Date Calculations: 
- `DATE_PART()`
- `INTERVAL`
- `AGE()`
- `CURRENT_DATE()`

---

In [1]:
import sys
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'colab_db' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the ipython-sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

### 💡 Note

**We may delete this note if we delete the date dimension table**

You may notice this specific database actually has a **date dimensions** table which is a static table that has one row per day, with other date attributes like day of the week, month name, etc. So you could join a table to this table to get the month or year. 

We **won't** be using this because not every database you'll work with has this. Also, it's important to understand how to calculate dates for different types of analysis (as you'll see). 

---
## DATE_PART

### 📝 Notes

`DATE_PART`
- `DATE_PART()` extracts specific components (e.g., year, month, day) from a date or timestamp.
- Syntax: `DATE_PART('unit', source)` where `unit` can be `'year'`, `'month'`, `'day'`, etc.
- Example: `DATE_PART('year', orderdate)` extracts the year from the `orderdate`.

### 💻 Final Result

- The queries return aggregated sales amounts grouped by specific time components, such as year, month, and day.

#### Extract Date Components and Aggregate Sales

**`DATE_PART`**

1. Use `DATE_PART` to get year, month, and day of the sales and also return the total sales amount.
    - Extract the `year`, `month`, and `day` from `orderdate` using `DATE_PART`.
    - Calculate the total sales amount using `SUM(quantity * netprice * exchangerate)`.
    - Group the data by the extracted components and order by `year`, `month`, and `day`.

In [2]:
%%sql

SELECT
    DATE_PART('year', s.orderdate) AS sales_year,
    DATE_PART('month', s.orderdate) AS sales_month,
    DATE_PART('day', s.orderdate) AS sales_day,
    SUM(s.quantity * s.netprice * s.exchangerate) AS total_sale_amount
FROM
    sales s
    LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
    sales_year, sales_month, sales_day
ORDER BY
    sales_year, sales_month, sales_day;

Unnamed: 0,sales_year,sales_month,sales_day,total_sale_amount
0,2015.0,1.0,1.0,11640.795090
1,2015.0,1.0,2.0,5890.400052
2,2015.0,1.0,3.0,19796.672330
3,2015.0,1.0,5.0,12406.268768
4,2015.0,1.0,6.0,10349.869751
...,...,...,...,...
3289,2024.0,4.0,16.0,25098.988078
3290,2024.0,4.0,17.0,32938.671651
3291,2024.0,4.0,18.0,28408.756220
3292,2024.0,4.0,19.0,48386.883965


2. Summarize total sales by year:
    - Apply `DATE_PART('year', orderdate)` to extract the year.
    - Use `SUM(quantity * netprice * exchangerate)` to compute the total sales amount.
    - Group the data by `order_year` and order the results.

In [3]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year, -- Added
	SUM(s.quantity * s.netprice * s.exchangerate) AS total_sale_amount -- Added
FROM sales s
GROUP BY -- Added
	order_year
ORDER BY -- Added
	order_year

Unnamed: 0,order_year,total_sale_amount
0,2015.0,7370979.0
1,2016.0,10383610.0
2,2017.0,13221340.0
3,2018.0,24667450.0
4,2019.0,31818100.0
5,2020.0,11218440.0
6,2021.0,21357980.0
7,2022.0,44864560.0
8,2023.0,33108570.0
9,2024.0,8396527.0


**📊[Insert chart]📊**

3. Add category-level granularity to the yearly sales summary:

    - Include `categoryname` in the `SELECT` clause.
    - Aggregate total sales by `order_year` and `categoryname`.
    - Group the data by these two columns and order by both.

In [4]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    p.categoryname, -- Added
	SUM(s.quantity * s.netprice * s.exchangerate) AS total_sale_amount
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_year,
    p.categoryname -- Added
ORDER BY
	order_year,
    p.categoryname -- Added

Unnamed: 0,order_year,categoryname,total_sale_amount
0,2015.0,Audio,1.708722e+05
1,2015.0,Cameras and camcorders,1.828112e+06
2,2015.0,Cell phones,5.915135e+05
3,2015.0,Computers,2.139916e+06
4,2015.0,Games and Toys,4.540459e+04
...,...,...,...
75,2024.0,Computers,2.957040e+06
76,2024.0,Games and Toys,8.586775e+04
77,2024.0,Home Appliances,1.320161e+06
78,2024.0,"Music, Movies and Audio Books",5.926621e+05


**📊[Insert chart]📊**

Date Validation: Check unique `categoryname`.

In [5]:
%%sql

SELECT DISTINCT categoryname
FROM product
ORDER BY categoryname

Unnamed: 0,categoryname
0,Audio
1,Cameras and camcorders
2,Cell phones
3,Computers
4,Games and Toys
5,Home Appliances
6,"Music, Movies and Audio Books"
7,TV and Video


4. Pivot the table using `CASE WHEN`:

    - Use `CASE WHEN` to create a pivoted table with sales aggregated by `categoryname` for each `order_year`.
    - Aggregate sales for each category using `SUM` and conditional logic in `CASE WHEN`.
    - Group by `order_year` and order the results.

In [6]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    -- Added
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * s.netprice * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * s.netprice * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * s.netprice * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * s.netprice * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * s.netprice * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * s.netprice * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * s.netprice * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * s.netprice * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_year
ORDER BY
	order_year

Unnamed: 0,order_year,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
0,2015.0,170872.152439,,591513.5,2139916.0,45404.589478,1380876.0,238806.2,975480.1
1,2016.0,335737.839462,,1080603.0,4271649.0,44802.520777,1876344.0,269915.5,968104.7
2,2017.0,478188.730147,,1509770.0,6731561.0,63097.550272,1877373.0,371268.6,1071710.0
3,2018.0,970257.627047,,3421484.0,12579930.0,212461.964511,2663526.0,854097.0,1442588.0
4,2019.0,930937.956146,,4459201.0,17419400.0,336060.564282,2107711.0,1175281.0,1625448.0
5,2020.0,368886.607852,,1882507.0,5106278.0,139271.172898,747590.4,679961.6,994522.5
6,2021.0,393160.157428,,3871630.0,9900175.0,155105.753822,2101225.0,1236253.0,2250755.0
7,2022.0,766938.211671,,8119665.0,17862210.0,316127.304417,6612447.0,2989297.0,5815337.0
8,2023.0,688690.184983,,6002148.0,11650870.0,270374.964658,5919993.0,2180768.0,4412178.0
9,2024.0,209228.635203,,1685745.0,2957040.0,85867.749572,1320161.0,592662.1,910738.5


**📊[Insert chart]📊**

---
## CURRENT_DATE, INTERVAL

### 📝 Notes

`CURRENT_DATE`

- **CURRENT_DATE** retrieves the current date based on the system's time zone.
- Returns a **DATE** type with no time component (e.g., `2024-12-04`).

`INTERVAL`

- **INTERVAL** represents a span of time, such as days, months, hours, or seconds.
- Used in date calculations (e.g., `CURRENT_DATE + INTERVAL '1 month'` adds one month to the current date).

**Note:** Similar to `CURRENT_DATE` there's also `NOW` which gets the current date *and* time. 

### 💻 Final Result

- Restrict results to the last 5 years of sales, excluding the current year.

#### Filter Data by Time Intervals**

**`INTERVAL`** and **`CURRENT_DATE`**

1. Use the last query to only return orders within the last 5 years of the current date.
    - Add `CURRENT_DATE - INTERVAL '5 years'` in the `WHERE` clause to filter records.
    - Use `CASE WHEN` for category-based aggregation in the `SELECT` clause.
    - Group data by `order_year` and order the results.

In [7]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * s.netprice * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * s.netprice * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * s.netprice * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * s.netprice * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * s.netprice * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * s.netprice * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * s.netprice * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * s.netprice * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE -- Added
    s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY
	order_year
ORDER BY
	order_year

Unnamed: 0,order_year,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
0,2019.0,54339.337385,,285278.1,1003903.0,22547.893849,74622.34,79393.3,102034.9
1,2020.0,368886.607852,,1882507.0,5106278.0,139271.172898,747590.4,679961.6,994522.5
2,2021.0,393160.157428,,3871630.0,9900175.0,155105.753822,2101225.0,1236253.0,2250755.0
3,2022.0,766938.211671,,8119665.0,17862210.0,316127.304417,6612447.0,2989297.0,5815337.0
4,2023.0,688690.184983,,6002148.0,11650870.0,270374.964658,5919993.0,2180768.0,4412178.0
5,2024.0,209228.635203,,1685745.0,2957040.0,85867.749572,1320161.0,592662.1,910738.5


2. Validate data by replacing `order_year` with `orderdate`:

    - Replace `DATE_PART('year', orderdate)` with `orderdate` in the `SELECT` clause.
    - Use the same `WHERE` clause and group the data by `orderdate`.

In [8]:
%%sql

SELECT 
	s.orderdate, -- Added
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * s.netprice * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * s.netprice * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * s.netprice * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * s.netprice * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * s.netprice * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * s.netprice * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * s.netprice * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * s.netprice * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE
    s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY
	s.orderdate -- Added
ORDER BY
	s.orderdate -- Added

Unnamed: 0,orderdate,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
0,2019-12-13,2147.663757,,11053.224631,64223.087918,588.445988,4815.439219,2175.099082,8153.865000
1,2019-12-14,8356.890651,,16056.881745,78234.928835,1191.336352,1951.114882,4168.487056,2955.976002
2,2019-12-15,614.795000,,320.160000,9862.643840,,,529.172109,3112.912361
3,2019-12-16,1830.766135,,27706.154616,51489.108098,498.495771,245.573371,1693.184323,2237.711868
4,2019-12-17,1688.546857,,6660.419499,31940.157294,378.347570,1954.429855,4776.978058,1741.242571
...,...,...,...,...,...,...,...,...,...
1546,2024-04-16,80.240400,,4764.527997,13955.694468,23.225005,,2602.206209,
1547,2024-04-17,1197.230304,,11375.079480,10244.207866,570.356529,4348.056130,2300.955343,1880.060000
1548,2024-04-18,787.040981,,8250.558060,9255.466849,329.667720,3577.599858,2746.278705,1229.479045
1549,2024-04-19,,,10970.074389,20581.125791,207.856750,6191.428682,2010.860008,2756.544840


3. Use `DATE_TRUNC` to calculate `last_5_year` and `current_date_year`:

    - Add `DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years'` to find the start date.
    - Subtract `INTERVAL '1 day'` from `DATE_TRUNC('year', CURRENT_DATE)` to find the end date.
    - Include these calculated dates in the `SELECT` clause for validation.


 💡 Note

You could just add in the `WHERE` clause: 
```sql
s.orderdate::date BETWEEN '2019-01-01' AND '2023-12-01'
```
But it doesn't update dynamically and you'd have to remember to update it. So it's better to use something automatic rather than hard coded in.

In [9]:
%%sql

SELECT 
	s.orderdate,
    DATE_TRUNC('year', s.orderdate) AS order_year, -- Added
	DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AS start_date, -- Added
	DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day' AS end_date, -- Added
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * s.netprice * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * s.netprice * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * s.netprice * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * s.netprice * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * s.netprice * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * s.netprice * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * s.netprice * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * s.netprice * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE 
    s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY
	s.orderdate
ORDER BY
	s.orderdate

Unnamed: 0,orderdate,order_year,start_date,end_date,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
0,2019-12-13,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,2147.663757,,11053.224631,64223.087918,588.445988,4815.439219,2175.099082,8153.865000
1,2019-12-14,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,8356.890651,,16056.881745,78234.928835,1191.336352,1951.114882,4168.487056,2955.976002
2,2019-12-15,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,614.795000,,320.160000,9862.643840,,,529.172109,3112.912361
3,2019-12-16,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,1830.766135,,27706.154616,51489.108098,498.495771,245.573371,1693.184323,2237.711868
4,2019-12-17,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,1688.546857,,6660.419499,31940.157294,378.347570,1954.429855,4776.978058,1741.242571
...,...,...,...,...,...,...,...,...,...,...,...,...
1546,2024-04-16,2024-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,80.240400,,4764.527997,13955.694468,23.225005,,2602.206209,
1547,2024-04-17,2024-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,1197.230304,,11375.079480,10244.207866,570.356529,4348.056130,2300.955343,1880.060000
1548,2024-04-18,2024-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,787.040981,,8250.558060,9255.466849,329.667720,3577.599858,2746.278705,1229.479045
1549,2024-04-19,2024-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,,,10970.074389,20581.125791,207.856750,6191.428682,2010.860008,2756.544840


5. Refine the `WHERE` clause to exclude partial years:

    - Replace `orderdate` with `order_year` in the `SELECT` clause.
    - Use the calculated `last_5_year` and `current_date_year` in the `WHERE` clause to filter complete years.
    - Group by `order_year` and order the results.

In [10]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * s.netprice * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * s.netprice * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * s.netprice * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * s.netprice * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * s.netprice * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * s.netprice * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * s.netprice * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * s.netprice * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE -- Added
    s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
GROUP BY
	order_year
ORDER BY
	order_year

Unnamed: 0,order_year,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
0,2019.0,930937.956146,,4459201.0,17419400.0,336060.564282,2107711.0,1175281.0,1625448.0
1,2020.0,368886.607852,,1882507.0,5106278.0,139271.172898,747590.4,679961.6,994522.5
2,2021.0,393160.157428,,3871630.0,9900175.0,155105.753822,2101225.0,1236253.0,2250755.0
3,2022.0,766938.211671,,8119665.0,17862210.0,316127.304417,6612447.0,2989297.0,5815337.0
4,2023.0,688690.184983,,6002148.0,11650870.0,270374.964658,5919993.0,2180768.0,4412178.0


---
## AGE

### 📝 Notes

`AGE()`

- **AGE()** calculates the interval between two dates or timestamps.
- Returns a human-readable interval (e.g., `1 year 2 mons 3 days`) when passed two arguments or the difference from the current timestamp if given one.
- Example: `AGE(deliverydate, orderdate)` gives the processing time.

### 💻 Final Result

- Compute average processing times and total sales, aggregated by time periods.

#### Calculate Processing Time

**`AGE`**

1. Calculate the difference in time between the delivery date and order date using `AGE`:
    - Use `AGE(deliverydate, orderdate)` to compute the processing time for each order.
    - Exclude rows with `NULL` delivery dates in the `WHERE` clause.
    - Return the order date, processing time, and total sale amount for each transaction.

In [11]:
%%sql

SELECT 
    s.orderdate,
    AGE(s.deliverydate, s.orderdate) AS processing_time,
    s.quantity * s.netprice * s.exchangerate AS total_sale
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
ORDER BY 
    s.orderdate;

Unnamed: 0,orderdate,processing_time,total_sale
0,2019-01-01,0 days,1510.865988
1,2019-01-01,0 days,259.248182
2,2019-01-01,0 days,414.863467
3,2019-01-01,0 days,728.382240
4,2019-01-01,0 days,8.991000
...,...,...,...
141333,2023-12-31,4 days,217.881260
141334,2023-12-31,0 days,21.390000
141335,2023-12-31,0 days,773.520000
141336,2023-12-31,0 days,259.960000


2. Extract the DAY from the difference between delivery date and order date:

    - Use `EXTRACT(DAY FROM AGE(deliverydate, orderdate))` to extract the day component.
    - Display the `orderdate` as Month-Year using `TO_CHAR(orderdate, 'MM-YYYY')`.

In [12]:
%%sql

SELECT 
    TO_CHAR(s.orderdate, 'MM-YYYY') AS order_month,
    EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate)) AS processing_time, -- Update
    s.quantity * s.netprice * s.exchangerate AS total_sale
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
ORDER BY 
    order_month;

Unnamed: 0,order_month,processing_time,total_sale
0,01-2019,4,458.172000
1,01-2019,4,948.000000
2,01-2019,4,894.000000
3,01-2019,0,59.750977
4,01-2019,0,1073.184768
...,...,...,...
141333,12-2023,3,439.636730
141334,12-2023,3,2558.685769
141335,12-2023,0,31.330385
141336,12-2023,0,347.276490


3. Aggregate data by month to get total sales and average processing time:

    - Calculate the average processing time using `AVG(EXTRACT(DAY FROM AGE(...)))`.
    - Compute the total sales using `SUM(quantity * netprice * exchangerate)`.
    - Group by `TO_CHAR(orderdate, 'MM-YYYY')` and order the results.

In [13]:
%%sql

SELECT 
    TO_CHAR(s.orderdate, 'MM-YYYY') AS order_month,
    AVG(EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate))) AS avg_processing_time, -- Update
    SUM(s.quantity * s.netprice * s.exchangerate) AS total_sales
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
GROUP BY 
    s.orderdate
ORDER BY 
    order_month;

Unnamed: 0,order_month,avg_processing_time,total_sales
0,01-2019,0.86764705882352941176,67918.687936
1,01-2019,0.81034482758620689655,144771.304518
2,01-2019,0.64102564102564102564,97864.121241
3,01-2019,0.85714285714285714286,53042.968851
4,01-2019,0.44202898550724637681,223325.864347
...,...,...,...
1780,12-2023,1.7090909090909091,141981.336234
1781,12-2023,2.2714285714285714,63079.978397
1782,12-2023,1.7573529411764706,108227.994562
1783,12-2023,1.8079470198675497,136783.880582


4. Reformat results:

    - Use `ROUND()` to format the average processing time and total sales to two decimal places.

In [14]:
%%sql

SELECT 
    TO_CHAR(s.orderdate, 'MM-YYYY') AS order_month,
    ROUND(CAST(AVG(EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate))) AS NUMERIC), 2) AS avg_processing_time, -- Update
    ROUND(CAST(SUM(s.quantity * s.netprice * s.exchangerate) AS NUMERIC), 2) AS total_sales -- Update
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
GROUP BY 
    order_month
ORDER BY 
    order_month;

Unnamed: 0,order_month,avg_processing_time,total_sales
0,01-2019,0.78,3082448.2
1,01-2020,1.01,2132132.93
2,01-2021,0.97,669787.93
3,01-2022,1.46,3647525.92
4,01-2023,1.69,3664431.34
5,02-2019,0.73,3870554.25
6,02-2020,0.8,2713593.19
7,02-2021,1.12,1094980.88
8,02-2022,1.53,4840124.87
9,02-2023,1.73,4465204.57


  5. Look at the yearly data.  
     - Replace monthly grouping with yearly grouping by changing `TO_CHAR(orderdate, 'MM-YYYY')` to `DATE_PART('year', orderdate)`.
     - Group data by `order_year` and order the results.

In [15]:
%%sql

SELECT 
    DATE_PART('year', s.orderdate) AS order_year, -- Update
    ROUND(CAST(AVG(EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate))) AS NUMERIC), 2) AS avg_processing_time,
    ROUND(CAST(SUM(s.quantity * s.netprice * s.exchangerate) AS NUMERIC), 2) AS total_sales
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
GROUP BY 
    order_year -- Update
ORDER BY 
    order_year; -- Update

Unnamed: 0,order_year,avg_processing_time,total_sales
0,2019.0,0.81,31818095.97
1,2020.0,0.93,11218435.79
2,2021.0,1.36,21357976.66
3,2022.0,1.62,44864557.21
4,2023.0,1.75,33108565.51
