<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/2_Date_Calculations/2_Date_Calculations.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Date Calculations

## Overview

### 🥅 Analysis Goals

Explore sales data using various PostgreSQL functions to derive insights about sales trends, categories, and processing times.

- Summarize sales data by time dimensions (e.g., year, month, day).
- Analyze sales by product categories.
- Understand order processing times and their trends over time.

### 📘 Concepts Covered

Date Calculations: 
- `DATE_PART()`
- `INTERVAL`
- `AGE()`
- `CURRENT_DATE`

---

In [1]:
import sys
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the SQL magic extension (provided by jupysql in Colab)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

### 💡 Note

You may notice this database has a **date dimension** table: a static table with one row per day and extra date attributes such as day of the week and month name. You could join to it to get the month or year.

We **won't** be using it, because not every database you'll work with has one. It's also important to understand how to calculate dates yourself for different types of analysis (as you'll see).

---
## DATE_PART

### 📝 Notes

`DATE_PART`
- `DATE_PART()` extracts specific components (e.g., year, month, day) from a date or timestamp.
- Syntax: `DATE_PART('unit', source)` where `unit` can be `'year'`, `'month'`, `'day'`, etc.
- Example: `DATE_PART('year', orderdate)` extracts the year from the `orderdate`.
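
As a rough illustration of what `DATE_PART` pulls out of a date, the sketch below mirrors the three units used in this section in plain Python. The `date_part` helper and the sample date are hypothetical. The helper returns floats because PostgreSQL's `DATE_PART` returns `double precision`, which is why results in this notebook show values like `2015.0`.

```python
from datetime import date


def date_part(unit: str, source: date) -> float:
    """Hypothetical stand-in for DATE_PART('unit', source), three units only."""
    parts = {"year": source.year, "month": source.month, "day": source.day}
    return float(parts[unit])  # raises KeyError for units this sketch omits


order_date = date(2015, 1, 3)  # made-up sample value
print(
    date_part("year", order_date),
    date_part("month", order_date),
    date_part("day", order_date),
)
```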

### 💻 Final Result

- The queries return aggregated sales amounts grouped by specific time components, such as year, month, and day.

#### Extract Date Components and Aggregate Sales

**`DATE_PART`**

1. Use `DATE_PART` to get year, month, and day of the sales and also return the total sales amount.
    - Extract the `year`, `month`, and `day` from `orderdate` using `DATE_PART`.
    - Calculate the total sales amount using `SUM(quantity * unitprice * exchangerate)`.
    - Group the data by the extracted components and order by `year`, `month`, and `day`.
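
The grouping logic in these steps can be sketched in plain Python. This is only an illustration of what `GROUP BY` and `SUM` do together; the rows below are made up and are not taken from the `sales` table.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (orderdate, quantity, unitprice, exchangerate)
rows = [
    (date(2015, 1, 1), 2, 10.0, 1.0),
    (date(2015, 1, 1), 1, 5.0, 2.0),
    (date(2015, 1, 2), 3, 4.0, 1.0),
]

totals = defaultdict(float)
for orderdate, quantity, unitprice, exchangerate in rows:
    # The extracted components act as the GROUP BY key
    key = (orderdate.year, orderdate.month, orderdate.day)
    # SUM(quantity * unitprice * exchangerate) accumulates per key
    totals[key] += quantity * unitprice * exchangerate

for key in sorted(totals):  # ORDER BY year, month, day
    print(key, totals[key])
```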

In [2]:
%%sql

SELECT
    DATE_PART('year', s.orderdate) AS sales_year,
    DATE_PART('month', s.orderdate) AS sales_month,
    DATE_PART('day', s.orderdate) AS sales_day,
    SUM(s.quantity * s.unitprice * s.exchangerate) AS total_sale_amount
FROM
    sales s
    LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
    sales_year, sales_month, sales_day
ORDER BY
    sales_year, sales_month, sales_day;

Unnamed: 0,sales_year,sales_month,sales_day,total_sale_amount
0,2015.0,1.0,1.0,12459.476058
1,2015.0,1.0,2.0,6120.500503
2,2015.0,1.0,3.0,20542.707479
3,2015.0,1.0,5.0,13807.144083
4,2015.0,1.0,6.0,10685.510438
...,...,...,...,...
3289,2024.0,4.0,16.0,26733.117368
3290,2024.0,4.0,17.0,35495.834926
3291,2024.0,4.0,18.0,29994.065693
3292,2024.0,4.0,19.0,50233.576107


2. Summarize total sales by year:
    - Apply `DATE_PART('year', orderdate)` to extract the year.
    - Use `SUM(quantity * unitprice * exchangerate)` to compute the total sales amount.
    - Group the data by `order_year` and order the results.

In [3]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year, -- Added
	SUM(s.quantity * s.unitprice * s.exchangerate) AS total_sale_amount -- Added
FROM sales s
GROUP BY -- Added
	order_year
ORDER BY -- Added
	order_year

Unnamed: 0,order_year,total_sale_amount
0,2015.0,7853243.0
1,2016.0,11057540.0
2,2017.0,14052600.0
3,2018.0,26197510.0
4,2019.0,33852650.0
5,2020.0,11953130.0
6,2021.0,22692620.0
7,2022.0,47708810.0
8,2023.0,35220600.0
9,2024.0,8930346.0


**📊[Insert chart]📊**

3. Add category-level granularity to the yearly sales summary:

    - Include `categoryname` in the `SELECT` clause.
    - Aggregate total sales by `order_year` and `categoryname`.
    - Group the data by these two columns and order by both.

In [4]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    p.categoryname, -- Added
	SUM(s.quantity * s.unitprice * s.exchangerate) AS total_sale_amount
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_year,
    p.categoryname -- Added
ORDER BY
	order_year,
    p.categoryname -- Added

Unnamed: 0,order_year,categoryname,total_sale_amount
0,2015.0,Audio,1.816011e+05
1,2015.0,Cameras and camcorders,1.941635e+06
2,2015.0,Cell phones,6.300378e+05
3,2015.0,Computers,2.286490e+06
4,2015.0,Games and Toys,4.832158e+04
...,...,...,...
75,2024.0,Computers,3.138912e+06
76,2024.0,Games and Toys,9.106265e+04
77,2024.0,Home Appliances,1.405237e+06
78,2024.0,"Music, Movies and Audio Books",6.287729e+05


**📊[Insert chart]📊**

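Data Validation: Check the unique `categoryname` values. Text comparisons in `CASE WHEN` are case-sensitive, so each literal must match one of these values exactly.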

In [5]:
%%sql

SELECT DISTINCT categoryname
FROM product
ORDER BY categoryname

Unnamed: 0,categoryname
0,Audio
1,Cameras and camcorders
2,Cell phones
3,Computers
4,Games and Toys
5,Home Appliances
6,"Music, Movies and Audio Books"
7,TV and Video


4. Pivot the table using `CASE WHEN`:

    - Use `CASE WHEN` to create a pivoted table with sales aggregated by `categoryname` for each `order_year`.
    - Aggregate sales for each category using `SUM` and conditional logic in `CASE WHEN`.
    - Group by `order_year` and order the results.
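
Conditional aggregation can be sketched in plain Python as follows. The rows and the `columns` mapping are hypothetical; the third column is included only to show that a literal differing in letter case matches nothing, assuming the default case-sensitive text comparison.

```python
from collections import defaultdict

# Hypothetical (order_year, categoryname, sale_amount) rows
rows = [
    (2015, "Audio", 100.0),
    (2015, "Cameras and camcorders", 250.0),
    (2016, "Audio", 40.0),
    (2016, "Audio", 60.0),
]

# Output column -> literal compared against categoryname (the WHEN condition)
columns = {
    "audio_sales": "Audio",
    "cameras_sales": "Cameras and camcorders",
    "cameras_sales_wrong_case": "Cameras and Camcorders",  # differs only in case
}

pivot = defaultdict(dict)
for year, category, amount in rows:
    for column, literal in columns.items():
        # A SUM that sees no matching rows stays NULL (None here)
        pivot[year].setdefault(column, None)
        if category == literal:  # exact, case-sensitive match
            pivot[year][column] = (pivot[year][column] or 0.0) + amount

for year in sorted(pivot):
    print(year, pivot[year])
```

Each output column is one `SUM(CASE WHEN ...)` expression, so adding a category means adding one more entry to the mapping.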

In [6]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    -- Added
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_year
ORDER BY
	order_year

Unnamed: 0,order_year,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
0,2015.0,181601.1,,630037.8,2286490.0,48321.578725,1473969.0,254342.6,1036845.0
1,2016.0,355917.3,,1151200.0,4560956.0,47420.109012,1993315.0,286606.7,1026312.0
2,2017.0,511045.0,,1601941.0,7151336.0,66925.77569,1994130.0,393249.5,1138263.0
3,2018.0,1032512.0,,3635325.0,13352190.0,225938.145145,2830172.0,906880.2,1536166.0
4,2019.0,987753.2,,4749476.0,18532100.0,355839.873928,2246722.0,1250535.0,1721853.0
5,2020.0,392857.2,,2003646.0,5434553.0,148341.501752,797483.6,722763.1,1066044.0
6,2021.0,417657.5,,4119164.0,10514230.0,165070.837708,2224445.0,1313266.0,2393544.0
7,2022.0,814438.9,,8623592.0,19003050.0,335683.162409,7026622.0,3178853.0,6187337.0
8,2023.0,730647.9,,6383098.0,12373770.0,286481.695387,6317839.0,2321667.0,4699135.0
9,2024.0,221824.0,,1791853.0,3138912.0,91062.651953,1405237.0,628772.9,978566.7

**Note:** The `cameras_sales` column is empty because the literal `'Cameras and Camcorders'` in the query does not match the stored value `'Cameras and camcorders'`. The comparison is case-sensitive, so use the stored spelling to populate the column.


**📊[Insert chart]📊**

---
## CURRENT_DATE, INTERVAL

### 📝 Notes

`CURRENT_DATE`

- **CURRENT_DATE** returns the current date in the session's time zone.
- Returns a **DATE** type with no time component (e.g., `2024-12-04`).

`INTERVAL`

- **INTERVAL** represents a span of time, such as days, months, hours, or seconds.
- Used in date calculations (e.g., `CURRENT_DATE + INTERVAL '1 month'` adds one month to the current date).

**Note:** Similar to `CURRENT_DATE`, there's also `NOW()`, which returns the current date *and* time.
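
To make the date arithmetic concrete, here is a minimal Python sketch of `CURRENT_DATE - INTERVAL '5 years'`. The `minus_years` helper and the fixed `today` value are assumptions for illustration. The helper clamps February 29 to February 28 when the target year is not a leap year, which is how PostgreSQL handles month-end overflow when adding or subtracting intervals.

```python
import calendar
from datetime import date


def minus_years(day: date, years: int) -> date:
    """Shift a date back by whole years, clamping to the month's last valid day."""
    target_year = day.year - years
    last_day = calendar.monthrange(target_year, day.month)[1]
    return day.replace(year=target_year, day=min(day.day, last_day))


today = date(2024, 12, 12)      # fixed stand-in for CURRENT_DATE
cutoff = minus_years(today, 5)  # like CURRENT_DATE - INTERVAL '5 years'
print(cutoff)
```

A filter such as `orderdate >= cutoff` then keeps a rolling five-year window, which usually starts partway through a year.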

### 💻 Final Result

- Restrict results to the last 5 years of sales, excluding the current year.

#### Filter Data by Time Intervals

**`INTERVAL`** and **`CURRENT_DATE`**

1. Reuse the previous query, but return only orders from the 5 years before the current date.
    - Add `CURRENT_DATE - INTERVAL '5 years'` in the `WHERE` clause to filter records.
    - Use `CASE WHEN` for category-based aggregation in the `SELECT` clause.
    - Group data by `order_year` and order the results.

In [7]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE -- Added
    s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY
	order_year
ORDER BY
	order_year

Unnamed: 0,order_year,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
0,2019.0,60590.600723,,322077.9,1110676.0,25010.839584,83836.67,89936.25,109908.7
1,2020.0,392857.168851,,2003646.0,5434553.0,148341.501752,797483.6,722763.1,1066044.0
2,2021.0,417657.458186,,4119164.0,10514230.0,165070.837708,2224445.0,1313266.0,2393544.0
3,2022.0,814438.856577,,8623592.0,19003050.0,335683.162409,7026622.0,3178853.0,6187337.0
4,2023.0,730647.872482,,6383098.0,12373770.0,286481.695387,6317839.0,2321667.0,4699135.0
5,2024.0,221823.975065,,1791853.0,3138912.0,91062.651953,1405237.0,628772.9,978566.7


2. Validate data by replacing `order_year` with `orderdate`:

    - Replace `DATE_PART('year', orderdate)` with `orderdate` in the `SELECT` clause.
    - Use the same `WHERE` clause and group the data by `orderdate`.

In [8]:
%%sql

SELECT 
	s.orderdate, -- Added
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE
    s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY
	s.orderdate -- Added
ORDER BY
	s.orderdate -- Added

Unnamed: 0,orderdate,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
0,2019-12-12,3257.530539,,15688.466388,43908.910922,884.061762,4332.187853,4855.617297,1979.706591
1,2019-12-13,2250.296782,,11870.262019,68753.401501,623.428539,5395.999219,2324.585780,8153.865000
2,2019-12-14,8937.165304,,17056.108888,84031.161917,1266.785466,1986.701324,4554.038666,3302.785800
3,2019-12-15,614.795000,,368.000000,10389.273840,,,529.172109,3497.639496
4,2019-12-16,1968.352497,,29759.935162,54797.737402,548.050041,284.216441,1785.187638,2237.711868
...,...,...,...,...,...,...,...,...,...
1547,2024-04-16,86.280000,,4999.379603,14913.560008,26.004353,,2691.693404,
1548,2024-04-17,1204.229604,,12290.068340,11588.548911,598.368377,4466.726385,2391.827309,1880.060000
1549,2024-04-18,798.277272,,8566.395123,9839.511960,339.401768,3919.755236,2837.238691,1339.703843
1550,2024-04-19,,,11369.710897,20964.289259,210.258031,6517.185497,2270.491344,2965.081900


3. Use `DATE_TRUNC` to calculate `start_date` and `end_date`:

    - Add `DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years'` to find the start date.
    - Subtract `INTERVAL '1 day'` from `DATE_TRUNC('year', CURRENT_DATE)` to find the end date.
    - Include these calculated dates in the `SELECT` clause for validation.
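
The two boundary expressions can be sketched in plain Python to check what they evaluate to. `complete_year_window` is a hypothetical helper and the date passed to it is a fixed stand-in for `CURRENT_DATE`.

```python
from datetime import date, timedelta


def complete_year_window(today: date, years: int = 5) -> tuple[date, date]:
    """Return (start_date, end_date) covering the last `years` complete years."""
    year_start = date(today.year, 1, 1)          # DATE_TRUNC('year', CURRENT_DATE)
    start_date = date(today.year - years, 1, 1)  # ... - INTERVAL '5 years'
    end_date = year_start - timedelta(days=1)    # ... - INTERVAL '1 day'
    return start_date, end_date


start_date, end_date = complete_year_window(date(2024, 12, 12))
print(start_date, end_date)
```

Because both bounds are derived from the start of the current year, the window always begins on January 1 and ends on December 31, whatever day the query runs.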


### 💡 Note

You could hard-code the range in the `WHERE` clause:
```sql
s.orderdate::date BETWEEN '2019-01-01' AND '2023-12-31'
```
But a hard-coded range doesn't update dynamically, and you'd have to remember to change it. It's better to calculate the dates automatically.

In [9]:
%%sql

SELECT 
	s.orderdate,
    DATE_TRUNC('year', s.orderdate) AS order_year, -- Added
	DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AS start_date, -- Added
	DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day' AS end_date, -- Added
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE 
    s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY
	s.orderdate
ORDER BY
	s.orderdate

Unnamed: 0,orderdate,order_year,start_date,end_date,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
0,2019-12-12,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,3257.530539,,15688.466388,43908.910922,884.061762,4332.187853,4855.617297,1979.706591
1,2019-12-13,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,2250.296782,,11870.262019,68753.401501,623.428539,5395.999219,2324.585780,8153.865000
2,2019-12-14,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,8937.165304,,17056.108888,84031.161917,1266.785466,1986.701324,4554.038666,3302.785800
3,2019-12-15,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,614.795000,,368.000000,10389.273840,,,529.172109,3497.639496
4,2019-12-16,2019-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,1968.352497,,29759.935162,54797.737402,548.050041,284.216441,1785.187638,2237.711868
...,...,...,...,...,...,...,...,...,...,...,...,...
1547,2024-04-16,2024-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,86.280000,,4999.379603,14913.560008,26.004353,,2691.693404,
1548,2024-04-17,2024-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,1204.229604,,12290.068340,11588.548911,598.368377,4466.726385,2391.827309,1880.060000
1549,2024-04-18,2024-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,798.277272,,8566.395123,9839.511960,339.401768,3919.755236,2837.238691,1339.703843
1550,2024-04-19,2024-01-01 00:00:00-08:00,2019-01-01 00:00:00-08:00,2023-12-31 00:00:00-08:00,,,11369.710897,20964.289259,210.258031,6517.185497,2270.491344,2965.081900


4. Refine the `WHERE` clause to exclude partial years:

    - Replace `orderdate` with `order_year` in the `SELECT` clause.
    - Use the calculated `start_date` and `end_date` expressions in the `WHERE` clause to keep only complete years.
    - Group by `order_year` and order the results.

In [10]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    SUM(CASE WHEN p.categoryname = 'Audio' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS audio_sales,
	SUM(CASE WHEN p.categoryname = 'Cameras and Camcorders' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS cameras_sales,
    SUM(CASE WHEN p.categoryname = 'Cell phones' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS cell_phones_sales,
    SUM(CASE WHEN p.categoryname = 'Computers' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS computers_sales,
    SUM(CASE WHEN p.categoryname = 'Games and Toys' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS games_toys_sales,
    SUM(CASE WHEN p.categoryname = 'Home Appliances' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS home_appliances_sales,
    SUM(CASE WHEN p.categoryname = 'Music, Movies and Audio Books' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS music_movies_books_sales,
    SUM(CASE WHEN p.categoryname = 'TV and Video' THEN (s.quantity * s.unitprice * s.exchangerate) END) AS tv_video_sales
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE -- Added
    s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
GROUP BY
	order_year
ORDER BY
	order_year

Unnamed: 0,order_year,audio_sales,cameras_sales,cell_phones_sales,computers_sales,games_toys_sales,home_appliances_sales,music_movies_books_sales,tv_video_sales
0,2019.0,987753.184369,,4749476.0,18532100.0,355839.873928,2246722.0,1250535.0,1721853.0
1,2020.0,392857.168851,,2003646.0,5434553.0,148341.501752,797483.6,722763.1,1066044.0
2,2021.0,417657.458186,,4119164.0,10514230.0,165070.837708,2224445.0,1313266.0,2393544.0
3,2022.0,814438.856577,,8623592.0,19003050.0,335683.162409,7026622.0,3178853.0,6187337.0
4,2023.0,730647.872482,,6383098.0,12373770.0,286481.695387,6317839.0,2321667.0,4699135.0


---
## AGE

### 📝 Notes

`AGE()`

- **AGE()** calculates the interval between two dates or timestamps.
- Returns an interval expressed in years, months, and days (e.g., `1 year 2 mons 3 days`). With two arguments it subtracts the second from the first; with one argument it subtracts the argument from `CURRENT_DATE` (at midnight).
- Example: `AGE(deliverydate, orderdate)` gives the processing time.

### 💻 Final Result

- Compute average processing times and total sales, aggregated by time periods.

#### Calculate Processing Time

**`AGE`**

1. Calculate the difference in time between the delivery date and order date using `AGE`:
    - Use `AGE(deliverydate, orderdate)` to compute the processing time for each order.
    - Exclude rows with `NULL` delivery dates in the `WHERE` clause.
    - Return the order date, processing time, and total sale amount for each transaction.

In [11]:
%%sql

SELECT 
    s.orderdate,
    AGE(s.deliverydate, s.orderdate) AS processing_time,
    s.quantity * s.unitprice * s.exchangerate AS total_sale
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
ORDER BY 
    s.orderdate;

Unnamed: 0,orderdate,processing_time,total_sale
0,2019-01-01,0 days,1697.602234
1,2019-01-01,0 days,259.248182
2,2019-01-01,0 days,476.854560
3,2019-01-01,0 days,728.382240
4,2019-01-01,0 days,8.991000
...,...,...,...
141333,2023-12-31,4 days,250.438230
141334,2023-12-31,0 days,23.000000
141335,2023-12-31,0 days,879.000000
141336,2023-12-31,0 days,268.000000


2. Extract the day component from the difference between delivery date and order date:

    - Use `EXTRACT(DAY FROM AGE(deliverydate, orderdate))` to extract the days field of the interval. This equals the total processing time in days only when the gap is shorter than one month.
    - Display the `orderdate` as month-year using `TO_CHAR(orderdate, 'MM-YYYY')`.
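
The difference between the days field of an `AGE` result and the total number of days can be sketched in Python. `age_parts` is a hypothetical approximation of `AGE` for two dates, assuming the first is not earlier than the second. When it has to borrow days, it uses the length of the earlier date's month, as the PostgreSQL documentation describes. The dates are made up.

```python
import calendar
from datetime import date


def age_parts(later: date, earlier: date) -> tuple[int, int, int]:
    """Approximate AGE(later, earlier) as (years, months, days); needs later >= earlier."""
    years = later.year - earlier.year
    months = later.month - earlier.month
    days = later.day - earlier.day
    if days < 0:  # borrow the length of the earlier date's month
        days += calendar.monthrange(earlier.year, earlier.month)[1]
        months -= 1
    if months < 0:  # borrow one year
        months += 12
        years -= 1
    return years, months, days


order_date = date(2023, 1, 15)     # made-up dates
delivery_date = date(2023, 2, 20)

day_field = age_parts(delivery_date, order_date)[2]  # like EXTRACT(DAY FROM AGE(...))
total_days = (delivery_date - order_date).days       # like deliverydate - orderdate
print(day_field, total_days)
```

Here the interval is 1 month 5 days, so the days field is 5 while the order actually took 36 days. If both columns are `date` typed, subtracting them directly yields the total as an integer.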

In [12]:
%%sql

SELECT 
    TO_CHAR(s.orderdate, 'MM-YYYY') AS order_month,
    EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate)) AS processing_time, -- Update
    s.quantity * s.unitprice * s.exchangerate AS total_sale
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
ORDER BY 
    order_month;

Unnamed: 0,order_month,processing_time,total_sale
0,01-2019,0,1198.000000
1,01-2019,3,156.276000
2,01-2019,3,167.700000
3,01-2019,4,3.847041
4,01-2019,4,127.393791
...,...,...,...
141333,12-2023,4,250.438230
141334,12-2023,0,23.000000
141335,12-2023,0,879.000000
141336,12-2023,0,268.000000


3. Aggregate data by month to get total sales and average processing time:

    - Calculate the average processing time using `AVG(EXTRACT(DAY FROM AGE(...)))`.
    - Compute the total sales using `SUM(quantity * unitprice * exchangerate)`.
    - Group by `TO_CHAR(orderdate, 'MM-YYYY')` and order the results.

In [13]:
%%sql

SELECT 
    TO_CHAR(s.orderdate, 'MM-YYYY') AS order_month,
    AVG(EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate))) AS avg_processing_time, -- Update
    SUM(s.quantity * s.unitprice * s.exchangerate) AS total_sales
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
GROUP BY 
    s.orderdate
ORDER BY 
    order_month;

Unnamed: 0,order_month,avg_processing_time,total_sales
0,01-2019,0.86764705882352941176,72623.938126
1,01-2019,0.81034482758620689655,153053.337490
2,01-2019,0.64102564102564102564,104417.625274
3,01-2019,0.85714285714285714286,56959.481981
4,01-2019,0.44202898550724637681,238537.987341
...,...,...,...
1780,12-2023,1.7090909090909091,152471.795880
1781,12-2023,2.2714285714285714,68271.137857
1782,12-2023,1.8984375000000000,92771.050637
1783,12-2023,1.1250000000000000,15674.227285

**Note:** The output still has one row per day (note the repeated `01-2019` labels) because the query groups by `s.orderdate` rather than `order_month`.


4. Reformat results:

    - Use `ROUND()` to format the average processing time and total sales to two decimal places.
    - Group by `order_month` so each month collapses into a single row.
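
PostgreSQL's two-argument `ROUND` is defined for `numeric`, not `double precision`, which is why the query casts before rounding. As a loose Python analogy (not the database's implementation), the sketch below converts a float to an exact decimal first and then rounds to two places. `round_numeric` and the sample values are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_numeric(value: float, places: int = 2) -> Decimal:
    """Convert to an exact decimal (the CAST), then round half away from zero."""
    quantum = Decimal(1).scaleb(-places)  # 0.01 when places == 2
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


print(round_numeric(0.7849))
print(round_numeric(2.675))
```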

In [14]:
%%sql

SELECT 
    TO_CHAR(s.orderdate, 'MM-YYYY') AS order_month,
    ROUND(CAST(AVG(EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate))) AS NUMERIC), 2) AS avg_processing_time, -- Update
    ROUND(CAST(SUM(s.quantity * s.unitprice * s.exchangerate) AS NUMERIC), 2) AS total_sales -- Update
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
GROUP BY 
    order_month
ORDER BY 
    order_month;

Unnamed: 0,order_month,avg_processing_time,total_sales
0,01-2019,0.78,3288848.72
1,01-2020,1.01,2263233.02
2,01-2021,0.97,708348.24
3,01-2022,1.46,3878721.96
4,01-2023,1.69,3904637.1
5,02-2019,0.73,4129633.0
6,02-2020,0.8,2899529.73
7,02-2021,1.12,1167247.31
8,02-2022,1.53,5158757.76
9,02-2023,1.73,4746764.05
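
The month labels above sort as text, so every January comes before any February regardless of year. A small Python sketch with made-up months shows the effect, along with a year-first format that sorts chronologically; `strftime` stands in for `TO_CHAR` here.

```python
from datetime import date

months = [date(2020, 1, 1), date(2019, 2, 1), date(2019, 1, 1)]  # made-up months

# Like TO_CHAR(orderdate, 'MM-YYYY'): month-first labels
mm_yyyy = sorted(d.strftime("%m-%Y") for d in months)
# Like TO_CHAR(orderdate, 'YYYY-MM'): year-first labels
yyyy_mm = sorted(d.strftime("%Y-%m") for d in months)

print(mm_yyyy)
print(yyyy_mm)
```

If chronological order matters, a year-first format such as `'YYYY-MM'` is one option.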


5. Look at the yearly data:

    - Replace monthly grouping with yearly grouping by changing `TO_CHAR(orderdate, 'MM-YYYY')` to `DATE_PART('year', orderdate)`.
    - Group data by `order_year` and order the results.

In [15]:
%%sql

SELECT 
    DATE_PART('year', s.orderdate) AS order_year, -- Update
    ROUND(CAST(AVG(EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate))) AS NUMERIC), 2) AS avg_processing_time,
    ROUND(CAST(SUM(s.quantity * s.unitprice * s.exchangerate) AS NUMERIC), 2) AS total_sales
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate BETWEEN DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '5 years' AND DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 day'
GROUP BY 
    order_year -- Update
ORDER BY 
    order_year; -- Update

Unnamed: 0,order_year,avg_processing_time,total_sales
0,2019.0,0.81,33852650.55
1,2020.0,0.93,11953128.81
2,2021.0,1.36,22692620.41
3,2022.0,1.62,47708807.69
4,2023.0,1.75,35220601.92
