<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/2_Date_Calculations/3_Date_Differences.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Date Calculations

## Overview

### 🥅 Analysis Goals

Continue the time series analysis by examining the relationship between sales (net revenue) and delivery processing times:

- **Analyze processing times**: Calculate the time difference between order dates and delivery dates to evaluate operational efficiency.  
- **Aggregate and summarize sales**: Group sales data by time intervals (month, year) to identify trends and patterns in revenue and processing times.

### 📘 Concepts Covered

- `INTERVAL`
- `AGE()`

---

In [1]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas floats with two decimal places
pd.options.display.float_format = '{:.2f}'.format

---
## INTERVAL

### 📝 Notes

`INTERVAL`

- **INTERVAL** represents a span of time, such as days, months, hours, or seconds.
- Commonly used for date arithmetic (e.g., `CURRENT_DATE + INTERVAL '1 month'` adds one month to the current date).

- Syntax:
    ```sql
    CURRENT_DATE - INTERVAL 'value unit'
    ```

- Example:
    ```sql
    SELECT CURRENT_DATE - INTERVAL '5 years';
    ```

### 💻 Final Result

- Limit results to sales from the last 5 years relative to the current date, so the query's date filter updates dynamically instead of having to be edited manually.

#### Filter Data by Time Intervals

**`INTERVAL`** and **`CURRENT_DATE`**

1. Modify the previous query to return only orders placed within the 5 years before the current date.
    - Use `CURRENT_DATE` to dynamically reference the current date.
    - Subtract `INTERVAL '5 years'` from `CURRENT_DATE` to calculate the start date for filtering.
    - Add a `WHERE` clause to include only rows where `orderdate` is greater than or equal to the calculated start date.

In [6]:
%%sql

SELECT
    DATE_PART('year', s.orderdate) AS order_year,
    p.categoryname,
    SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
    LEFT JOIN product p ON s.productkey = p.productkey
WHERE
    s.orderdate >= CURRENT_DATE - INTERVAL '5 years' -- Added
GROUP BY
    order_year,
    p.categoryname
ORDER BY
    order_year,
    p.categoryname

Unnamed: 0,order_year,categoryname,net_revenue
0,2020.0,Audio,344725.7
1,2020.0,Cameras and camcorders,1253638.95
2,2020.0,Cell phones,1784693.27
3,2020.0,Computers,4846329.75
4,2020.0,Games and Toys,132404.4
5,2020.0,Home Appliances,707243.49
6,2020.0,"Music, Movies and Audio Books",640317.06
7,2020.0,TV and Video,906731.78
8,2021.0,Audio,393160.16
9,2021.0,Cameras and camcorders,1449672.87
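
To make the date filter concrete, here is a minimal Python sketch of the same idea: compute a cutoff five years before "today" and keep only the dates on or after it. The helper `subtract_years`, the fixed `today` value, and the sample order dates are assumptions for illustration only; in the query, PostgreSQL does this work through `CURRENT_DATE - INTERVAL '5 years'`.

```python
from datetime import date


def subtract_years(d: date, years: int) -> date:
    """Return d shifted back by `years`, clamping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap target year
        return d.replace(year=d.year - years, day=28)


# Hypothetical "today"; in SQL this would be CURRENT_DATE
today = date(2025, 1, 8)
cutoff = subtract_years(today, 5)

# Hypothetical order dates, filtered like: WHERE orderdate >= cutoff
order_dates = [date(2019, 12, 31), date(2020, 1, 8), date(2023, 6, 15)]
recent = [d for d in order_dates if d >= cutoff]

print(cutoff)   # 2020-01-08
print(recent)   # the 2019 date is dropped
```

Because the cutoff is derived from the current date, rerunning the same code on a later day moves the window forward without any edits.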


---
## AGE

### 📝 Notes

`AGE()`
- `AGE` subtracts the second argument from the first and returns the result as an interval expressed in years, months, and days.

- Syntax:
    ```sql
    AGE(end_date, start_date)
    ```

- Example:
    ```sql
    SELECT AGE('2024-01-08', '2024-01-01');
    ```

`EXTRACT`
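- `EXTRACT` retrieves a specific component (e.g., day, month, year) from a timestamp or interval.
- On an interval, `EXTRACT(DAY ...)` returns only the days field, not the total number of days. For example, `AGE('2024-03-05', '2024-01-01')` is `2 mons 4 days`, so extracting `DAY` gives 4. This matches the total only when the span is shorter than one month.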

- Syntax:
    ```sql
    EXTRACT(unit FROM source)
    ```

- Example:
    ```sql
    SELECT EXTRACT(DAY FROM AGE('2024-01-08', '2024-01-01'));
    ```
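
The difference between an interval's day field and the total number of elapsed days can be sketched in Python. The function `age_parts` below is a hypothetical, simplified stand-in for `AGE` on plain dates (it assumes `end >= start` and borrows days from the start date's month); it is an approximation for illustration, not PostgreSQL's implementation.

```python
import calendar
from datetime import date


def age_parts(end: date, start: date) -> tuple[int, int, int]:
    """Return (years, months, days) between two dates, assuming end >= start."""
    years = end.year - start.year
    months = end.month - start.month
    days = end.day - start.day
    if days < 0:
        # Borrow the length of the start date's month
        days += calendar.monthrange(start.year, start.month)[1]
        months -= 1
    if months < 0:
        months += 12
        years -= 1
    return years, months, days


end, start = date(2024, 3, 5), date(2024, 1, 1)
print(age_parts(end, start))  # (0, 2, 4): the day field is 4
print((end - start).days)     # 64: total elapsed days
```

For spans shorter than a month the two numbers agree, which is the case for the `2024-01-08` and `2024-01-01` example above (both give 7).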

### 💻 Final Result

- Evaluate operational performance by calculating the average time taken between order and delivery dates.  
- Aggregate data by time intervals (month, year) to provide actionable insights into revenue and efficiency.

#### Calculate Processing Time

**`AGE`**

1. Calculate the difference in time between the delivery date and order date using `AGE`.
    - Use `AGE(deliverydate, orderdate)` to compute the processing time for each order.
    - Exclude rows with `NULL` delivery dates in the `WHERE` clause.
    - Return the order date, processing time, and total sale amount for each transaction.

In [15]:
%%sql

SELECT 
    s.orderdate,
    AGE(s.deliverydate, s.orderdate) AS processing_time,
    s.quantity * s.netprice * s.exchangerate AS net_revenue
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
ORDER BY 
    s.orderdate;

Unnamed: 0,orderdate,processing_time,net_revenue
0,2020-01-08,0 days,1978.88
1,2020-01-08,0 days,1352.66
2,2020-01-08,0 days,3.05
3,2020-01-08,4 days,614.46
4,2020-01-08,4 days,1288.00
...,...,...,...
123853,2024-04-20,1 days,914.61
123854,2024-04-20,1 days,150.18
123855,2024-04-20,2 days,147.78
123856,2024-04-20,2 days,2019.62


2. Extract the day component from the difference between delivery date and order date.
    - Exclude rows with `NULL` delivery dates in the `WHERE` clause.
    - Return the order month, processing time in days, and net revenue for each transaction.
    - Compute the net revenue per row using `quantity * netprice * exchangerate`.
    - 🔔 Use `EXTRACT(DAY FROM AGE(deliverydate, orderdate))` to extract the day component.
    - 🔔 Display the `orderdate` as Month-Year using `TO_CHAR(orderdate, 'MM-YYYY')`.

In [16]:
%%sql

SELECT 
    TO_CHAR(s.orderdate, 'MM-YYYY') AS order_month, -- Update
    EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate)) AS processing_time, -- Update
    s.quantity * s.netprice * s.exchangerate AS net_revenue
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
ORDER BY 
    order_month;

Unnamed: 0,order_month,processing_time,net_revenue
0,01-2020,0,15.98
1,01-2020,6,1487.21
2,01-2020,0,67.51
3,01-2020,0,2004.05
4,01-2020,0,2850.84
...,...,...,...
123853,12-2023,2,168.29
123854,12-2023,2,1289.57
123855,12-2023,2,10395.00
123856,12-2023,3,280.12
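
The per-row logic of this step can be mirrored in plain Python as a rough sketch. The `rows` list is made-up sample data, not taken from the database, and the tuple layout is an assumption for illustration. The day count here comes from direct date subtraction, which gives total elapsed days rather than an interval's day field.

```python
from datetime import date

# Hypothetical rows: (orderdate, deliverydate, quantity, netprice, exchangerate)
rows = [
    (date(2020, 1, 8), date(2020, 1, 12), 2, 307.23, 1.0),
    (date(2020, 1, 8), None, 1, 50.00, 1.0),  # not delivered yet: excluded
    (date(2023, 12, 30), date(2024, 1, 2), 1, 280.12, 1.0),
]

result = [
    (
        order.strftime("%m-%Y"),     # like TO_CHAR(orderdate, 'MM-YYYY')
        (delivery - order).days,     # whole days from order to delivery
        quantity * price * rate,     # net revenue for the row
    )
    for order, delivery, quantity, price, rate in rows
    if delivery is not None          # like WHERE deliverydate IS NOT NULL
]

for order_month, processing_days, net_revenue in result:
    print(order_month, processing_days, round(net_revenue, 2))
```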


3. Aggregate data by month to get total sales and average processing time.
    - Compute the total net revenue using `SUM(quantity * netprice * exchangerate)`.
    - Exclude rows with `NULL` delivery dates in the `WHERE` clause.
    - Return the order month, average processing time, and total net revenue for each group.
    - 🔔 Calculate the average processing time using `AVG(EXTRACT(DAY FROM AGE(...)))`.
    - Group by `TO_CHAR(orderdate, 'MM-YYYY')` and order the results.

In [17]:
%%sql

SELECT 
    TO_CHAR(s.orderdate, 'MM-YYYY') AS order_month,
    AVG(EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate))) AS avg_processing_time, -- Update
    SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY 
    s.orderdate -- Groups by day, so order_month repeats in the result; the next step groups by order_month
ORDER BY 
    order_month;

Unnamed: 0,order_month,avg_processing_time,net_revenue
0,01-2020,0.66037735849056603774,47053.46
1,01-2020,0.11111111111111111111,17744.61
2,01-2020,1.1071428571428571,106893.21
3,01-2020,1.2153846153846154,79693.66
4,01-2020,2.1250000000000000,13395.79
...,...,...,...
1520,12-2023,1.5906432748538012,143450.82
1521,12-2023,1.8151260504201681,102086.59
1522,12-2023,1.00000000000000000000,5594.66
1523,12-2023,1.6482213438735178,225511.77
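
The repeated `order_month` values in the output above come from grouping on the raw order date rather than on the month label. A small Python sketch, using made-up rows and a hypothetical `summarize` helper, shows how the choice of grouping key changes the number of result rows:

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-order rows: (orderdate, processing_days, net_revenue)
rows = [
    (date(2020, 1, 8), 0, 100.0),
    (date(2020, 1, 9), 4, 300.0),
    (date(2020, 2, 1), 2, 50.0),
]


def summarize(rows, key):
    """Group rows by key(orderdate); return {group: (avg_days, total_revenue)}."""
    groups = defaultdict(list)
    for orderdate, days, revenue in rows:
        groups[key(orderdate)].append((days, revenue))
    return {
        k: (sum(d for d, _ in v) / len(v), sum(r for _, r in v))
        for k, v in groups.items()
    }


by_day = summarize(rows, key=lambda d: d)                      # one group per date
by_month = summarize(rows, key=lambda d: d.strftime("%m-%Y"))  # one group per month

print(len(by_day))  # 3
print(by_month)     # {'01-2020': (2.0, 400.0), '02-2020': (2.0, 50.0)}
```

Grouping on the formatted month label is what yields a single row per month.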


4. Reformat `avg_processing_time` and `net_revenue` to make them easier to read.
    - Compute the total net revenue using `SUM(quantity * netprice * exchangerate)`.
    - Exclude rows with `NULL` delivery dates in the `WHERE` clause.
    - Return the order month, average processing time, and total net revenue for each month.
    - 🔔 Use `ROUND()` to format the average processing time and total sales to two decimal places.
    - `ROUND`: Rounds numeric values to a specified number of decimal places for better readability.
        - Syntax:
            ```sql
            ROUND(value, precision)
            ```
        - Example:
            ```sql
            SELECT ROUND(1234.56789, 2);
            ```

In [18]:
%%sql

SELECT 
    TO_CHAR(s.orderdate, 'MM-YYYY') AS order_month,
    ROUND(CAST(AVG(EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate))) AS NUMERIC), 2) AS avg_processing_time, -- Update
    ROUND(CAST(SUM(s.quantity * s.netprice * s.exchangerate) AS NUMERIC), 2) AS net_revenue -- Update
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY 
    order_month
ORDER BY 
    order_month;

Unnamed: 0,order_month,avg_processing_time,net_revenue
0,01-2020,1.0,1529781.54
1,01-2021,0.97,669787.93
2,01-2022,1.46,3647525.92
3,01-2023,1.69,3664431.34
4,01-2024,1.75,2677498.55
5,02-2020,0.8,2713593.19
6,02-2021,1.12,1094980.88
7,02-2022,1.53,4840124.87
8,02-2023,1.73,4465204.57
9,02-2024,1.64,3542322.55
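
Note that the rows above are ordered by month first and year second, because `order_month` is text and `'MM-YYYY'` strings sort by their leading month digits. The sketch below reproduces this in Python with a few made-up dates and, as an assumption about what a reader may want, shows that a year-first label such as `'YYYY-MM'` sorts chronologically.

```python
from datetime import date

# Hypothetical month starts
months = [date(2020, 1, 1), date(2020, 2, 1), date(2021, 1, 1)]

# Text sort of month-first labels: all Januaries come before any February
mm_yyyy = sorted(d.strftime("%m-%Y") for d in months)

# Text sort of year-first labels matches calendar order
yyyy_mm = sorted(d.strftime("%Y-%m") for d in months)

print(mm_yyyy)  # ['01-2020', '01-2021', '02-2020']
print(yyyy_mm)  # ['2020-01', '2020-02', '2021-01']
```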


5. Evaluate the yearly data.
    - Use `ROUND()` to format the average processing time and total sales to two decimal places.
    - Compute the total net revenue using `SUM(quantity * netprice * exchangerate)`.
    - Exclude rows with `NULL` delivery dates in the `WHERE` clause.
    - Return the order year, average processing time, and total net revenue for each year.
    - 🔔 Replace monthly grouping with yearly grouping by changing `TO_CHAR(orderdate, 'MM-YYYY')` to `DATE_PART('year', orderdate)`.
    - 🔔 Group data by `order_year` and order the results.


In [19]:
%%sql

SELECT 
    DATE_PART('year', s.orderdate) AS order_year, -- Update
    ROUND(CAST(AVG(EXTRACT(DAY FROM AGE(s.deliverydate, s.orderdate))) AS NUMERIC), 2) AS avg_processing_time,
    ROUND(CAST(SUM(s.quantity * s.netprice * s.exchangerate) AS NUMERIC), 2) AS net_revenue
FROM 
    sales s
LEFT JOIN 
    product p ON s.productkey = p.productkey
WHERE 
    s.deliverydate IS NOT NULL
    AND s.orderdate >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY 
    order_year -- Update
ORDER BY 
    order_year; -- Update

Unnamed: 0,order_year,avg_processing_time,net_revenue
0,2020.0,0.92,10616084.4
1,2021.0,1.36,21357976.66
2,2022.0,1.62,44864557.21
3,2023.0,1.75,33108565.51
4,2024.0,1.67,8396527.38
