<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/2_Date_Calculations/2_Date_Components.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Date Components

## Overview

### 🥅 Analysis Goals

Continue the time series analysis of sales (net revenue) trends and patterns using date components:

- **Aggregate sales by specific date components**: Extract and group data by year, month, and day using `DATE_PART` for detailed time-based analyses.  
- **Filter data based on the current date**: Use `CURRENT_DATE` to dynamically filter results for reports.

### 📘 Concepts Covered

- `DATE_PART()` & `EXTRACT()`
- `CURRENT_DATE` & `NOW()`

[Source Documentation on Date/Time Functions.](https://www.postgresql.org/docs/current/functions-datetime.html)

---

In [32]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Switch from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas floats with two decimal places
pd.options.display.float_format = '{:.2f}'.format



---
## DATE_PART & EXTRACT

### 📝 Notes

#### `DATE_PART`
- `DATE_PART()` extracts specific components (e.g., year, month, day) from a date or timestamp.
- Syntax: 
  ```sql 
  DATE_PART('unit', source) -- unit can be 'year', 'month', 'day', etc.
  ```
- Common units:
  - `year`
  - `month`
  - `day`
  - `hour`
  - `minute`
  - `second`


In [33]:
%%sql

SELECT
    orderdate,
    DATE_PART('year', orderdate) AS order_year,
    DATE_PART('month', orderdate) AS order_month,
    DATE_PART('day', orderdate) AS order_day
FROM
    sales
ORDER BY RANDOM()
LIMIT 10

Unnamed: 0,orderdate,order_year,order_month,order_day
0,2020-11-09,2020.0,11.0,9.0
1,2022-03-04,2022.0,3.0,4.0
2,2023-01-16,2023.0,1.0,16.0
3,2016-02-15,2016.0,2.0,15.0
4,2020-08-17,2020.0,8.0,17.0
5,2023-01-23,2023.0,1.0,23.0
6,2016-02-23,2016.0,2.0,23.0
7,2022-08-11,2022.0,8.0,11.0
8,2017-12-29,2017.0,12.0,29.0
9,2017-01-10,2017.0,1.0,10.0
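
To make the returned values concrete, here is a rough Python analogue of `DATE_PART` for the six units listed above. The helper `date_part` is hypothetical (it is not part of any library) and covers only those units. It returns a `float` to echo PostgreSQL's `double precision` result, which is why the output above shows `2020.0` rather than `2020`.

```python
from datetime import date, datetime


def date_part(unit: str, source) -> float:
    """Rough analogue of PostgreSQL's DATE_PART for a few common units (illustrative only)."""
    # A plain date has no time of day; treat it as midnight, as a cast to timestamp would.
    if not isinstance(source, datetime):
        source = datetime(source.year, source.month, source.day)
    parts = {
        "year": source.year,
        "month": source.month,
        "day": source.day,
        "hour": source.hour,
        "minute": source.minute,
        # Seconds include the fractional part.
        "second": source.second + source.microsecond / 1_000_000,
    }
    return float(parts[unit.lower()])


order_date = date(2020, 11, 9)  # same date as the first row of the sample output
print(date_part("year", order_date), date_part("month", order_date), date_part("day", order_date))
# 2020.0 11.0 9.0
```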


#### `EXTRACT`

- `EXTRACT()` is the SQL-standard, slightly more verbose way to extract the same components from a date or timestamp; the unit is written as a keyword rather than a quoted string.
- Syntax:
  ```sql
  EXTRACT(unit FROM source) -- unit can be YEAR, MONTH, DAY, etc.
  ```
- Common units:
  - `YEAR`
  - `MONTH`
  - `DAY`
  - `HOUR`
  - `MINUTE`
  - `SECOND`

In [34]:
%%sql

SELECT
    orderdate,
    EXTRACT(YEAR FROM orderdate) AS extract_year,
    EXTRACT(MONTH FROM orderdate) AS extract_month, 
    EXTRACT(DAY FROM orderdate) AS extract_day
FROM
    sales
ORDER BY RANDOM()
LIMIT 10

Unnamed: 0,orderdate,extract_year,extract_month,extract_day
0,2021-07-22,2021,7,22
1,2019-11-28,2019,11,28
2,2022-11-21,2022,11,21
3,2023-02-18,2023,2,18
4,2015-05-22,2015,5,22
5,2024-02-16,2024,2,16
6,2020-01-13,2020,1,13
7,2017-11-08,2017,11,8
8,2022-05-12,2022,5,12
9,2021-11-10,2021,11,10


#### Difference between `DATE_PART` and `EXTRACT`

| Function        | Definition                                                   | Syntax                       | Example Query                                | Example Output                                  |
|-----------------|--------------------------------------------------------------|------------------------------|----------------------------------------------|-------------------------------------------------|
| **`DATE_PART`** | Retrieves part of a date; the field is passed as a string.   | `DATE_PART('field', source)` | `DATE_PART('month', TIMESTAMP '2025-01-10')` | `1` (double precision; shown as `1.0` in pandas) |
| **`EXTRACT`**   | Retrieves part of a date; the field is written as a keyword. | `EXTRACT(field FROM source)` | `EXTRACT(MONTH FROM TIMESTAMP '2025-01-10')` | `1` (numeric in PostgreSQL 14+)                 |
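
The different return types explain why the two sample outputs above look different (`2020.0` versus `2021`). The sketch below assumes the database driver maps `double precision` to a Python `float` and `numeric` to `decimal.Decimal`, as psycopg2 does by default; the variable names are illustrative.

```python
from decimal import Decimal

date_part_month = 1.0         # DATE_PART result: double precision arrives as float
extract_month = Decimal("1")  # EXTRACT result (PostgreSQL 14+): numeric arrives as Decimal (assumed mapping)

print(date_part_month)  # 1.0
print(extract_month)    # 1

# The notebook's float_format setting pads floats to two decimals.
print("{:.2f}".format(2015.0))  # 2015.00

# Cast to int when whole-number labels are needed, e.g. for chart axes.
print(int(date_part_month), int(extract_month))  # 1 1
```

In SQL, the equivalent fix is to cast the result, for example `DATE_PART('year', orderdate)::int`.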

### 📈 Analysis

- Group and summarize net revenue by year, month, and day using `DATE_PART` for detailed time-based analyses. 

#### Extract Date Components and Aggregate Net Revenue

**`DATE_PART`**

1. Use `DATE_PART` to get the year, month, and day of each `orderdate`, and return the total net revenue for each day.
    - Extract `year`, `month`, and `day` from `orderdate` using `DATE_PART`.
    - Calculate the total net revenue by multiplying `quantity` by `netprice` and `exchangerate`.
    - Aggregate net revenue by the extracted components using `SUM()`.
    - Group by `year`, `month`, and `day` for detailed insights.
    - Sort the results by `year`, `month`, and `day` for chronological order.
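
As a minimal sketch of what these steps compute, the same grouping logic can be written in plain Python. The three rows are hypothetical stand-ins for the `sales` table, not real data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (orderdate, quantity, netprice, exchangerate)
sales = [
    (date(2015, 1, 1), 2, 100.0, 1.0),
    (date(2015, 1, 1), 1, 50.0, 2.0),
    (date(2015, 1, 2), 3, 10.0, 1.0),
]

net_revenue = defaultdict(float)
for orderdate, quantity, netprice, exchangerate in sales:
    key = (orderdate.year, orderdate.month, orderdate.day)  # GROUP BY year, month, day
    net_revenue[key] += quantity * netprice * exchangerate  # SUM(quantity * netprice * exchangerate)

for key in sorted(net_revenue):  # ORDER BY year, month, day
    print(key, net_revenue[key])
# (2015, 1, 1) 300.0
# (2015, 1, 2) 30.0
```

Each distinct (year, month, day) tuple becomes one output row, which is why the query returns one row per calendar day that has sales.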

In [35]:
%%sql

SELECT
    DATE_PART('year', s.orderdate) AS order_year,
    DATE_PART('month', s.orderdate) AS order_month,
    DATE_PART('day', s.orderdate) AS order_day,
    SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM
    sales s
GROUP BY
    order_year, order_month, order_day
ORDER BY
    order_year, order_month, order_day;

Unnamed: 0,order_year,order_month,order_day,net_revenue
0,2015.00,1.00,1.00,11640.80
1,2015.00,1.00,2.00,5890.40
2,2015.00,1.00,3.00,19796.67
3,2015.00,1.00,5.00,12406.27
4,2015.00,1.00,6.00,10349.87
...,...,...,...,...
3289,2024.00,4.00,16.00,25098.99
3290,2024.00,4.00,17.00,32938.67
3291,2024.00,4.00,18.00,28408.76
3292,2024.00,4.00,19.00,48386.88


2. Summarize net revenue by year.
    - 🔔 Extract the `year` component from `orderdate` using `DATE_PART('year', orderdate)`.
    - Calculate total net revenue for each year using `SUM(quantity * netprice * exchangerate)`.
    - 🔔 Group data by `year` and order the results chronologically.

In [36]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year, -- Added
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
GROUP BY -- Added
	order_year
ORDER BY -- Added
	order_year

Unnamed: 0,order_year,net_revenue
0,2015.0,7370979.48
1,2016.0,10383613.67
2,2017.0,13221339.05
3,2018.0,24667447.84
4,2019.0,31818095.97
5,2020.0,11218435.79
6,2021.0,21357976.66
7,2022.0,44864557.21
8,2023.0,33108565.51
9,2024.0,8396527.38


3. Add category-level granularity to yearly summaries.
    - 🔔 Join the `product` table and include `categoryname` to break down yearly sales by product category.
    - Aggregate net revenue within each year and category.
    - 🔔 Group by `order_year` and `categoryname`, and order by both.
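
The query for this step uses a `LEFT JOIN`, so a sale is kept even when its `productkey` has no match in `product`; its category is then `NULL`. A minimal Python sketch of that behavior follows, where the lookup dictionary and rows are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical lookup standing in for the product table: productkey -> categoryname
product_category = {1: "Audio", 2: "Computers"}

# Hypothetical rows: (orderdate, productkey, quantity, netprice, exchangerate)
sales = [
    (date(2015, 3, 1), 1, 1, 20.0, 1.0),
    (date(2015, 7, 9), 2, 2, 500.0, 1.0),
    (date(2016, 1, 5), 2, 1, 450.0, 1.0),
    (date(2016, 2, 1), 99, 1, 10.0, 1.0),  # no matching product
]

net_revenue = defaultdict(float)
for orderdate, productkey, quantity, netprice, exchangerate in sales:
    # LEFT JOIN: keep the sale even without a match; the category is then None (NULL).
    category = product_category.get(productkey)
    net_revenue[(orderdate.year, category)] += quantity * netprice * exchangerate

# ORDER BY order_year, categoryname; PostgreSQL sorts NULLs last in ascending order.
for year, category in sorted(net_revenue, key=lambda k: (k[0], k[1] is None, k[1] or "")):
    print(year, category, net_revenue[(year, category)])
# 2015 Audio 20.0
# 2015 Computers 1000.0
# 2016 Computers 450.0
# 2016 None 10.0
```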

In [37]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    p.categoryname, -- Added
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_year,
    p.categoryname -- Added
ORDER BY
	order_year,
    p.categoryname -- Added

Unnamed: 0,order_year,categoryname,net_revenue
0,2015.00,Audio,170872.15
1,2015.00,Cameras and camcorders,1828111.71
2,2015.00,Cell phones,591513.47
3,2015.00,Computers,2139915.71
4,2015.00,Games and Toys,45404.59
...,...,...,...
75,2024.00,Computers,2957039.62
76,2024.00,Games and Toys,85867.75
77,2024.00,Home Appliances,1320161.48
78,2024.00,"Music, Movies and Audio Books",592662.15


<img src="../Resources/images/2.2_year_rev_category.png" alt="Category" width="50%">

---
## CURRENT_DATE & NOW

### 📝 Notes

#### `CURRENT_DATE`

- Returns the current date in the session's time zone (the `TimeZone` setting), as of the start of the current transaction.

- Syntax: 
    ```sql
    CURRENT_DATE
    ```

In [38]:
%%sql

SELECT CURRENT_DATE

Unnamed: 0,current_date
0,2025-02-13


#### `NOW`

- `NOW()` is similar to `CURRENT_DATE` but returns the current date *and* time as a `timestamp with time zone` (e.g. `2019-12-23 14:39:53.662522-05`).
- Syntax:  

    ```sql
    NOW()
    ```

In [39]:
%%sql

SELECT NOW()

Unnamed: 0,now
0,2025-02-13 18:53:45.582572-06:00
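
Because the current date depends on the time zone, the same instant can fall on different calendar dates. The sketch below takes the instant from the `NOW()` output above, rewritten in UTC, and views it at the `-06:00` offset. It is a Python illustration of the idea, not of how PostgreSQL implements it.

```python
from datetime import datetime, timedelta, timezone

# The instant from the NOW() output above, expressed in UTC.
instant = datetime(2025, 2, 14, 0, 53, 45, 582572, tzinfo=timezone.utc)

session_tz = timezone(timedelta(hours=-6))  # the -06:00 offset from the output
local = instant.astimezone(session_tz)

print(local.isoformat(sep=" "))  # 2025-02-13 18:53:45.582572-06:00
print(local.date())              # 2025-02-13 (matches the CURRENT_DATE output)
print(instant.date())            # 2025-02-14 (same instant viewed in UTC)
```

A report filtered on `CURRENT_DATE` can therefore shift by a day depending on the session's time zone.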



### 📈 Analysis

- Use `CURRENT_DATE` to limit results to only those that occurred in the last 5 years.

#### Filter Data Based on Current Date

**`CURRENT_DATE`**

1. Inspect the data before filtering with `CURRENT_DATE`:
    - Identify daily order net revenue by category.
    - Group by `orderdate` and `categoryname` and order results chronologically.
    - Include `CURRENT_DATE` in columns selected.

In [40]:
%%sql

SELECT 
	CURRENT_DATE,
	orderdate,
    p.categoryname, 
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	orderdate,
    p.categoryname
ORDER BY
	orderdate,
    p.categoryname

Unnamed: 0,current_date,orderdate,categoryname,net_revenue
0,2025-02-13,2015-01-01,Audio,1555.67
1,2025-02-13,2015-01-01,Cameras and camcorders,4977.13
2,2025-02-13,2015-01-01,Computers,3066.35
3,2025-02-13,2015-01-01,Games and Toys,163.87
4,2025-02-13,2015-01-01,Home Appliances,1152.57
...,...,...,...,...
23491,2025-02-13,2024-04-20,Computers,58353.68
23492,2025-02-13,2024-04-20,Games and Toys,1744.30
23493,2025-02-13,2024-04-20,Home Appliances,1562.04
23494,2025-02-13,2024-04-20,"Music, Movies and Audio Books",4949.43


2. Filter data for the last 5 years.
    - Use `CURRENT_DATE` to keep only orders from the last 5 calendar years. The filter compares year numbers, so it keeps everything from January 1 of the cutoff year onward.
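
Comparing year numbers is not the same as a rolling 5-year window. The sketch below contrasts the two using a fixed stand-in for `CURRENT_DATE` and three hypothetical order dates. The helper `years_ago` is illustrative and approximates `CURRENT_DATE - INTERVAL '5 years'`.

```python
from datetime import date


def years_ago(d: date, years: int) -> date:
    """Same month and day `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


today = date(2025, 2, 13)  # fixed stand-in for CURRENT_DATE
order_dates = [date(2019, 12, 31), date(2020, 1, 1), date(2020, 2, 13)]

# Year-number comparison: EXTRACT(YEAR FROM orderdate) >= EXTRACT(YEAR FROM CURRENT_DATE) - 5
by_year = [d for d in order_dates if d.year >= today.year - 5]

# Rolling window: orderdate >= CURRENT_DATE - INTERVAL '5 years'
rolling = [d for d in order_dates if d >= years_ago(today, 5)]

print([d.isoformat() for d in by_year])  # ['2020-01-01', '2020-02-13']
print([d.isoformat() for d in rolling])  # ['2020-02-13']
```

The year-number filter gives stable, whole-year buckets, which suits yearly reports. The rolling window is the better fit when "last 5 years" must mean exactly 5 years back from today.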

In [41]:
%%sql

SELECT 
	CURRENT_DATE,
	orderdate,
    p.categoryname, 
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE
	EXTRACT(YEAR FROM orderdate) >= EXTRACT(YEAR FROM CURRENT_DATE) - 5  -- last 5 calendar years (compares year numbers)
GROUP BY
	orderdate,
    p.categoryname
ORDER BY
	orderdate,
    p.categoryname

Unnamed: 0,current_date,orderdate,categoryname,net_revenue
0,2025-02-13,2020-01-01,Audio,5490.14
1,2025-02-13,2020-01-01,Cameras and camcorders,18880.06
2,2025-02-13,2020-01-01,Cell phones,22593.00
3,2025-02-13,2020-01-01,Computers,78554.54
4,2025-02-13,2020-01-01,Games and Toys,1476.43
...,...,...,...,...
11166,2025-02-13,2024-04-20,Computers,58353.68
11167,2025-02-13,2024-04-20,Games and Toys,1744.30
11168,2025-02-13,2024-04-20,Home Appliances,1562.04
11169,2025-02-13,2024-04-20,"Music, Movies and Audio Books",4949.43
