<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/2_Date_Time/2_Date_Filtering.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Date & Time Filtering

## Overview

### 🥅 Analysis Goals

- **Aggregate sales by specific date components**: Extract and group data by year, month, and day using `DATE_PART` for detailed time-based analyses.  
- **Filter data based on the current date**: Use `CURRENT_DATE` to dynamically filter results for reports.

### 📘 Concepts Covered

- `DATE_PART()` & `EXTRACT()`
- `CURRENT_DATE` & `NOW()`

[Source Documentation on Date/Time Functions.](https://www.postgresql.org/docs/current/functions-datetime.html)

---

In [1]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Update package installer
    !sudo apt-get update -qq > /dev/null 2>&1

    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas number to two decimal places
pd.options.display.float_format = '{:.2f}'.format

---
## DATE_PART & EXTRACT

### 📝 Notes

#### `DATE_PART`
- `DATE_PART()` extracts specific components (e.g., year, month, day) from a date or timestamp.
- Syntax:

  ```sql
  DATE_PART('unit', source)
  ```
- Common units:
  - `year`
  - `month`
  - `day`
  - `hour`
  - `minute`
  - `second`


In [None]:
%%sql

SELECT
    orderdate,
    DATE_PART('year', orderdate) AS order_year,
    DATE_PART('month', orderdate) AS order_month,
    DATE_PART('day', orderdate) AS order_day
FROM
    sales
ORDER BY RANDOM()
LIMIT 10

Unnamed: 0,orderdate,order_year,order_month,order_day
0,2020-11-09,2020.0,11.0,9.0
1,2022-03-04,2022.0,3.0,4.0
2,2023-01-16,2023.0,1.0,16.0
3,2016-02-15,2016.0,2.0,15.0
4,2020-08-17,2020.0,8.0,17.0
5,2023-01-23,2023.0,1.0,23.0
6,2016-02-23,2016.0,2.0,23.0
7,2022-08-11,2022.0,8.0,11.0
8,2017-12-29,2017.0,12.0,29.0
9,2017-01-10,2017.0,1.0,10.0


#### `EXTRACT`

- `EXTRACT()` is the SQL-standard syntax for extracting the same components from a date or timestamp; it takes the field as a keyword instead of a string.
- Syntax:

  ```sql
  EXTRACT(unit FROM source)
  ```
- Common units:
  - `YEAR`
  - `MONTH`
  - `DAY`
  - `HOUR`
  - `MINUTE`
  - `SECOND`

In [None]:
%%sql

SELECT
    orderdate,
    EXTRACT(YEAR FROM orderdate) AS extract_year,
    EXTRACT(MONTH FROM orderdate) AS extract_month,
    EXTRACT(DAY FROM orderdate) AS extract_day
FROM
    sales
ORDER BY RANDOM()
LIMIT 10

Unnamed: 0,orderdate,extract_year,extract_month,extract_day
0,2021-07-22,2021,7,22
1,2019-11-28,2019,11,28
2,2022-11-21,2022,11,21
3,2023-02-18,2023,2,18
4,2015-05-22,2015,5,22
5,2024-02-16,2024,2,16
6,2020-01-13,2020,1,13
7,2017-11-08,2017,11,8
8,2022-05-12,2022,5,12
9,2021-11-10,2021,11,10


#### Difference between `DATE_PART` and `EXTRACT`

| Feature               | Definition                               | Syntax                                | Example Query                                   | Example Output       |
|-----------------------|------------------------------------------|---------------------------------------|------------------------------------------------|----------------------|
| **`DATE_PART`**         | Retrieves part of a date; the field is passed as a string. | `DATE_PART('field', source)`          | `DATE_PART('month', TIMESTAMP '2025-01-10')`   | `1.0` (double precision) |
| **`EXTRACT`**           | Retrieves part of a date; the field is passed as a keyword. | `EXTRACT(field FROM source)`          | `EXTRACT(MONTH FROM TIMESTAMP '2025-01-10')`   | `1` (`numeric` in PostgreSQL 14+) |

### 📈 Analysis

- Group and summarize net revenue by year and month using `EXTRACT` for detailed time-based analyses.

#### Extract Date Components and Aggregate Net Revenue

**`EXTRACT`**

1. Use `EXTRACT` to get the year and month of each order and return the total net revenue for each month.
    - Extract `year` and `month` from `orderdate` using `EXTRACT`.
    - Calculate net revenue by multiplying `quantity` by `netprice` and `exchangerate`.
    - Aggregate net revenue by the extracted components using `SUM()`.
    - Group by `year` and `month` for detailed insights.
    - Sort the results by `year` and `month` for chronological order.

In [None]:
%%sql

SELECT
    EXTRACT(YEAR FROM s.orderdate) AS order_year,
    EXTRACT(MONTH FROM s.orderdate) AS order_month,
    SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM
    sales s
    LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
    order_year,
    order_month
ORDER BY
    order_year, order_month;

Unnamed: 0,order_year,order_month,net_revenue
0,2015,1,384092.66
1,2015,2,706374.12
2,2015,3,332961.59
3,2015,4,160767.00
4,2015,5,548632.63
...,...,...,...
107,2023,12,2928550.93
108,2024,1,2677498.55
109,2024,2,3542322.55
110,2024,3,1692854.89


2. Add category-level granularity to yearly summaries.
    - 🔔 Include `categoryname` in the query to break down yearly sales by product category.
    - Aggregate net revenue within each year and category.
    - 🔔 Group by `order_year` and `categoryname`, and order by both.

In [None]:
%%sql

SELECT
    EXTRACT(YEAR FROM s.orderdate) AS order_year,
    p.categoryname, -- Added
    SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
    LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
    order_year,
    p.categoryname -- Added
ORDER BY
    order_year,
    p.categoryname -- Added

Unnamed: 0,order_year,categoryname,net_revenue
0,2015,Audio,170872.15
1,2015,Cameras and camcorders,1828111.71
2,2015,Cell phones,591513.47
3,2015,Computers,2139915.71
4,2015,Games and Toys,45404.59
...,...,...,...
75,2024,Computers,2957039.62
76,2024,Games and Toys,85867.75
77,2024,Home Appliances,1320161.48
78,2024,"Music, Movies and Audio Books",592662.15


<img src="https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/Resources/images/2.2_year_rev_category.png?raw=1" alt="Category" width="50%">

---
## CURRENT_DATE & NOW

### 📝 Notes

#### `CURRENT_DATE`

- Returns the current date in the session's time zone. The value is fixed at the start of the current transaction.

- Syntax:
    ```sql
    CURRENT_DATE
    ```

In [None]:
%%sql

SELECT CURRENT_DATE

Unnamed: 0,current_date
0,2025-02-13


#### `NOW`

- `NOW()` is similar to `CURRENT_DATE` but returns the current date *and* time as a timestamp with time zone (e.g. `2019-12-23 14:39:53.662522-05`).
- Syntax:  

    ```sql
    NOW()
    ```

In [None]:
%%sql

SELECT NOW()

Unnamed: 0,now
0,2025-02-13 18:53:45.582572-06:00
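
The relationship between the two outputs above can be sketched in Python: dropping the time of day from the `NOW()` value leaves the date that `CURRENT_DATE` reports, and that date depends on the time zone in use. The timestamp is copied from the output above. Comparing the truncation to the SQL cast `NOW()::date` is an illustration, not something the notebook runs.

```python
from datetime import datetime, timedelta, timezone

# The NOW() value shown above, rebuilt as a timezone-aware datetime (UTC-06:00)
now = datetime(2025, 2, 13, 18, 53, 45, 582572, tzinfo=timezone(timedelta(hours=-6)))

# Dropping the time of day leaves the calendar date, analogous to NOW()::date
current_date = now.date()
print(current_date)  # 2025-02-13

# The same instant expressed in UTC already falls on the next calendar day
utc_date = now.astimezone(timezone.utc).date()
print(utc_date)  # 2025-02-14
```

This is why a report filtered on the current date can shift by a day when the session time zone changes.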



### 📈 Analysis

- Use `CURRENT_DATE` to limit results to only those that occurred in the last 5 years.

#### Filter Data Based on Current Date

**`CURRENT_DATE`**

1. Investigate the data before filtering with `CURRENT_DATE`:
    - Identify daily order net revenue by category.
    - Group by `orderdate` and `categoryname` and order results chronologically.
    - Include `CURRENT_DATE` in the selected columns.

In [None]:
%%sql

SELECT
    CURRENT_DATE,
    s.orderdate,
    p.categoryname,
    SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
    LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
    s.orderdate,
    p.categoryname
ORDER BY
    s.orderdate,
    p.categoryname

Unnamed: 0,current_date,orderdate,categoryname,net_revenue
0,2025-02-13,2015-01-01,Audio,1555.67
1,2025-02-13,2015-01-01,Cameras and camcorders,4977.13
2,2025-02-13,2015-01-01,Computers,3066.35
3,2025-02-13,2015-01-01,Games and Toys,163.87
4,2025-02-13,2015-01-01,Home Appliances,1152.57
...,...,...,...,...
23491,2025-02-13,2024-04-20,Computers,58353.68
23492,2025-02-13,2024-04-20,Games and Toys,1744.30
23493,2025-02-13,2024-04-20,Home Appliances,1562.04
23494,2025-02-13,2024-04-20,"Music, Movies and Audio Books",4949.43


2. Filter data for the last 5 years.
    - Use `CURRENT_DATE` to filter the data to only include orders from the last 5 years.

In [None]:
%%sql

SELECT
	CURRENT_DATE,
	s.orderdate,
	p.categoryname,
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE
	EXTRACT(YEAR FROM s.orderdate) >= EXTRACT(YEAR FROM CURRENT_DATE) - 5  -- last 5 years
GROUP BY
	s.orderdate,
	p.categoryname
ORDER BY
	s.orderdate,
	p.categoryname

Unnamed: 0,current_date,orderdate,categoryname,net_revenue
0,2025-02-13,2020-01-01,Audio,5490.14
1,2025-02-13,2020-01-01,Cameras and camcorders,18880.06
2,2025-02-13,2020-01-01,Cell phones,22593.00
3,2025-02-13,2020-01-01,Computers,78554.54
4,2025-02-13,2020-01-01,Games and Toys,1476.43
...,...,...,...,...
11166,2025-02-13,2024-04-20,Computers,58353.68
11167,2025-02-13,2024-04-20,Games and Toys,1744.30
11168,2025-02-13,2024-04-20,Home Appliances,1562.04
11169,2025-02-13,2024-04-20,"Music, Movies and Audio Books",4949.43
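
The year comparison above keeps whole calendar years, so the first date returned is January 1 of the cutoff year and not the date exactly five years before today. The Python sketch below contrasts the two cutoffs. Both helper names are hypothetical, and the rolling variant corresponds to a filter such as `s.orderdate >= CURRENT_DATE - INTERVAL '5 years'`, an alternative that the notebook does not use.

```python
from datetime import date

def calendar_year_cutoff(today, years):
    """First date kept by: EXTRACT(YEAR FROM orderdate) >= EXTRACT(YEAR FROM today) - years."""
    return date(today.year - years, 1, 1)

def rolling_cutoff(today, years):
    """Same month and day `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:  # Feb 29 has no counterpart in a non-leap target year
        return today.replace(year=today.year - years, day=28)

today = date(2025, 2, 13)  # the CURRENT_DATE shown in the output above
print(calendar_year_cutoff(today, 5))  # 2020-01-01
print(rolling_cutoff(today, 5))        # 2020-02-13
```

Which cutoff is appropriate depends on whether the report should cover complete calendar years or a sliding window.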


In [6]:
%%sql
-- Order Date Parts (2.2.1) - Problem
-- For each order, identify the following time components: decade, quarter, month, year, and ISO year. This will help analyze the distribution of orders over time.

SELECT
  orderkey,
  EXTRACT (decade from orderdate) as decade,
  EXTRACT (quarter from orderdate) as quarter,
  EXTRACT (month from orderdate) as month,
  EXTRACT (year from orderdate) as year,
  EXTRACT (isoyear from orderdate) as ISO_year
FROM sales

Unnamed: 0,orderkey,decade,quarter,month,year,iso_year
0,1000,201,1,1,2015,2015
1,1000,201,1,1,2015,2015
2,1001,201,1,1,2015,2015
3,1002,201,1,1,2015,2015
4,1002,201,1,1,2015,2015
...,...,...,...,...,...,...
199868,3398034,202,2,4,2024,2024
199869,3398034,202,2,4,2024,2024
199870,3398035,202,2,4,2024,2024
199871,3398035,202,2,4,2024,2024
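
The ISO year is easy to confuse with the ISO week number and with the calendar year. The Python sketch below computes the same components for two dates so the differences are visible. `order_date_parts` is a hypothetical helper, and the field names in the comments refer to the PostgreSQL `EXTRACT` fields `DECADE`, `QUARTER`, `ISOYEAR`, and `WEEK`.

```python
from datetime import date

def order_date_parts(d):
    """Return the decade, quarter, month, year, ISO year, and ISO week of a date."""
    iso_year, iso_week, _ = d.isocalendar()
    return {
        "decade": d.year // 10,             # like EXTRACT(DECADE ...): 2015 -> 201
        "quarter": (d.month - 1) // 3 + 1,  # 1 to 4
        "month": d.month,
        "year": d.year,
        "iso_year": iso_year,               # like EXTRACT(ISOYEAR ...)
        "iso_week": iso_week,               # like EXTRACT(WEEK ...)
    }

print(order_date_parts(date(2015, 1, 1)))
print(order_date_parts(date(2021, 1, 1)))
```

For 2015-01-01 the ISO year matches the calendar year (2015, week 1). 2021-01-01 belongs to ISO week 53 of ISO year 2020, so the two can differ around New Year.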


In [11]:
%%sql
-- Daily Revenue 2022 (2.2.2) - Problem
-- Calculate the total net revenue for each day of the year in 2022. This will help in understanding the daily revenue trends for that year.

SELECT
  EXTRACT (DOY from orderdate) as day_of_year,
  SUM(netprice * quantity * exchangerate) as net_revenue
FROM
  sales
WHERE EXTRACT (year from orderdate) = 2022
GROUP BY day_of_year
ORDER BY day_of_year

Unnamed: 0,day_of_year,net_revenue
0,1,255185.54
1,2,30229.29
2,3,141615.78
3,4,129968.60
4,5,171813.44
...,...,...
360,361,113441.22
361,362,198531.19
362,363,202345.75
363,364,184191.39
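
The day-of-year numbers in the first result column are compact but hard to read as dates. A minimal Python sketch of the mapping in both directions follows. Both function names are hypothetical.

```python
from datetime import date, timedelta

def day_of_year(d):
    """1-based day of the year, like EXTRACT(DOY FROM d)."""
    return d.timetuple().tm_yday

def date_from_day_of_year(year, doy):
    """Inverse mapping: day number `doy` of `year` back to a calendar date."""
    return date(year, 1, 1) + timedelta(days=doy - 1)

print(day_of_year(date(2022, 1, 1)))     # 1
print(date_from_day_of_year(2022, 364))  # 2022-12-30
```

Because the query filters to a single year, each day-of-year value identifies exactly one calendar date.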


In [22]:
%%sql
-- Orders by Day of Week (2.2.3) - Problem
-- Analyze the total orders by day of the week for only the year five years ago. This will help in understanding which days had the most orders.
--   - Use EXTRACT to get the day of the week from orderdate.
--   - Use EXTRACT also to filter the data to only include orders from five years ago.
--   - Group the results by the day of the week and count the total number of orders for each day.
SELECT
  EXTRACT (DOW from orderdate) as day_of_week,
  COUNT(orderkey) as no_of_orders
FROM sales
WHERE EXTRACT (YEAR from orderdate) = EXTRACT (YEAR from CURRENT_DATE) - 5
GROUP BY day_of_week
ORDER BY day_of_week  -- 0 = Sunday ... 6 = Saturday

Unnamed: 0,day_of_week,no_of_orders
0,0,162
1,1,1154
2,2,1458
3,3,2080
4,4,2173
5,5,1535
6,6,2705
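
In PostgreSQL, `EXTRACT(DOW ...)` numbers the days from Sunday (`0`) to Saturday (`6`), which differs from both of Python's weekday conventions. The sketch below converts a date to that numbering and to a day name. `postgres_dow` and `DAY_NAMES` are hypothetical names introduced for illustration.

```python
from datetime import date

# Index matches the PostgreSQL DOW numbering: 0 = Sunday ... 6 = Saturday
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

def postgres_dow(d):
    """Day of week as EXTRACT(DOW ...) numbers it: Sunday = 0 ... Saturday = 6."""
    # isoweekday() runs Monday = 1 ... Sunday = 7, so modulo 7 moves Sunday to 0
    return d.isoweekday() % 7

sunday = date(2020, 2, 16)
thursday = date(2020, 2, 13)
print(postgres_dow(sunday), DAY_NAMES[postgres_dow(sunday)])      # 0 Sunday
print(postgres_dow(thursday), DAY_NAMES[postgres_dow(thursday)])  # 4 Thursday
```

Read this way, the `day_of_week` value `0` in the result above is Sunday and `6` is Saturday.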


In [32]:
%%sql
-- Order Trends Analysis (2.2.4) - Problem
-- Analyze the distribution of orders over the last 6 years by extracting specific date components from the orderdate in the sales table. This will help in understanding the order trends over time.
--   - Extract the year and quarter from the orderdate in separate columns.
--   - Count the total number of orders and the number of unique customers for each combination of year and quarter.
--   - Filter the data to include only orders from the last 6 years from the current date using DATE_PART() and NOW() (this solution uses the equivalent EXTRACT and CURRENT_DATE).
--   - Group the results by year and quarter, and order them chronologically.

SELECT
  EXTRACT (YEAR from orderdate) as order_year,
  EXTRACT (QUARTER from orderdate) as order_quarter,
  COUNT (orderkey) as orders,
  COUNT (DISTINCT customerkey) as unique_customers

FROM sales

WHERE EXTRACT (YEAR from orderdate) >= EXTRACT (YEAR from CURRENT_DATE) - 6
GROUP BY order_year, order_quarter
ORDER BY order_year, order_quarter


Unnamed: 0,order_year,order_quarter,orders,unique_customers
0,2019,1,7690,3113
1,2019,2,5921,2356
2,2019,3,6364,2627
3,2019,4,7043,2807
4,2020,1,6054,2542
5,2020,2,2434,1033
6,2020,3,1471,612
7,2020,4,1308,566
8,2021,1,2173,906
9,2021,2,3559,1466
