<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/2_Date_Calculations/2_Date_Components.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Date Components

## Overview

### 🥅 Analysis Goals

The analysis will focus on understanding sales (net revenue) trends and patterns using date components:

- **Aggregate sales by specific date components**: Extract and group data by year, month, and day using `DATE_PART` for detailed time-based analyses.  
- **Filter data based on the current date**: Use `CURRENT_DATE` to dynamically filter results for reports.

### 📘 Concepts Covered

- `DATE_PART()`
- `CURRENT_DATE`

---

In [3]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Switch from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the SQL magic extension
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas floats with two decimal places
pd.options.display.float_format = '{:.2f}'.format



### 💡 Note

**❗️❗️Note for Luke❗️❗️: We may delete this note if we delete the date dimension table**

You may notice that this database includes a **date dimension** table: a static table with one row per day, plus date attributes such as day of the week and month name. You could join another table to it to get the month or year of a date.

We **won't** be using it, because not every database you work with will have one. It's also important to understand how to calculate date components yourself for different types of analysis (as you'll see).

---
## DATE_PART

### 📝 Notes

`DATE_PART`
- `DATE_PART()` extracts specific components (e.g., year, month, day) from a date or timestamp.
- Syntax: 
  ```sql 
  DATE_PART('unit', source) -- unit can be 'year', 'month', 'day', etc.
  ```
- Example: 
  ```sql
  DATE_PART('year', orderdate) -- extracts the year from the orderdate
  ```
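
For intuition, the same idea can be mirrored in Python, which this notebook already uses for setup. This is only an analogy, not how PostgreSQL implements it: the helper name `date_part` and the sample date are hypothetical. The helper returns a float because PostgreSQL's `DATE_PART` returns `double precision`, which is why extracted years show up as values like `2015.00` in the query results.

```python
from datetime import date


def date_part(unit: str, source: date) -> float:
    """Hypothetical Python analogue of DATE_PART for three common units."""
    parts = {"year": source.year, "month": source.month, "day": source.day}
    if unit not in parts:
        raise ValueError(f"unsupported unit: {unit!r}")
    # Mirror PostgreSQL, where DATE_PART returns double precision
    return float(parts[unit])


sample_orderdate = date(2015, 1, 3)  # made-up value for illustration
order_year = date_part("year", sample_orderdate)
order_month = date_part("month", sample_orderdate)
order_day = date_part("day", sample_orderdate)
print(order_year, order_month, order_day)
```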

### 💻 Final Result

- Group and summarize net revenue by year, month, and day using `DATE_PART`. 

#### Extract Date Components and Aggregate Net Revenue

**`DATE_PART`**

1. Use `DATE_PART` to get the year, month, and day of each order date, and return the total net revenue for each day.
    - Extract `year`, `month`, and `day` from `orderdate` using `DATE_PART`.
    - Calculate the total net revenue by multiplying `quantity` by `netprice` and `exchangerate`.
    - Aggregate net revenue by the extracted components using `SUM()`.
    - Group by `year`, `month`, and `day` for detailed insights.
    - Sort the results by `year`, `month`, and `day` for chronological order.
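
The steps above can be sketched in plain Python to show what the grouping and summing do. The sample rows are made up for illustration and do not come from the `sales` table; only the column names match.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (orderdate, quantity, netprice, exchangerate)
sales_rows = [
    (date(2015, 1, 1), 2, 10.0, 1.0),
    (date(2015, 1, 1), 1, 5.0, 2.0),
    (date(2015, 1, 2), 3, 4.0, 0.5),
]

# GROUP BY year, month, day with SUM(quantity * netprice * exchangerate)
net_revenue = defaultdict(float)
for orderdate, quantity, netprice, exchangerate in sales_rows:
    key = (orderdate.year, orderdate.month, orderdate.day)
    net_revenue[key] += quantity * netprice * exchangerate

# ORDER BY year, month, day (tuples sort component by component)
daily_totals = sorted(net_revenue.items())
for (year, month, day), total in daily_totals:
    print(year, month, day, total)
```

The two rows dated 2015-01-01 collapse into a single group, just as rows sharing the same extracted year, month, and day do in the SQL query.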

In [4]:
%%sql

SELECT
    DATE_PART('year', s.orderdate) AS order_year,
    DATE_PART('month', s.orderdate) AS order_month,
    DATE_PART('day', s.orderdate) AS order_day,
    SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM
    sales s
    LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
    order_year, order_month, order_day
ORDER BY
    order_year, order_month, order_day;

| | order_year | order_month | order_day | net_revenue |
|---|---|---|---|---|
| 0 | 2015.00 | 1.00 | 1.00 | 11640.80 |
| 1 | 2015.00 | 1.00 | 2.00 | 5890.40 |
| 2 | 2015.00 | 1.00 | 3.00 | 19796.67 |
| 3 | 2015.00 | 1.00 | 5.00 | 12406.27 |
| 4 | 2015.00 | 1.00 | 6.00 | 10349.87 |
| ... | ... | ... | ... | ... |
| 3289 | 2024.00 | 4.00 | 16.00 | 25098.99 |
| 3290 | 2024.00 | 4.00 | 17.00 | 32938.67 |
| 3291 | 2024.00 | 4.00 | 18.00 | 28408.76 |
| 3292 | 2024.00 | 4.00 | 19.00 | 48386.88 |


2. Summarize net revenue by year.
    - Extract the `year` component from `orderdate` using `DATE_PART('year', orderdate)`.
    - Calculate total net revenue for each year using `SUM(quantity * netprice * exchangerate)`.
    - Group data by `year` and order the results chronologically.

In [6]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year, -- Added
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue -- Added
FROM sales s
GROUP BY -- Added
	order_year
ORDER BY -- Added
	order_year

| | order_year | net_revenue |
|---|---|---|
| 0 | 2015.0 | 7370979.48 |
| 1 | 2016.0 | 10383613.67 |
| 2 | 2017.0 | 13221339.05 |
| 3 | 2018.0 | 24667447.84 |
| 4 | 2019.0 | 31818095.97 |
| 5 | 2020.0 | 11218435.79 |
| 6 | 2021.0 | 21357976.66 |
| 7 | 2022.0 | 44864557.21 |
| 8 | 2023.0 | 33108565.51 |
| 9 | 2024.0 | 8396527.38 |


**📊[Insert chart]📊**

3. Add category-level granularity to yearly summaries.
    - Join `product` to `sales` on `productkey` and include `categoryname` in the query to break down yearly sales by product category.
    - Aggregate net revenue within each year and category.
    - Group the data by `order_year` and `categoryname`, and order by both.
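
As a rough illustration of the join and the two-column grouping, here is a plain-Python sketch. All rows, keys, and the `product_category` lookup are hypothetical. The `.get()` call stands in for the `LEFT JOIN`: a sale with no matching product is kept, with `None` playing the role of `NULL`.

```python
from collections import defaultdict
from datetime import date

# Hypothetical lookup standing in for the product table: productkey -> categoryname
product_category = {1: "Audio", 2: "Computers"}

# Hypothetical sales rows: (orderdate, productkey, quantity, netprice, exchangerate)
sales_rows = [
    (date(2015, 3, 1), 1, 1, 100.0, 1.0),
    (date(2015, 7, 9), 2, 2, 250.0, 1.0),
    (date(2016, 1, 5), 2, 1, 300.0, 0.5),
    (date(2016, 2, 1), 9, 1, 40.0, 1.0),  # productkey 9 has no product row
]

revenue_by_year_category = defaultdict(float)
for orderdate, productkey, quantity, netprice, exchangerate in sales_rows:
    # LEFT JOIN behaviour: keep the sale even when no product matches
    categoryname = product_category.get(productkey)
    revenue_by_year_category[(orderdate.year, categoryname)] += (
        quantity * netprice * exchangerate
    )

# ORDER BY order_year, categoryname; None sorts last, as NULLs do
# by default in an ascending PostgreSQL sort
ordered = sorted(
    revenue_by_year_category.items(),
    key=lambda item: (item[0][0], item[0][1] is None, item[0][1] or ""),
)
for (order_year, categoryname), total in ordered:
    print(order_year, categoryname, total)
```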

In [7]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    p.categoryname, -- Added
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_year,
    p.categoryname -- Added
ORDER BY
	order_year,
    p.categoryname -- Added

| | order_year | categoryname | net_revenue |
|---|---|---|---|
| 0 | 2015.00 | Audio | 170872.15 |
| 1 | 2015.00 | Cameras and camcorders | 1828111.71 |
| 2 | 2015.00 | Cell phones | 591513.47 |
| 3 | 2015.00 | Computers | 2139915.71 |
| 4 | 2015.00 | Games and Toys | 45404.59 |
| ... | ... | ... | ... |
| 75 | 2024.00 | Computers | 2957039.62 |
| 76 | 2024.00 | Games and Toys | 85867.75 |
| 77 | 2024.00 | Home Appliances | 1320161.48 |
| 78 | 2024.00 | Music, Movies and Audio Books | 592662.15 |


**📊[Insert chart]📊**

---
## CURRENT_DATE

### 📝 Notes

`CURRENT_DATE`

- Returns the current date in the database session's time zone (as of the start of the current transaction).

- Syntax: 
    ```sql
    CURRENT_DATE
    ```
- Example: 
    ```sql
    SELECT CURRENT_DATE;
    ```
- **Note:** Similar to `CURRENT_DATE`, there's also `NOW()`, which returns the current date *and* time. 
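
The date-only versus date-and-time distinction has a close analogue in Python's standard library, sketched below. This is an analogy only: `date.today()` uses the local system time zone, whereas PostgreSQL uses the session's time zone setting. The variable names are arbitrary.

```python
from datetime import date, datetime, timezone

current_date = date.today()       # date only, comparable to CURRENT_DATE
now = datetime.now(timezone.utc)  # date and time with a zone, comparable to NOW()

print(type(current_date).__name__, current_date)
print(type(now).__name__, now)
```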

### 💻 Final Result

- Use `CURRENT_DATE` to limit results to dates before the current year, enabling dynamic reporting.

#### Filter Data Based on Current Date

**`CURRENT_DATE`**

1. Use `CURRENT_DATE` to filter data dynamically.
    - Add `CURRENT_DATE` as a column to display the current date for context.
    - Modify the query to include only rows where `orderdate` equals `CURRENT_DATE`, i.e. orders placed today.
    - Group data by `order_year` and `categoryname` and order results chronologically.
    - **Note**: Our data only goes through 2024, so this query won't return any rows. On an up-to-date database (which is what you'll often be working with), it would return today's orders.

In [None]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    CURRENT_DATE AS current_day,-- Added
    p.categoryname, 
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE
	s.orderdate = CURRENT_DATE -- Added
GROUP BY
	order_year,
    p.categoryname
ORDER BY
	order_year,
    p.categoryname

2. Filter data without displaying `CURRENT_DATE`.
    - Remove the `CURRENT_DATE` column from the query.
    - Modify the query to only include rows where `order_year` is less than the current year by comparing `DATE_PART('year', orderdate)` with `DATE_PART('year', CURRENT_DATE)`.
    - Group data by `order_year` and `categoryname` and order results chronologically.
    - **Note**: Our data only goes through 2024, so once the current year is 2025 or later this filter keeps every row. If the current date were in 2023, the query would return only orders from 2022 and earlier. We'll be going into a more relevant example in the next section.
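
To see how the result depends on the current date, here is a small Python sketch of the same comparison. The function name `before_current_year`, its `today` parameter, and the order dates are hypothetical; passing `today` explicitly simulates running the query in different years.

```python
from datetime import date


def before_current_year(order_dates, today=None):
    """Keep dates whose year is strictly less than the year of `today`."""
    today = today or date.today()  # plays the role of CURRENT_DATE
    return [d for d in order_dates if d.year < today.year]


# Hypothetical order dates
order_dates = [date(2022, 12, 31), date(2023, 6, 1), date(2024, 4, 19)]

# Pretend the current date is in 2023: only 2022 and earlier remain
kept_in_2023 = before_current_year(order_dates, today=date(2023, 5, 1))

# Pretend the current date is in 2025: every row passes the filter
kept_in_2025 = before_current_year(order_dates, today=date(2025, 1, 8))

print(kept_in_2023)
print(kept_in_2025)
```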

In [15]:
%%sql

SELECT 
	DATE_PART('year', s.orderdate) AS order_year,
    p.categoryname, 
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
WHERE
	DATE_PART('year', s.orderdate) < DATE_PART('year', CURRENT_DATE) -- Added
GROUP BY
	order_year,
    p.categoryname
ORDER BY
	order_year,
    p.categoryname

| | order_year | categoryname | net_revenue |
|---|---|---|---|
| 0 | 2015.00 | Audio | 170872.15 |
| 1 | 2015.00 | Cameras and camcorders | 1828111.71 |
| 2 | 2015.00 | Cell phones | 591513.47 |
| 3 | 2015.00 | Computers | 2139915.71 |
| 4 | 2015.00 | Games and Toys | 45404.59 |
| ... | ... | ... | ... |
| 75 | 2024.00 | Computers | 2957039.62 |
| 76 | 2024.00 | Games and Toys | 85867.75 |
| 77 | 2024.00 | Home Appliances | 1320161.48 |
| 78 | 2024.00 | Music, Movies and Audio Books | 592662.15 |
