<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/2_Date_Calculations/1_Date_Format.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Date Format

## Overview

### 🥅 Analysis Goals

Summarize sales revenue for a business to understand trends by month.

- Summarize sales revenue by month using precise date truncation.
- Create a human-readable version of the monthly sales summary for reports.

**📊[Insert chart]📊**

### 📘 Concepts Covered

Date formatting:
- `DATE_TRUNC()`
- `TO_CHAR()`

---

In [10]:
import sys
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Shift libraries from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic (jupysql when running in Colab)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

---
## DATE_TRUNC

### 📝 Notes

`DATE_TRUNC`

- **DATE_TRUNC** truncates a timestamp to a specified level of precision (e.g., year, month, day, hour).

- Syntax: 

  ```
  DATE_TRUNC('precision', timestamp)
  ```

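  - Example: `DATE_TRUNC('month', TIMESTAMP '2024-12-04 10:15:30')` returns `2024-12-01 00:00:00`. The `TIMESTAMP` keyword gives the literal an explicit type, which avoids ambiguity between the timestamp and interval forms of the function.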
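
To build intuition for what truncation does, here is a minimal plain-Python sketch that mimics the behavior for four precisions. The helper name `date_trunc` is hypothetical and only illustrates the idea; it is not how PostgreSQL implements the function, and it ignores time zones and the other precisions PostgreSQL supports.

```python
from datetime import datetime


def date_trunc(precision: str, ts: datetime) -> datetime:
    """Hypothetical stand-in for DATE_TRUNC: reset every field below `precision`."""
    if precision == "year":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if precision == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if precision == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if precision == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported precision: {precision!r}")


ts = datetime(2024, 12, 4, 10, 15, 30)
for precision in ("year", "month", "day", "hour"):
    # Each coarser precision discards more of the original timestamp.
    print(precision, date_trunc(precision, ts))
```

The result is still a timestamp rather than a string, which is why a truncated month shows up as the first day of that month at midnight.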

### 💻 Final Result

- Return total sales revenue aggregated by month, with a precise timestamp.

#### Truncate Date

**`DATE_TRUNC`**

1. Use `DATE_TRUNC` to return the total sales by month.
    - Truncate `orderdate` to the first day of each month using `DATE_TRUNC`.
    - Multiply `quantity` by `unitprice` and `exchangerate` to calculate the total revenue for each sale.
    - Aggregate sales by month using `SUM()`.
    - Use `GROUP BY` on the truncated month to perform the aggregation.
    - Sort the result by month for chronological order.
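
The steps above can be sketched in plain Python on a few rows, which makes the arithmetic easy to check by hand. The rows below are hypothetical and are not taken from the `sales` table; only the column names mirror the ones used in this notebook.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (orderdate, quantity, unitprice, exchangerate)
sales = [
    (date(2015, 1, 3), 2, 10.0, 1.0),
    (date(2015, 1, 20), 1, 5.0, 2.0),
    (date(2015, 2, 7), 3, 4.0, 0.5),
]

monthly_totals = defaultdict(float)
for orderdate, quantity, unitprice, exchangerate in sales:
    order_month = orderdate.replace(day=1)  # truncate to the first day of the month
    monthly_totals[order_month] += quantity * unitprice * exchangerate  # revenue per sale

# Sorting the truncated dates gives chronological order.
for order_month in sorted(monthly_totals):
    print(order_month, monthly_totals[order_month])
```

Both January rows collapse into the single key `2015-01-01`, which is the same effect `GROUP BY` has on the truncated column.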

In [11]:
%%sql

SELECT 
	DATE_TRUNC('month', s.orderdate) AS order_month,
	SUM(s.quantity * s.unitprice * s.exchangerate) AS total_sale_amount
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_month
ORDER BY
	order_month

Unnamed: 0,order_month,total_sale_amount
0,2015-01-01 00:00:00-08:00,4.108292e+05
1,2015-02-01 00:00:00-08:00,7.558171e+05
2,2015-03-01 00:00:00-08:00,3.526008e+05
3,2015-04-01 00:00:00-07:00,1.712490e+05
4,2015-05-01 00:00:00-07:00,5.834638e+05
...,...,...
107,2023-12-01 00:00:00-08:00,3.113504e+06
108,2024-01-01 00:00:00-08:00,2.851227e+06
109,2024-02-01 00:00:00-08:00,3.759333e+06
110,2024-03-01 00:00:00-08:00,1.801386e+06


2. Use `DATE_TRUNC` to return the total unique customers by month.
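
As a rough illustration of counting distinct customers per month, the sketch below collects customer keys into one set per truncated month. The rows are hypothetical; the point is that a repeat customer counts once within a month but is counted again in each later month they order in.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (orderdate, customerkey)
orders = [
    (date(2015, 1, 3), 101),
    (date(2015, 1, 20), 101),  # same customer, same month: counted once
    (date(2015, 1, 25), 102),
    (date(2015, 2, 7), 101),   # same customer, new month: counted again
]

customers_by_month = defaultdict(set)
for orderdate, customerkey in orders:
    customers_by_month[orderdate.replace(day=1)].add(customerkey)

# A set keeps each key once, so its size is the distinct count for that month.
unique_counts = {month: len(keys) for month, keys in sorted(customers_by_month.items())}
print(unique_counts)
```

Because customers can appear in several months, the monthly counts generally do not add up to the number of distinct customers overall.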

In [12]:
%%sql

SELECT 
	DATE_TRUNC('month', s.orderdate) AS order_month,
	COUNT(DISTINCT s.customerkey) AS total_unique_customers
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_month
ORDER BY
	order_month

Unnamed: 0,order_month,total_unique_customers
0,2015-01-01 00:00:00-08:00,200
1,2015-02-01 00:00:00-08:00,291
2,2015-03-01 00:00:00-08:00,139
3,2015-04-01 00:00:00-07:00,78
4,2015-05-01 00:00:00-07:00,236
...,...,...
107,2023-12-01 00:00:00-08:00,1484
108,2024-01-01 00:00:00-08:00,1340
109,2024-02-01 00:00:00-08:00,1718
110,2024-03-01 00:00:00-08:00,877


---
## TO_CHAR

### 📝 Notes

`TO_CHAR`

- **TO_CHAR** converts a date, time, or numeric value to a formatted string.
- Syntax: `TO_CHAR(value, 'format')`. For example, `TO_CHAR(CURRENT_DATE, 'YYYY-MM-DD')` returns the current date as a string, such as `2024-12-04` when run on December 4, 2024.
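
For comparison, a similar formatting step can be sketched in Python with `strftime`. The helper `to_char` and the pattern mapping below are hypothetical and cover only the two patterns used in this notebook; PostgreSQL supports many more template patterns.

```python
from datetime import date

# Assumed mapping from the two TO_CHAR patterns used here to strftime directives.
PATTERN_TO_STRFTIME = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "YYYY-MM": "%Y-%m",
}


def to_char(value: date, pattern: str) -> str:
    """Hypothetical stand-in for TO_CHAR on dates: return a formatted string."""
    return value.strftime(PATTERN_TO_STRFTIME[pattern])


d = date(2024, 12, 4)
print(to_char(d, "YYYY-MM-DD"))
print(to_char(d, "YYYY-MM"))
```

Unlike truncation, the result here is text, so it is convenient for report labels but can no longer be used in date arithmetic.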

### 💻 Final Result

- Return total sales revenue aggregated by month, with a human-readable date format (e.g., `YYYY-MM`).

#### Format Date

**`TO_CHAR`**

1. Use `TO_CHAR` to return the total sales revenue by month.
    - Format `orderdate` into a `YYYY-MM` string representation using `TO_CHAR`.
    - Multiply `quantity` by `unitprice` and `exchangerate` to calculate total revenue for each sale.
    - Aggregate sales revenue by the formatted string using `SUM()`.
    - Use `GROUP BY` on the formatted month to perform the aggregation.
    - Sort the result by the formatted month string. Because `YYYY-MM` is zero-padded with the year first, alphabetical order matches chronological order.
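
Sorting a text column is alphabetical, so it only matches chronological order when the format is chosen with care. The short sketch below, using made-up dates, compares a zero-padded `YYYY-MM` label with an unpadded variant to show where string sorting goes wrong.

```python
from datetime import date

# Made-up month values, deliberately out of order.
months = [date(2023, 12, 1), date(2015, 2, 1), date(2024, 1, 1), date(2015, 10, 1)]
chronological = sorted(months)

# Zero-padded labels: string order matches date order.
padded_sorted = sorted(f"{d.year:04d}-{d.month:02d}" for d in months)
padded_expected = [f"{d.year:04d}-{d.month:02d}" for d in chronological]

# Unpadded labels: "2015-10" sorts before "2015-2" because "1" < "2".
unpadded_sorted = sorted(f"{d.year}-{d.month}" for d in months)
unpadded_expected = [f"{d.year}-{d.month}" for d in chronological]

print(padded_sorted == padded_expected)
print(unpadded_sorted == unpadded_expected)
```

The same caution applies to formats that put the month name or the month number before the year.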

In [14]:
%%sql

SELECT 
	TO_CHAR(s.orderdate, 'YYYY-MM') AS order_year_month,
	SUM(s.quantity * s.unitprice * s.exchangerate) AS total_sale_amount
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_year_month
ORDER BY
	order_year_month

Unnamed: 0,order_year_month,total_sale_amount
0,2015-01,4.108292e+05
1,2015-02,7.558171e+05
2,2015-03,3.526008e+05
3,2015-04,1.712490e+05
4,2015-05,5.834638e+05
...,...,...
107,2023-12,3.113504e+06
108,2024-01,2.851227e+06
109,2024-02,3.759333e+06
110,2024-03,1.801386e+06


**📊[Insert chart]📊**

2. Use `TO_CHAR` to return the total unique customers by month.

In [15]:
%%sql

SELECT 
	TO_CHAR(s.orderdate, 'YYYY-MM') AS order_year_month,
	COUNT(DISTINCT s.customerkey) AS total_unique_customers
FROM sales s
	LEFT JOIN product p ON s.productkey = p.productkey
GROUP BY
	order_year_month
ORDER BY
	order_year_month

Unnamed: 0,order_year_month,total_unique_customers
0,2015-01,200
1,2015-02,291
2,2015-03,139
3,2015-04,78
4,2015-05,236
...,...,...
107,2023-12,1484
108,2024-01,1340
109,2024-02,1718
110,2024-03,877


**📊[Insert chart]📊**