<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/2_Date_Calculations/1_Date_Format.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Date Format

## Overview

### 🥅 Analysis Goals

The analysis will focus on understanding sales (net revenue) trends over time:

- **Summarize net revenue by month**: Use precise date truncation to aggregate sales data by month.
- **Create human-readable monthly sales summaries**: Use `TO_CHAR()` to format dates for reporting purposes.


**📊[Insert chart]📊**

### 📘 Concepts Covered

Date formatting:
- `DATE_TRUNC()`
- `TO_CHAR()`

---

In [2]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Replace ipython-sql with jupysql (provides the same %sql / %%sql magics)
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the SQL magic extension (provides %sql and %%sql)
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Display pandas floats with two decimal places
pd.options.display.float_format = '{:.2f}'.format



---
## DATE_TRUNC

### 📝 Notes

`DATE_TRUNC`

- **DATE_TRUNC** truncates a timestamp to a specified level of precision (e.g., year, month, day, hour), resetting every field below that precision to its lowest value.

- Syntax: 

  ```sql
  DATE_TRUNC('precision', timestamp)
  ```

- Example: 
  ```sql
  DATE_TRUNC('month', TIMESTAMP '2024-12-04 10:15:30')
  -- Returns 2024-12-01 00:00:00
  ```
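
To make the truncation rule concrete, here is a minimal Python sketch of what `DATE_TRUNC` does at a few precisions. The `date_trunc` function is a hypothetical helper written for illustration, not part of PostgreSQL or any library; it only covers year, month, day, and hour, and ignores time zones.

```python
from datetime import datetime


# Hypothetical helper mimicking PostgreSQL's DATE_TRUNC for a few precisions:
# every field below the chosen precision is reset to its lowest value
# (month/day -> 1, hour/minute/second/microsecond -> 0).
def date_trunc(precision: str, ts: datetime) -> datetime:
    if precision == "year":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if precision == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if precision == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if precision == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported precision: {precision}")


ts = datetime(2024, 12, 4, 10, 15, 30)
for precision in ("year", "month", "day", "hour"):
    print(precision, date_trunc(precision, ts))
```

Because the result is still a timestamp (not a string), truncated values keep their natural ordering, which is what makes them safe to `GROUP BY` and `ORDER BY`.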
### 💻 Final Result

- Summarize total net revenue by month using `DATE_TRUNC`, allowing for consistent time-based aggregation.

#### Truncate Date

**`DATE_TRUNC`**

1. Use `DATE_TRUNC` to return the total net revenue by month.
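    - Truncate `orderdate` to the first day of each month using `DATE_TRUNC`. The result is a timestamp at midnight, so the output also shows a time and a UTC offset (the offset shifts between `-08:00` and `-07:00` with daylight saving time in the session's time zone).
    - Multiply `quantity` by `netprice` and `exchangerate` to calculate the net revenue of each sale.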
    - Aggregate net revenue by month using `SUM()`.
    - Use `GROUP BY` on the truncated month to perform the aggregation.
    - Sort the result by month for chronological order.

In [3]:
%%sql

SELECT 
	DATE_TRUNC('month', s.orderdate) AS order_month,
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
GROUP BY
	order_month
ORDER BY
	order_month

|  | order_month | net_revenue |
|---|---|---|
| 0 | 2015-01-01 00:00:00-08:00 | 384092.66 |
| 1 | 2015-02-01 00:00:00-08:00 | 706374.12 |
| 2 | 2015-03-01 00:00:00-08:00 | 332961.59 |
| 3 | 2015-04-01 00:00:00-07:00 | 160767.00 |
| 4 | 2015-05-01 00:00:00-07:00 | 548632.63 |
| ... | ... | ... |
| 107 | 2023-12-01 00:00:00-08:00 | 2928550.93 |
| 108 | 2024-01-01 00:00:00-08:00 | 2677498.55 |
| 109 | 2024-02-01 00:00:00-08:00 | 3542322.55 |
| 110 | 2024-03-01 00:00:00-08:00 | 1692854.89 |
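
The query above can be traced in plain Python to see each step: bucket every sale by the first day of its month, add up `quantity * netprice * exchangerate` per bucket, then sort the buckets. This is a minimal sketch, assuming a few made-up sample rows (not data from `contoso_100k`) and a hypothetical `month_start` helper that plays the role of `DATE_TRUNC('month', ...)` but returns a plain date instead of a timestamp.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (orderdate, quantity, netprice, exchangerate).
# Made-up values for illustration, not rows from the sales table.
sales = [
    (date(2015, 1, 3), 2, 10.00, 1.0),
    (date(2015, 1, 20), 1, 50.00, 0.9),
    (date(2015, 2, 14), 3, 20.00, 1.0),
]


def month_start(d: date) -> date:
    # Hypothetical stand-in for DATE_TRUNC('month', d), without the time part
    return d.replace(day=1)


# SUM(quantity * netprice * exchangerate) ... GROUP BY order_month
net_revenue = defaultdict(float)
for orderdate, quantity, netprice, exchangerate in sales:
    net_revenue[month_start(orderdate)] += quantity * netprice * exchangerate

# ORDER BY order_month
for order_month in sorted(net_revenue):
    print(order_month, round(net_revenue[order_month], 2))
```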


2. Use `DATE_TRUNC` to return the total unique customers by month.
    - Truncate `orderdate` to the first day of each month using `DATE_TRUNC`.
    - Count unique customers per month using `COUNT(DISTINCT customerkey)`.
    - Use `GROUP BY` on the truncated month to perform the aggregation.
    - Sort the result by the truncated month for chronological order.

In [4]:
%%sql

SELECT 
	DATE_TRUNC('month', s.orderdate) AS order_month,
	COUNT(DISTINCT s.customerkey) AS total_unique_customers
FROM sales s
GROUP BY
	order_month
ORDER BY
	order_month

|  | order_month | total_unique_customers |
|---|---|---|
| 0 | 2015-01-01 00:00:00-08:00 | 200 |
| 1 | 2015-02-01 00:00:00-08:00 | 291 |
| 2 | 2015-03-01 00:00:00-08:00 | 139 |
| 3 | 2015-04-01 00:00:00-07:00 | 78 |
| 4 | 2015-05-01 00:00:00-07:00 | 236 |
| ... | ... | ... |
| 107 | 2023-12-01 00:00:00-08:00 | 1484 |
| 108 | 2024-01-01 00:00:00-08:00 | 1340 |
| 109 | 2024-02-01 00:00:00-08:00 | 1718 |
| 110 | 2024-03-01 00:00:00-08:00 | 877 |
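
`COUNT(DISTINCT customerkey)` counts each customer once per month, no matter how many orders they placed. A Python set per month captures the same idea. This is a minimal sketch with made-up `(orderdate, customerkey)` rows; the variable names are hypothetical and chosen only for the illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (orderdate, customerkey). Made-up values.
sales = [
    (date(2015, 1, 3), 101),
    (date(2015, 1, 20), 101),  # same customer, same month: counted once
    (date(2015, 1, 25), 102),
    (date(2015, 2, 14), 101),
]

# One set of customer keys per month (first day of the month as the key)
customers_by_month = defaultdict(set)
for orderdate, customerkey in sales:
    customers_by_month[orderdate.replace(day=1)].add(customerkey)

# The size of each set is the distinct-customer count; sorted for chronological order
unique_customers = {
    month: len(keys) for month, keys in sorted(customers_by_month.items())
}
for month, count in unique_customers.items():
    print(month, count)
```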


---
## TO_CHAR

### 📝 Notes

`TO_CHAR`

- **TO_CHAR** converts a date, time, or numeric value to a formatted string.

- Syntax: 
  ```sql
  TO_CHAR(value, 'format')
  ``` 

- Example: 
  ```sql
  TO_CHAR(CURRENT_DATE, 'YYYY-MM-DD')
  ```
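
A useful property of the `YYYY-MM` pattern is that the resulting strings sort in the same order as the dates they came from, because the year comes first and the month is zero-padded. The sketch below illustrates this with Python's `strftime`, assuming a hypothetical `to_char` helper and a small hand-written mapping from a few `TO_CHAR` patterns to `strftime` directives (it is not a general `TO_CHAR` implementation).

```python
from datetime import date

# Hypothetical lookup from a few PostgreSQL TO_CHAR patterns to Python
# strftime directives; only the patterns listed here are supported.
PG_TO_STRFTIME = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "YYYY-MM": "%Y-%m",
    "MM-YYYY": "%m-%Y",
}


def to_char(d: date, pg_format: str) -> str:
    """Format a date roughly the way TO_CHAR(d, pg_format) would."""
    return d.strftime(PG_TO_STRFTIME[pg_format])


months = [date(2015, 10, 1), date(2015, 2, 1), date(2014, 12, 1)]

# Year-first, zero-padded labels: string order matches date order
labels = sorted(to_char(d, "YYYY-MM") for d in months)
print(labels)

# Month-first labels: string order no longer matches date order
# (the December 2014 label sorts last)
month_first = sorted(to_char(d, "MM-YYYY") for d in months)
print(month_first)
```

This is why the queries below can sort on the formatted string directly; with a month-first format you would need to sort on the underlying date instead.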


### 💻 Final Result

- Convert dates into a human-readable `YYYY-MM` format using `TO_CHAR`, enabling clear and concise monthly reporting.

#### Format Date

**`TO_CHAR`**

1. Use `TO_CHAR` to return the total net revenue by month.
    - Format `orderdate` into a `YYYY-MM` string representation using `TO_CHAR`.
    - Multiply `quantity` by `netprice` and `exchangerate` to calculate total net revenue for each sale.
    - Aggregate net revenue per formatted month using `SUM()`.
    - Use `GROUP BY` on the formatted month to perform the aggregation.
    - Sort the result by the formatted month string; because `YYYY-MM` is year-first and zero-padded, alphabetical order matches chronological order.

In [6]:
%%sql

SELECT 
	TO_CHAR(s.orderdate, 'YYYY-MM') AS order_year_month,
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
GROUP BY
	order_year_month
ORDER BY
	order_year_month

|  | order_year_month | net_revenue |
|---|---|---|
| 0 | 2015-01 | 384092.66 |
| 1 | 2015-02 | 706374.12 |
| 2 | 2015-03 | 332961.59 |
| 3 | 2015-04 | 160767.00 |
| 4 | 2015-05 | 548632.63 |
| ... | ... | ... |
| 107 | 2023-12 | 2928550.93 |
| 108 | 2024-01 | 2677498.55 |
| 109 | 2024-02 | 3542322.55 |
| 110 | 2024-03 | 1692854.89 |


**📊[Insert chart]📊**
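
If you want to experiment with the query shape without a PostgreSQL server, here is a minimal sketch using Python's built-in `sqlite3` with an in-memory database. Note the swapped-in technique: SQLite has no `TO_CHAR`, so its `strftime('%Y-%m', ...)` function stands in for `TO_CHAR(orderdate, 'YYYY-MM')`. The table layout and rows are hypothetical, made up to mirror the columns used above.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (orderdate TEXT, customerkey INTEGER, "
    "quantity INTEGER, netprice REAL, exchangerate REAL)"
)
# Hypothetical rows; dates are ISO-8601 strings, which SQLite's strftime accepts
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("2015-01-03", 101, 2, 10.0, 1.0),
        ("2015-01-20", 102, 1, 50.0, 0.9),
        ("2015-02-14", 101, 3, 20.0, 1.0),
    ],
)

# Same shape as the TO_CHAR query, with strftime in place of TO_CHAR
rows = conn.execute(
    """
    SELECT
        strftime('%Y-%m', s.orderdate) AS order_year_month,
        SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
    FROM sales s
    GROUP BY order_year_month
    ORDER BY order_year_month
    """
).fetchall()

for order_year_month, net_revenue in rows:
    print(order_year_month, round(net_revenue, 2))
```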

2. Use `TO_CHAR` to return the total unique customers by month.
    - Format `orderdate` into a `YYYY-MM` string representation using `TO_CHAR`.
    - Count unique customers per formatted month using `COUNT(DISTINCT customerkey)`.
    - Use `GROUP BY` on the formatted month to perform the aggregation.
    - Sort the result by the formatted month string for chronological order.

In [7]:
%%sql

SELECT 
	TO_CHAR(s.orderdate, 'YYYY-MM') AS order_year_month,
	COUNT(DISTINCT s.customerkey) AS total_unique_customers
FROM sales s
GROUP BY
	order_year_month
ORDER BY
	order_year_month

|  | order_year_month | total_unique_customers |
|---|---|---|
| 0 | 2015-01 | 200 |
| 1 | 2015-02 | 291 |
| 2 | 2015-03 | 139 |
| 3 | 2015-04 | 78 |
| 4 | 2015-05 | 236 |
| ... | ... | ... |
| 107 | 2023-12 | 1484 |
| 108 | 2024-01 | 1340 |
| 109 | 2024-02 | 1718 |
| 110 | 2024-03 | 877 |


**📊[Insert chart]📊**
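
The `DISTINCT` keyword matters here: without it, `COUNT` counts sales rows, so a customer who orders several times in a month is counted several times. The sketch below shows both counts side by side, again using Python's built-in `sqlite3` with `strftime('%Y-%m', ...)` as a stand-in for PostgreSQL's `TO_CHAR`. The table and rows are hypothetical, and `sales_rows` is an illustrative column name not used in the lesson's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderdate TEXT, customerkey INTEGER)")
# Hypothetical rows: customer 101 orders twice in January 2015
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2015-01-03", 101),
        ("2015-01-20", 101),
        ("2015-01-25", 102),
        ("2015-02-14", 101),
    ],
)

rows = conn.execute(
    """
    SELECT
        strftime('%Y-%m', s.orderdate) AS order_year_month,
        COUNT(s.customerkey) AS sales_rows,               -- one per sale
        COUNT(DISTINCT s.customerkey) AS total_unique_customers
    FROM sales s
    GROUP BY order_year_month
    ORDER BY order_year_month
    """
).fetchall()

for row in rows:
    print(row)
```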