<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/2_Date_Time/1_Date_Format.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Date & Time Formatting

## Overview

### 🥅 Analysis Goals

- **Summarize net revenue by month**: Use precise date truncation to aggregate sales data by month.
- **Create human-readable monthly sales summaries**: Use `TO_CHAR()` to format dates for reporting purposes.

### 📘 Concepts Covered

Date formatting:
- `DATE_TRUNC()`
- `TO_CHAR()`

[Source Documentation on Date/Time Functions.](https://www.postgresql.org/docs/current/functions-datetime.html)

### 📕 Definitions

- **Time series analysis** - studies how data changes over time to find patterns or make predictions.
    - One of the most common types of analysis
    - Examples: daily temperature, number of daily steps recorded by your fitness tracker

---

In [1]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Update package installer
    !sudo apt-get update -qq > /dev/null 2>&1

    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Switch from ipython-sql to jupysql
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas floats with two decimal places
pd.options.display.float_format = '{:.2f}'.format

---
## DATE_TRUNC

### 📝 Notes

`DATE_TRUNC`

- **DATE_TRUNC** truncates a timestamp to a specified level of precision (e.g., year, month, day, hour).

- Syntax:

  ```sql
  DATE_TRUNC('precision', timestamp)
  ```

- Example:

Truncate `orderdate` to the first day of its month using `DATE_TRUNC`. The result is a timestamp, so the output includes a time and UTC offset.

In [None]:
%%sql

SELECT
	orderdate,
	DATE_TRUNC('month', orderdate) AS order_month
FROM sales
LIMIT 10;

Unnamed: 0,orderdate,order_month
0,2015-01-01,2015-01-01 00:00:00-06:00
1,2015-01-01,2015-01-01 00:00:00-06:00
2,2015-01-01,2015-01-01 00:00:00-06:00
3,2015-01-01,2015-01-01 00:00:00-06:00
4,2015-01-01,2015-01-01 00:00:00-06:00
5,2015-01-01,2015-01-01 00:00:00-06:00
6,2015-01-01,2015-01-01 00:00:00-06:00
7,2015-01-01,2015-01-01 00:00:00-06:00
8,2015-01-01,2015-01-01 00:00:00-06:00
9,2015-01-01,2015-01-01 00:00:00-06:00
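To make the truncation concrete, here is a minimal Python sketch of what month-level truncation does to a single date. The helper name `trunc_month` is hypothetical and only for illustration; it ignores time zones, whereas the SQL output above carries a UTC offset.

```python
from datetime import date, datetime


def trunc_month(d: date) -> datetime:
    """Hypothetical helper: midnight on the first day of d's month."""
    return datetime(d.year, d.month, 1)


# Any day in July 2021 collapses to the same month start.
print(trunc_month(date(2021, 7, 27)))
print(trunc_month(date(2021, 7, 1)))
```

Every date in the same month maps to one value, which is what makes the truncated column usable as a grouping key.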


Cast the timestamp to a date using the `::date` cast operator.

In [None]:
%%sql

SELECT
	orderdate,
	DATE_TRUNC('month', orderdate)::date AS order_month  -- cast to date
FROM sales
ORDER BY RANDOM()  -- get random rows
LIMIT 10;

Unnamed: 0,orderdate,order_month
0,2021-07-27,2021-07-01
1,2019-10-10,2019-10-01
2,2017-11-29,2017-11-01
3,2024-01-24,2024-01-01
4,2021-05-12,2021-05-01
5,2017-06-01,2017-06-01
6,2019-12-03,2019-12-01
7,2024-02-17,2024-02-01
8,2018-02-05,2018-02-01
9,2018-12-13,2018-12-01


### 📈 Analysis

Calculate the net revenue and unique customers by month.

#### Net Revenue by Month

**`DATE_TRUNC`**

1. Use `DATE_TRUNC` to return the total net revenue by month.
    - Truncate `orderdate` to the first day of each month using `DATE_TRUNC`.
    - Multiply `quantity` by `netprice` and `exchangerate` to calculate the total net revenue.
    - Aggregate net revenue by month using `SUM()`.
    - Use `GROUP BY` on the truncated month to perform the aggregation.
    - Sort the result by month for chronological order.

In [None]:
%%sql

SELECT
	DATE_TRUNC('month', s.orderdate)::date AS order_month,
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
GROUP BY
	order_month
ORDER BY
	order_month

Unnamed: 0,order_month,net_revenue
0,2015-01-01,384092.66
1,2015-02-01,706374.12
2,2015-03-01,332961.59
3,2015-04-01,160767.00
4,2015-05-01,548632.63
...,...,...
107,2023-12-01,2928550.93
108,2024-01-01,2677498.55
109,2024-02-01,3542322.55
110,2024-03-01,1692854.89
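The grouping logic of the query above can be sketched in plain Python. The sample rows below are hypothetical and are not taken from the `sales` table; the sketch only mirrors the truncate, multiply, sum, and sort steps.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (orderdate, quantity, netprice, exchangerate)
rows = [
    (date(2015, 1, 1), 2, 10.0, 1.0),
    (date(2015, 1, 15), 1, 20.0, 0.5),
    (date(2015, 2, 3), 3, 5.0, 2.0),
]

net_revenue = defaultdict(float)
for orderdate, quantity, netprice, exchangerate in rows:
    order_month = orderdate.replace(day=1)  # like DATE_TRUNC('month', ...)::date
    net_revenue[order_month] += quantity * netprice * exchangerate  # like SUM(...)

monthly = sorted(net_revenue.items())  # like ORDER BY order_month
for order_month, revenue in monthly:
    print(order_month, revenue)
```

Both January rows land in the `2015-01-01` bucket because they share the same truncated month.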


2. Extend the `DATE_TRUNC` query to also return the total unique customers by month.
    - Truncate `orderdate` to the first day of each month using `DATE_TRUNC`.
    - 🔔 Count unique customers in each truncated month using `COUNT(DISTINCT customerkey)`.
    - Use `GROUP BY` on the truncated month to perform the aggregation.
    - Sort the result by the truncated month for chronological order.

In [None]:
%%sql

SELECT
	DATE_TRUNC('month', s.orderdate)::date AS order_month,
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue,
	COUNT(DISTINCT s.customerkey) AS total_unique_customers
FROM sales s
GROUP BY
	order_month
ORDER BY
	order_month

Unnamed: 0,order_month,net_revenue,total_unique_customers
0,2015-01-01,384092.66,200
1,2015-02-01,706374.12,291
2,2015-03-01,332961.59,139
3,2015-04-01,160767.00,78
4,2015-05-01,548632.63,236
...,...,...,...
107,2023-12-01,2928550.93,1484
108,2024-01-01,2677498.55,1340
109,2024-02-01,3542322.55,1718
110,2024-03-01,1692854.89,877
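As a rough illustration of what `COUNT(DISTINCT s.customerkey)` does per month, the sketch below collects customer keys into one set per truncated month. The sample orders are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (orderdate, customerkey)
orders = [
    (date(2015, 1, 1), 101),
    (date(2015, 1, 9), 101),   # same customer, same month: counted once
    (date(2015, 1, 20), 102),
    (date(2015, 2, 2), 101),   # same customer, new month: counted again
]

customers_by_month = defaultdict(set)
for orderdate, customerkey in orders:
    customers_by_month[orderdate.replace(day=1)].add(customerkey)

# One distinct count per month, in chronological order
unique_counts = {
    month: len(keys) for month, keys in sorted(customers_by_month.items())
}
print(unique_counts)
```

A customer who orders in two different months is counted once in each month, so the monthly counts cannot be summed to get the overall number of unique customers.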


---
## TO_CHAR

### 📝 Notes

`TO_CHAR`

- **TO_CHAR** converts a date, time, or numeric value to a formatted string.

- Syntax:
  ```sql
  TO_CHAR(value, 'format')
  ```
  - Common date/time format patterns:
    - `YYYY-MM-DD`
    - `YYYY-MM`
    - `YYYY-MM-DD HH24:MI:SS`
    - `YYYY-MM-DD HH24:MI`
    - `YYYY-MM-DD HH24`

- Example:

In [None]:
%%sql

SELECT
	orderdate,
	TO_CHAR(orderdate, 'YYYY-MM') AS order_year_month
FROM sales
ORDER BY RANDOM()  -- get random rows
LIMIT 10;

Unnamed: 0,orderdate,order_year_month
0,2022-10-03,2022-10
1,2023-04-01,2023-04
2,2019-02-09,2019-02
3,2022-06-08,2022-06
4,2022-05-04,2022-05
5,2022-12-08,2022-12
6,2022-06-15,2022-06
7,2023-05-08,2023-05
8,2022-06-22,2022-06
9,2016-02-26,2016-02
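For readers who know Python's `strftime`, the sketch below pairs a few of the `TO_CHAR` patterns with what are assumed to be equivalent `strftime` directives. The mapping is an illustration for these specific patterns, not a general translation table.

```python
from datetime import datetime

# Assumed correspondences between TO_CHAR patterns and strftime directives
PATTERNS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "YYYY-MM": "%Y-%m",
    "YYYY-MM-DD HH24:MI:SS": "%Y-%m-%d %H:%M:%S",
}

ts = datetime(2022, 10, 3, 14, 5, 9)  # hypothetical timestamp
formatted = {pg: ts.strftime(py) for pg, py in PATTERNS.items()}

for pattern, text in formatted.items():
    print(f"{pattern!r:26} -> {text}")
```

In both systems the result is a string, so it no longer behaves like a date in comparisons or date arithmetic.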


### 📈 Analysis

- Convert dates into a readable `YYYY-MM` format using `TO_CHAR`, to create a clear and concise monthly report.

#### Monthly Net Revenue

**`TO_CHAR`**

1. Use `TO_CHAR` to return the total net revenue by month.
    - Format `orderdate` into a `YYYY-MM` string representation using `TO_CHAR`.
    - Multiply `quantity` by `netprice` and `exchangerate` to calculate total net revenue for each sale.
    - Aggregate net revenue by the formatted string using `SUM()`.
    - Use `GROUP BY` on the formatted month to perform the aggregation.
    - Sort the result by the formatted month string for chronological order.

In [None]:
%%sql

SELECT
	TO_CHAR(s.orderdate, 'YYYY-MM') AS order_year_month,
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales s
GROUP BY
	order_year_month
ORDER BY
	order_year_month

Unnamed: 0,order_year_month,net_revenue
0,2015-01,384092.66
1,2015-02,706374.12
2,2015-03,332961.59
3,2015-04,160767.00
4,2015-05,548632.63
...,...,...
107,2023-12,2928550.93
108,2024-01,2677498.55
109,2024-02,3542322.55
110,2024-03,1692854.89


<img src="https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/Resources/images/2.1_monthly_rev.png?raw=1" alt="Revenue" width="50%">
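Sorting by the formatted string gives chronological order only because `YYYY-MM` puts the year first and zero-pads the month, so text order and date order agree. A small sketch with hypothetical dates shows how a month-first pattern would break this:

```python
from datetime import date

# Hypothetical dates, deliberately out of order
dates = [date(2024, 1, 5), date(2015, 12, 1), date(2015, 2, 1)]

# Year-first strings sort the same way the dates do
year_first = sorted(d.strftime("%Y-%m") for d in dates)

# Month-first strings sort by month number, mixing the years
month_first = sorted(d.strftime("%m-%Y") for d in dates)

print(year_first)
print(month_first)
```

If a report needs a month-first label, one option is to sort on the truncated date and use the formatted string only for display.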

2. Extend the `TO_CHAR` query to also return the total unique customers by month.
    - Format `orderdate` into a `YYYY-MM` string representation using `TO_CHAR`.
    - Multiply `quantity` by `netprice` and `exchangerate` to calculate total net revenue for each sale.
    - Aggregate net revenue by the formatted string using `SUM()`.
    - 🔔 Count unique customers in each formatted month using `COUNT(DISTINCT customerkey)`.
    - Use `GROUP BY` on the formatted month to perform the aggregation.
    - Sort the result by the formatted month string for chronological order.

In [None]:
%%sql

SELECT
	TO_CHAR(s.orderdate, 'YYYY-MM') AS order_year_month,
	SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue,
	COUNT(DISTINCT s.customerkey) AS total_unique_customers
FROM sales s
GROUP BY
	order_year_month
ORDER BY
	order_year_month

Unnamed: 0,order_year_month,net_revenue,total_unique_customers
0,2015-01,384092.66,200
1,2015-02,706374.12,291
2,2015-03,332961.59,139
3,2015-04,160767.00,78
4,2015-05,548632.63,236
...,...,...,...
107,2023-12,2928550.93,1484
108,2024-01,2677498.55,1340
109,2024-02,3542322.55,1718
110,2024-03,1692854.89,877


<img src="https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/Resources/images/2.1_monthly_customers.png?raw=1" alt="Customers" width="50%">

In [4]:
%%sql

-- Quarterly Sales Quantity (2.1.1) - Problem
-- Calculate the total quantity of products sold each quarter from the sales table.
-- Use DATE_TRUNC to group the sales data by quarter and order the results by quarter.

SELECT
  DATE_TRUNC('quarter', orderdate)::date AS orders_by_quarter,
  SUM(quantity) AS sold_quantity
FROM
  sales
GROUP BY orders_by_quarter
ORDER BY orders_by_quarter

Unnamed: 0,orders_by_quarter,sold_quantity
0,2015-01-01,4493
1,2015-04-01,4071
2,2015-07-01,5766
3,2015-10-01,7261
4,2016-01-01,7158
5,2016-04-01,5715
6,2016-07-01,6203
7,2016-10-01,7793
8,2017-01-01,7745
9,2017-04-01,6084


In [9]:
%%sql

-- Weekly Net Revenue (2.1.2) - Problem
-- Calculate the total net revenue for each week in 2023 from the sales table.
-- Use TO_CHAR to group the sales data by week and to filter for the year '2023'.

SELECT
  TO_CHAR(orderdate, 'YYYY-WW') AS order_week,
  SUM(netprice * quantity * exchangerate) AS revenue
FROM sales
WHERE TO_CHAR(orderdate, 'YYYY') = '2023'
GROUP BY order_week
ORDER BY order_week

Unnamed: 0,order_week,revenue
0,2023-01,1118860.15
1,2023-02,773467.25
2,2023-03,797088.74
3,2023-04,782617.25
4,2023-05,717966.27
5,2023-06,736953.05
6,2023-07,1306411.98
7,2023-08,1565117.98
8,2023-09,855867.58
9,2023-10,674366.67
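The `WW` pattern numbers weeks from the first day of the year (days 1 to 7 are week 1, days 8 to 14 are week 2, and so on), which differs from ISO week numbering. The sketch below reproduces that rule in Python as an illustration; the helper names are hypothetical.

```python
from datetime import date


def week_of_year_ww(d: date) -> int:
    """Hypothetical helper: week number where week 1 starts on January 1."""
    day_of_year = d.timetuple().tm_yday
    return (day_of_year - 1) // 7 + 1


def format_yyyy_ww(d: date) -> str:
    """Mimic the 'YYYY-WW' label used in the query above."""
    return f"{d.year:04d}-{week_of_year_ww(d):02d}"


for sample in (date(2023, 1, 1), date(2023, 1, 8), date(2023, 12, 31)):
    iso_year, iso_week, _ = sample.isocalendar()
    print(sample, format_yyyy_ww(sample), f"ISO {iso_year}-{iso_week:02d}")
```

Under this rule the last "week" of a year can be only one or two days long, so its totals are not directly comparable with full weeks.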


In [25]:
%%sql

-- Weekly Median Quantity (2.1.3) - Problem
-- Calculate the median quantity of products sold each week in 2023 from the sales table.
-- Use DATE_TRUNC to group the sales data by week and to filter for the year 2023.

SELECT
  DATE_TRUNC('week', orderdate)::date AS weekly_orders,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY quantity) AS median_quantity
FROM sales
-- Filters on the week's Monday, so orders dated 2023-01-01 (a Sunday in the week of 2022-12-26) are excluded
WHERE DATE_TRUNC('week', orderdate)::date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY weekly_orders
ORDER BY weekly_orders

Unnamed: 0,weekly_orders,median_quantity
0,2023-01-02,3.0
1,2023-01-09,2.0
2,2023-01-16,3.0
3,2023-01-23,3.0
4,2023-01-30,2.0
5,2023-02-06,3.0
6,2023-02-13,2.0
7,2023-02-20,3.0
8,2023-02-27,2.0
9,2023-03-06,2.0
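Week truncation moves each date back to its Monday, which is why the first row above is `2023-01-02` and why a filter on the truncated value treats Sunday 2023-01-01 as part of a 2022 week. The sketch below illustrates both the truncation and a median over hypothetical quantities; `trunc_week` is a hypothetical helper, and `statistics.median` is used as a stand-in for `PERCENTILE_CONT(0.5)`.

```python
from datetime import date, timedelta
from statistics import median


def trunc_week(d: date) -> date:
    """Hypothetical helper: the Monday of the week that contains d."""
    return d - timedelta(days=d.weekday())  # weekday(): Monday=0 ... Sunday=6


print(trunc_week(date(2023, 1, 1)))  # a Sunday, so the week started in 2022
print(trunc_week(date(2023, 1, 2)))  # a Monday, so it maps to itself

# Hypothetical quantities for one week; with an even count the median
# is the average of the two middle values.
quantities = [1, 2, 3, 4]
print(median(quantities))
```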
