## Aggregations using SQL Queries

Let us understand how to aggregate the data.
* We can perform global aggregations as well as aggregations by key. The most commonly used aggregate functions are `sum`, `avg`, `min`, `max`, and `count`.
* Global Aggregations
  * Get total number of orders.
  * Get revenue for a given order id.
  * Get number of records with `order_status` either `COMPLETE` or `CLOSED`.
* Aggregations by key - using `GROUP BY`
  * Get number of orders by date or status.
  * Get revenue for each `order_id`.
  * Get daily product revenue (using order date and product id as keys).
* We can also use the `HAVING` clause to filter on top of aggregated data.
  * Get daily product revenue where revenue is greater than $500 (using order date and product id as keys).
* Rules while using `GROUP BY`.
  * We can have the columns which are specified as part of `GROUP BY` in `SELECT` clause.
  * On top of those, we can have derived columns using aggregate functions.
  * We cannot have any other columns in `SELECT`: neither columns that are missing from `GROUP BY`, nor columns derived from them using non-aggregate functions.
  * We cannot use aggregate functions, or aliases defined in the `SELECT` clause, as part of the `WHERE` clause.
  * If we want to filter based on aggregated results, we can use `HAVING` on top of `GROUP BY` (`WHERE` is not an option).
* Typical query execution order - `FROM` -> `WHERE` -> `GROUP BY` -> `HAVING` -> `SELECT` -> `ORDER BY` -> `LIMIT`
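
The difference between row-level filtering and filtering on aggregated results can be tried out without the course database. The following is only a minimal sketch: it assumes an in-memory SQLite database (via Python's `sqlite3`) in place of PostgreSQL, and the rows in `order_items` are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the rows below are hypothetical toy data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_items (order_item_order_id INTEGER, order_item_subtotal REAL)"
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [(1, 300.0), (2, 200.0), (2, 250.0), (2, 150.0), (3, 100.0)],
)

# WHERE is applied to individual rows before grouping.
# HAVING is applied to each group after the aggregate is computed.
query = """
    SELECT order_item_order_id,
        sum(order_item_subtotal) AS order_revenue
    FROM order_items
    WHERE order_item_subtotal >= 150
    GROUP BY order_item_order_id
    HAVING sum(order_item_subtotal) > 500
    ORDER BY order_item_order_id
"""
rows = conn.execute(query).fetchall()
print(rows)

# An aggregate function inside WHERE is rejected by the database.
try:
    conn.execute(
        "SELECT order_item_order_id FROM order_items "
        "WHERE sum(order_item_subtotal) > 500 "
        "GROUP BY order_item_order_id"
    )
    aggregate_in_where_allowed = True
except sqlite3.OperationalError:
    aggregate_in_where_allowed = False
print(aggregate_in_where_allowed)
```

With this toy data, `WHERE` first drops the 100.0 item, and `HAVING` then keeps only order 2, whose remaining items add up to 600.0.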

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_retail_user:itversity@localhost:5432/itversity_retail_db

In [None]:
%%sql

SELECT * FROM users

In [None]:
%%sql

SELECT count(*) AS total_count,
    count(user_password) AS user_password_count, -- counts only non-null values
    count(DISTINCT is_active) AS is_active_count -- Distinct is_active count (2, as we have only true or false)
FROM users
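
To see how the three forms of `count` differ, here is a small sketch on made-up rows. It assumes an in-memory SQLite database in place of PostgreSQL, and the `users` table here has only three hypothetical columns rather than the full course schema.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; this users table is a simplified, hypothetical copy.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER, user_password TEXT, is_active BOOLEAN)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "secret1", True),
        (2, None, False),  # NULL password
        (3, "secret3", True),
        (4, None, False),  # NULL password
    ],
)

counts = conn.execute("""
    SELECT count(*) AS total_count,
        count(user_password) AS user_password_count,
        count(DISTINCT is_active) AS is_active_count
    FROM users
""").fetchone()
print(counts)
```

Here `count(*)` counts all 4 rows, `count(user_password)` skips the 2 rows where the password is null, and `count(DISTINCT is_active)` returns 2 because only two distinct values exist.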

In [None]:
%sql SELECT count(order_id) FROM orders

In [None]:
%sql SELECT count(DISTINCT order_date) FROM orders

In [None]:
%%sql 

SELECT count(*) AS total_count,
    count(DISTINCT order_id) AS distinct_order_id_count,
    count(DISTINCT order_date) AS distinct_order_date_count,
    count(DISTINCT order_customer_id) AS distinct_order_customer_count,
    count(DISTINCT order_status) AS distinct_order_status_count
FROM orders

In [None]:
%%sql

SELECT *
FROM order_items 
WHERE order_item_order_id = 2

In [None]:
%%sql

SELECT sum(order_item_subtotal) AS order_revenue
FROM order_items 
WHERE order_item_order_id = 2

In [None]:
%%sql

SELECT round(sum(order_item_subtotal::numeric), 2) AS order_revenue
FROM order_items 
WHERE order_item_order_id = 2

In [None]:
%%sql

SELECT count(*) 
FROM orders
WHERE order_status IN ('COMPLETE', 'CLOSED')

* Get number of orders by date or status.

In [None]:
%%sql

-- Get count by order_date
SELECT order_date,
    count(*) AS order_count
FROM orders
GROUP BY 1
ORDER BY 1
LIMIT 10


In [None]:
%%sql

SELECT count(*)
FROM (
    SELECT order_date,
        count(*) AS order_count
    FROM orders
    GROUP BY 1
) AS q

In [None]:
%%sql

-- Get count by order_status
SELECT order_status,
    count(*) AS order_count
FROM orders
GROUP BY 1
ORDER BY 1
LIMIT 10

In [None]:
%%sql

-- Get count by order month
SELECT to_char(order_date, 'yyyy-MM') AS order_month,
    count(*) AS order_count
FROM orders
GROUP BY 1
ORDER BY 1
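
Grouping by a derived expression such as the order month works the same way as grouping by a plain column. The sketch below assumes an in-memory SQLite database, so it uses SQLite's `strftime('%Y-%m', ...)` as a stand-in for PostgreSQL's `to_char(order_date, 'yyyy-MM')`; the rows are made up.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the rows below are hypothetical toy data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, order_date TEXT, order_status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2013-07-25", "CLOSED"),
        (2, "2013-07-26", "COMPLETE"),
        (3, "2013-08-01", "PENDING"),
        (4, "2013-08-02", "COMPLETE"),
        (5, "2013-08-02", "CLOSED"),
    ],
)

# GROUP BY 1 refers to the first expression in SELECT, here the derived month.
monthly_counts = conn.execute("""
    SELECT strftime('%Y-%m', order_date) AS order_month,
        count(*) AS order_count
    FROM orders
    GROUP BY 1
    ORDER BY 1
""").fetchall()
print(monthly_counts)
```

The five toy orders collapse into one row per month: two orders in 2013-07 and three in 2013-08.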

* Get revenue for each order id from order_items

In [None]:
%%sql

SELECT * FROM order_items
ORDER BY order_item_order_id, order_item_id
LIMIT 25

In [None]:
%%sql

-- Get revenue for each order id from order_items
SELECT order_item_order_id,
    sum(order_item_subtotal) AS order_revenue
FROM order_items
GROUP BY 1
ORDER BY 1
LIMIT 10

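This query using `round` will fail because `sum(order_item_subtotal)` does not return a data type accepted by `round` with a precision argument. In PostgreSQL, the two-argument form of `round` is defined only for `numeric`, so we have to convert the data type of `sum(order_item_subtotal)` to `numeric`.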

In [None]:
%%sql

-- This fails
SELECT order_item_order_id,
    round(sum(order_item_subtotal), 2) AS order_revenue
FROM order_items
GROUP BY 1
ORDER BY 1
LIMIT 10

In [None]:
%%sql

-- Cast to numeric and round the revenue to 2 decimal places
SELECT order_item_order_id,
    round(sum(order_item_subtotal)::numeric, 2) AS order_revenue
FROM order_items
GROUP BY 1
ORDER BY 1
LIMIT 10


Compute Daily Product Revenue
* Join `orders` and `order_items`
* Consider only orders with status `COMPLETE` or `CLOSED`
* Use orders order date and order items product id as grouping keys
* Sort the final output by date and then by revenue in descending order

In [None]:
%%sql

SELECT o.order_date,
    oi.order_item_product_id,
    round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
FROM orders AS o
    JOIN order_items AS oi
        ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY 1, 2
ORDER BY 1, 3 DESC
LIMIT 100
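
The steps listed for daily product revenue (join, filter by status, group by date and product, sort) can be traced on a handful of rows. This is a sketch under stated assumptions: an in-memory SQLite database replaces PostgreSQL, the PostgreSQL-specific `::numeric` cast is omitted because SQLite's `round` accepts floating point values, and all rows are made up.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; tables are simplified and the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, order_date TEXT, order_status TEXT);
    CREATE TABLE order_items (
        order_item_order_id INTEGER,
        order_item_product_id INTEGER,
        order_item_subtotal REAL
    );
    INSERT INTO orders VALUES
        (1, '2013-07-25', 'CLOSED'),
        (2, '2013-07-25', 'COMPLETE'),
        (3, '2013-07-25', 'PENDING'),
        (4, '2013-07-26', 'COMPLETE');
    INSERT INTO order_items VALUES
        (1, 957, 299.98),
        (2, 957, 299.98),
        (2, 1073, 199.99),
        (3, 957, 299.98),
        (4, 1073, 199.99);
""")

# Order 3 is PENDING, so its item is excluded by WHERE before grouping.
daily_product_revenue = conn.execute("""
    SELECT o.order_date,
        oi.order_item_product_id,
        round(sum(oi.order_item_subtotal), 2) AS revenue
    FROM orders AS o
        JOIN order_items AS oi
            ON o.order_id = oi.order_item_order_id
    WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    GROUP BY 1, 2
    ORDER BY 1, 3 DESC
""").fetchall()

for order_date, product_id, revenue in daily_product_revenue:
    print(order_date, product_id, revenue)
```

The result has one row per date and product combination, and within each date the product with the highest revenue comes first.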

Compute Monthly Product Revenue
* Join `orders` and `order_items`
* Consider only orders with status `COMPLETE` or `CLOSED`
* Use orders order month and order items product id as grouping keys
* Sort the final output by month and then by revenue in descending order

In [None]:
%%sql

SELECT to_char(o.order_date, 'yyyy-MM') AS order_month,
    oi.order_item_product_id,
    round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
FROM orders AS o
    JOIN order_items AS oi
        ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY 1, 2
ORDER BY 1, 3 DESC
LIMIT 100

Get revenue for each order where revenue is greater than 500.

* We cannot use aliases defined in the `SELECT` clause in `WHERE`. In this case, `order_revenue` cannot be used in the `WHERE` clause.

In [None]:
%%sql

SELECT order_item_order_id,
    round(sum(order_item_subtotal)::numeric, 2) AS order_revenue
FROM order_items
GROUP BY 1
ORDER BY 1
LIMIT 10

In [None]:
%%sql

-- This will fail

SELECT order_item_order_id,
    round(sum(order_item_subtotal)::numeric, 2) AS order_revenue
FROM order_items
WHERE order_revenue > 500
GROUP BY 1
ORDER BY 1
LIMIT 10

We cannot use aggregate functions in the `WHERE` clause either.

In [None]:
%%sql

-- This will also fail

SELECT order_item_order_id,
    round(sum(order_item_subtotal)::numeric, 2) AS order_revenue
FROM order_items
WHERE round(sum(order_item_subtotal)::numeric, 2) > 500
GROUP BY 1
ORDER BY 1
LIMIT 10

In [None]:
%%sql

-- Filter based on aggregated results using HAVING
-- We can use aggregate functions in HAVING

SELECT order_item_order_id,
    round(sum(order_item_subtotal)::numeric, 2) AS order_revenue
FROM order_items
GROUP BY 1
    HAVING round(sum(order_item_subtotal)::numeric, 2) > 500
ORDER BY 1
LIMIT 10

In [None]:
%%sql

-- Filter based on aggregated result using Inner Query
SELECT * FROM (
    SELECT order_item_order_id,
        round(sum(order_item_subtotal)::numeric, 2) AS order_revenue
    FROM order_items
    GROUP BY 1
) AS revenue_per_order
WHERE order_revenue > 500
ORDER BY 1
LIMIT 10

In [None]:
%%sql

-- Filter based on aggregated results using CTE
WITH revenue_per_order_cte AS (
    SELECT order_item_order_id,
        round(sum(order_item_subtotal)::numeric, 2) AS order_revenue
    FROM order_items
    GROUP BY 1
)
SELECT * FROM revenue_per_order_cte
WHERE order_revenue > 500
ORDER BY 1
LIMIT 10
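
The three approaches above (`HAVING`, an inner query, and a CTE) are expected to return the same rows. The following sketch checks that on made-up data. It assumes an in-memory SQLite database in place of PostgreSQL and drops the `::numeric` cast, which SQLite does not need.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the rows below are hypothetical toy data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_items (order_item_order_id INTEGER, order_item_subtotal REAL)"
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [(1, 300.0), (2, 200.0), (2, 250.0), (2, 130.0), (4, 50.0), (4, 500.0), (5, 100.0)],
)

# Shared aggregation: revenue per order.
aggregate = """
    SELECT order_item_order_id,
        round(sum(order_item_subtotal), 2) AS order_revenue
    FROM order_items
    GROUP BY 1
"""

# 1. Filter with HAVING, repeating the aggregate expression.
using_having = conn.execute(
    aggregate + " HAVING round(sum(order_item_subtotal), 2) > 500 ORDER BY 1"
).fetchall()

# 2. Filter in an outer query, where the alias is an ordinary column.
using_subquery = conn.execute(
    f"SELECT * FROM ({aggregate}) AS revenue_per_order "
    "WHERE order_revenue > 500 ORDER BY 1"
).fetchall()

# 3. Same idea as 2, with the aggregation named in a CTE.
using_cte = conn.execute(
    f"WITH revenue_per_order_cte AS ({aggregate}) "
    "SELECT * FROM revenue_per_order_cte WHERE order_revenue > 500 ORDER BY 1"
).fetchall()

print(using_having)
print(using_having == using_subquery == using_cte)
```

The inner query and CTE versions can filter on `order_revenue` by name because, from the outer query's point of view, the alias is already a regular column of the derived table.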

In [None]:
%%sql

-- Another Example: Filter based on aggregated results using HAVING
-- Filter for those daily products whose revenue is greater than 5000

SELECT o.order_date,
    oi.order_item_product_id,
    round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
FROM orders AS o
    JOIN order_items AS oi
        ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY 1, 2
    HAVING round(sum(oi.order_item_subtotal)::numeric, 2) > 5000
ORDER BY 1, 3 DESC
LIMIT 100

In [None]:
%%sql

-- Get count of all daily product revenue records

SELECT count(*) FROM (
    SELECT o.order_date,
        oi.order_item_product_id,
        round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
    FROM orders AS o
        JOIN order_items AS oi
            ON o.order_id = oi.order_item_order_id
    WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    GROUP BY 1, 2
) AS q

In [None]:
%%sql

-- Get count of daily product revenue records
-- where revenue is greater than 5000
SELECT count(*) FROM (
    SELECT o.order_date,
        oi.order_item_product_id,
        round(sum(oi.order_item_subtotal)::numeric, 2) AS revenue
    FROM orders AS o
        JOIN order_items AS oi
            ON o.order_id = oi.order_item_order_id
    WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    GROUP BY 1, 2
        HAVING round(sum(oi.order_item_subtotal)::numeric, 2) > 5000
) AS q