# Load SQL and Connect to DB

In [None]:
%load_ext sql

**Connect to DB**

In [None]:
%env DATABASE_URL = postgresql://shubham_sms_user:shubham@172.25.87.65:5432/shubham_sms_db

# Performing Aggregation

**Let us understand how to aggregate the data**

- We can perform global aggregation as well as aggregation by key.
- Typical logical query execution order - `FROM -> WHERE -> GROUP BY -> HAVING -> SELECT`
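
The position of a filter in this order changes the result. The sketch below is a minimal illustration, not the notebook's own setup: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the PostgreSQL server (an assumption, so that it runs anywhere), and loads the first five `order_items` rows shown later in this notebook, trimmed to two columns.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (assumption, for portability).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table order_items (order_item_order_id integer, order_item_subtotal real)"
)
conn.executemany(
    "insert into order_items values (?, ?)",
    [(1, 299.98), (2, 199.99), (2, 250.0), (2, 129.99), (4, 49.98)],
)

# WHERE runs before GROUP BY: items under 200 are dropped, then the rest are summed.
where_rows = conn.execute(
    """
    select order_item_order_id, round(sum(order_item_subtotal), 2) as revenue
        from order_items
            where order_item_subtotal >= 200
                group by order_item_order_id
                    order by order_item_order_id
    """
).fetchall()

# HAVING runs after GROUP BY: every item is summed, then whole groups are filtered.
having_rows = conn.execute(
    """
    select order_item_order_id, round(sum(order_item_subtotal), 2) as revenue
        from order_items
            group by order_item_order_id
                having sum(order_item_subtotal) >= 200
                    order by order_item_order_id
    """
).fetchall()

print(where_rows)
print(having_rows)
```

With `WHERE`, order 2 contributes only its single item of 250.0; with `HAVING`, all three of its items are summed before the group is tested.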

## Global Aggregation
   

1. Get the total number of orders.

In [3]:
%%sql 

select count(order_id) from shubham.orders;

1 rows affected.


count
68883


2. Get revenue for a given order id.
    - Check the data for order id 2 
    - Perform aggregation based on the details.

In [4]:
%%sql 

select * from shubham.order_items
    where order_item_order_id = 2;

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
3 rows affected.


order_item_id,order_item_order_id,order_item_product_id,order_item_quantity,order_item_subtotal,order_item_product_price
2,2,1073,1,199.99,199.99
3,2,502,5,250.0,50.0
4,2,403,1,129.99,129.99


In [5]:
%%sql 

    select round(sum(order_item_subtotal::numeric), 2) as Revenue
        from shubham.order_items
            where order_item_order_id = 2 ;

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


revenue
579.98
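
The total can be checked by hand from the three rows returned for order id 2. The sketch below uses Python's `decimal` module to mirror an exact sum rounded to two places. PostgreSQL defines the two-argument `round(value, digits)` only for `numeric`, which is likely why the query casts the subtotal first (the column type is not shown here, so that is an assumption).

```python
from decimal import Decimal

# Subtotals of the three order_items rows returned for order id 2 above.
subtotals = [Decimal("199.99"), Decimal("250.0"), Decimal("129.99")]

# Exact decimal sum, rounded to 2 places like round(sum(...)::numeric, 2).
revenue = sum(subtotals).quantize(Decimal("0.01"))
print(revenue)  # 579.98
```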


3. Get the number of records with order_status either COMPLETE or CLOSED.
     - See the distinct values of order_status.
     - Count the records with the required statuses.

In [6]:
%%sql

    select distinct order_status
        from shubham.orders;
           

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
9 rows affected.


order_status
COMPLETE
ON_HOLD
PENDING_PAYMENT
PENDING
CLOSED
CANCELED
PROCESSING
PAYMENT_REVIEW
SUSPECTED_FRAUD


In [7]:
%%sql

    select count(*) as Status_Count
        from shubham.orders
            where order_status in ('COMPLETE','CLOSED');

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
1 rows affected.


status_count
30455


## Aggregation by key - using **GROUP BY** 

- Rules while using **GROUP BY**.
    - We can have the columns which are specified as part of **GROUP BY** in the **SELECT** clause.
    - On top of those, we can have derived columns using aggregate functions.
    - We cannot have any other columns that are not used as part of **GROUP BY**, nor derived columns that use non-aggregate functions on such columns.
    - We will not be able to use aggregate functions or aliases defined in the **SELECT** clause as part of the **WHERE** clause.
    - If we want to filter based on aggregated results, then we can leverage **HAVING** on top of **GROUP BY** (specifying **WHERE** is not an option).
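
The rule about aggregate functions in **WHERE** can be tried out with a small sketch. It assumes an in-memory SQLite database (via Python's built-in `sqlite3`) as a stand-in for PostgreSQL and reuses five `order_items` rows from this notebook. SQLite is more lenient than PostgreSQL on some of the other rules, so only the aggregate-in-`WHERE` rule is shown here.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption, so the sketch is portable).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table order_items (order_item_order_id integer, order_item_subtotal real)"
)
conn.executemany(
    "insert into order_items values (?, ?)",
    [(1, 299.98), (2, 199.99), (2, 250.0), (2, 129.99), (4, 49.98)],
)

# An aggregate function inside WHERE is rejected by the engine.
try:
    conn.execute(
        """
        select order_item_order_id, sum(order_item_subtotal) as revenue
            from order_items
                where sum(order_item_subtotal) >= 500
                    group by order_item_order_id
        """
    )
    where_error = None
except sqlite3.Error as exc:
    where_error = str(exc)

# The same condition is accepted in HAVING, which is evaluated per group.
having_rows = conn.execute(
    """
    select order_item_order_id, round(sum(order_item_subtotal), 2) as revenue
        from order_items
            group by order_item_order_id
                having sum(order_item_subtotal) >= 500
    """
).fetchall()

print(where_error)
print(having_rows)
```

Only order 2 passes the `HAVING` filter, since its three items add up to more than 500.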

1. Get the number of orders by date or status.
    - Check the data for reference.
    - Apply the aggregation.

In [8]:
%%sql 
    select * from shubham.orders limit 2;

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
2 rows affected.


order_id,order_date,order_customer_id,order_status
1,2013-07-25 00:00:00,11599,CLOSED
2,2013-07-25 00:00:00,256,PENDING_PAYMENT


In [9]:
%%sql 

    select order_date, count (*)
        from shubham.orders
            group by order_date 
                order by order_date
                    limit 5;

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
5 rows affected.


order_date,count
2013-07-25 00:00:00,143
2013-07-26 00:00:00,269
2013-07-27 00:00:00,202
2013-07-28 00:00:00,187
2013-07-29 00:00:00,253
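
The exercise also mentions grouping by status, which the cell above does not cover. A minimal sketch of that variant follows, assuming an in-memory SQLite database as a stand-in for PostgreSQL and only the ten `orders` rows listed later in this notebook, so the counts are for that sample and not for the full table.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption); data is a 10-row sample.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders "
    "(order_id integer, order_date text, order_customer_id integer, order_status text)"
)
conn.executemany(
    "insert into orders values (?, ?, ?, ?)",
    [
        (1, "2013-07-25 00:00:00", 11599, "CLOSED"),
        (2, "2013-07-25 00:00:00", 256, "PENDING_PAYMENT"),
        (3, "2013-07-25 00:00:00", 12111, "COMPLETE"),
        (4, "2013-07-25 00:00:00", 8827, "CLOSED"),
        (5, "2013-07-25 00:00:00", 11318, "COMPLETE"),
        (6, "2013-07-25 00:00:00", 7130, "COMPLETE"),
        (7, "2013-07-25 00:00:00", 4530, "COMPLETE"),
        (8, "2013-07-25 00:00:00", 2911, "PROCESSING"),
        (9, "2013-07-25 00:00:00", 5657, "PENDING_PAYMENT"),
        (10, "2013-07-25 00:00:00", 5648, "PENDING_PAYMENT"),
    ],
)

# Same pattern as grouping by date, with order_status as the key.
status_counts = conn.execute(
    """
    select order_status, count(*) as order_count
        from orders
            group by order_status
                order by order_status
    """
).fetchall()

for status, order_count in status_counts:
    print(status, order_count)
```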


2. Get revenue for each order_id.
     - Check the data for reference.
     - Apply the aggregation.

In [10]:
%%sql 

    select * from shubham.order_items limit 5;

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
5 rows affected.


order_item_id,order_item_order_id,order_item_product_id,order_item_quantity,order_item_subtotal,order_item_product_price
1,1,957,1,299.98,299.98
2,2,1073,1,199.99,199.99
3,2,502,5,250.0,50.0
4,2,403,1,129.99,129.99
5,4,897,2,49.98,24.99


In [11]:
%%sql

    select order_item_order_id , round( sum(order_item_subtotal :: numeric),2 ) as revenue
        from shubham.order_items
            group by order_item_order_id
                order by order_item_order_id 
                    limit 5;

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
5 rows affected.


order_item_order_id,revenue
1,299.98
2,579.98
4,699.85
5,1129.86
7,579.92


3. Get daily product revenue (using order date and product id as keys).
     - Check the data for reference.
     - Apply the aggregation.

In [15]:
%%sql 
    
    select * from shubham.order_items limit 10;
  

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
10 rows affected.


order_item_id,order_item_order_id,order_item_product_id,order_item_quantity,order_item_subtotal,order_item_product_price
1,1,957,1,299.98,299.98
2,2,1073,1,199.99,199.99
3,2,502,5,250.0,50.0
4,2,403,1,129.99,129.99
5,4,897,2,49.98,24.99
6,4,365,5,299.95,59.99
7,4,502,3,150.0,50.0
8,4,1014,4,199.92,49.98
9,5,957,1,299.98,299.98
10,5,365,5,299.95,59.99


In [14]:
%%sql 


  select * from shubham.orders limit 10;

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
1,2013-07-25 00:00:00,11599,CLOSED
2,2013-07-25 00:00:00,256,PENDING_PAYMENT
3,2013-07-25 00:00:00,12111,COMPLETE
4,2013-07-25 00:00:00,8827,CLOSED
5,2013-07-25 00:00:00,11318,COMPLETE
6,2013-07-25 00:00:00,7130,COMPLETE
7,2013-07-25 00:00:00,4530,COMPLETE
8,2013-07-25 00:00:00,2911,PROCESSING
9,2013-07-25 00:00:00,5657,PENDING_PAYMENT
10,2013-07-25 00:00:00,5648,PENDING_PAYMENT


In [21]:
%%sql 

    select o.order_date,
           oi.order_item_product_id,
           round (sum(oi.order_item_subtotal:: numeric),2 ) as revenue
                from shubham.orders as o
                    join shubham.order_items as oi
                        on o.order_id = oi.order_item_order_id
                where o.order_status in ('COMPLETE','CLOSED')
                    group by o.order_date, oi.order_item_product_id
                    order by o.order_date, oi.order_item_product_id
                        limit 10 
 

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
10 rows affected.


order_date,order_item_product_id,revenue
2013-07-25 00:00:00,24,319.96
2013-07-25 00:00:00,93,74.97
2013-07-25 00:00:00,134,100.0
2013-07-25 00:00:00,191,5099.49
2013-07-25 00:00:00,226,599.99
2013-07-25 00:00:00,365,3359.44
2013-07-25 00:00:00,403,1949.85
2013-07-25 00:00:00,502,1650.0
2013-07-25 00:00:00,572,119.97
2013-07-25 00:00:00,625,199.99


### We can also use the **HAVING** clause to filter aggregated data.
- Get daily product revenue where revenue is at least `$500` (using order date and product id as keys)

In [22]:
%%sql 

    select o.order_date,
           oi.order_item_product_id,
           round (sum(oi.order_item_subtotal:: numeric),2 ) as revenue
                from shubham.orders as o
                    join shubham.order_items as oi
                        on o.order_id = oi.order_item_order_id
                where o.order_status in ('COMPLETE','CLOSED')
                    group by o.order_date, oi.order_item_product_id
                    having  round (sum(oi.order_item_subtotal:: numeric),2 )>= 500
                    order by o.order_date, oi.order_item_product_id, revenue DESC
                        limit 10 
 

 * postgresql://shubham_sms_user:***@172.25.87.65:5432/shubham_sms_db
10 rows affected.


order_date,order_item_product_id,revenue
2013-07-25 00:00:00,191,5099.49
2013-07-25 00:00:00,226,599.99
2013-07-25 00:00:00,365,3359.44
2013-07-25 00:00:00,403,1949.85
2013-07-25 00:00:00,502,1650.0
2013-07-25 00:00:00,627,1079.73
2013-07-25 00:00:00,957,4499.7
2013-07-25 00:00:00,1004,5599.72
2013-07-25 00:00:00,1014,2798.88
2013-07-25 00:00:00,1073,2999.85
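
The query above repeats the aggregate expression in **HAVING** instead of using the `revenue` alias. One alternative is to wrap the grouped query in a subquery and filter on the alias from the outer query. The sketch below compares the two forms. It assumes an in-memory SQLite database as a stand-in for PostgreSQL and a small sample (the first five `orders` rows and first ten `order_items` rows shown earlier, with unused columns dropped and dates shortened), so the figures differ from the full-table output.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption); tables hold sample rows only.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, order_date text, order_status text)")
conn.execute(
    "create table order_items "
    "(order_item_order_id integer, order_item_product_id integer, order_item_subtotal real)"
)
conn.executemany("insert into orders values (?, ?, ?)", [
    (1, "2013-07-25", "CLOSED"),
    (2, "2013-07-25", "PENDING_PAYMENT"),
    (3, "2013-07-25", "COMPLETE"),
    (4, "2013-07-25", "CLOSED"),
    (5, "2013-07-25", "COMPLETE"),
])
conn.executemany("insert into order_items values (?, ?, ?)", [
    (1, 957, 299.98), (2, 1073, 199.99), (2, 502, 250.0), (2, 403, 129.99),
    (4, 897, 49.98), (4, 365, 299.95), (4, 502, 150.0), (4, 1014, 199.92),
    (5, 957, 299.98), (5, 365, 299.95),
])

# Shared grouped query: daily product revenue for COMPLETE and CLOSED orders.
grouped = """
    select o.order_date,
           oi.order_item_product_id,
           round(sum(oi.order_item_subtotal), 2) as revenue
        from orders as o
            join order_items as oi
                on o.order_id = oi.order_item_order_id
        where o.order_status in ('COMPLETE', 'CLOSED')
            group by o.order_date, oi.order_item_product_id
"""

# Form 1: filter groups with HAVING, repeating the aggregate expression.
having_rows = conn.execute(
    grouped
    + " having sum(oi.order_item_subtotal) >= 500"
    + " order by o.order_date, oi.order_item_product_id"
).fetchall()

# Form 2: filter on the alias from an outer query.
subquery_rows = conn.execute(
    "select * from (" + grouped + ") as q"
    " where q.revenue >= 500"
    " order by q.order_date, q.order_item_product_id"
).fetchall()

print(having_rows)
print(subquery_rows)
```

On this sample both forms keep only products 365 and 957, each sold in two qualifying orders.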
