# Writing Basic SQL Queries

As part of this section we will primarily focus on writing basic queries.

* Standard Transformations
* Overview of Data Model
* Define Problem Statement – Daily Product Revenue
* Preparing Tables
* Selecting or Projecting Data
* Filtering Data
* Joining Tables – Inner
* Joining Tables – Outer
* Performing Aggregations
* Sorting Data
* Solution – Daily Product Revenue

Here are the key objectives for this section:
* Understand the different standard transformations and how they are implemented using basic SQL.
* Understand the data model used to explore basic SQL features.
* Set up the database and tables, and load the data quickly.
* Learn how we typically select or project data, filter data, join data from multiple tables, compute metrics using aggregate functions, and sort data.
* While exploring basic SQL queries, we will define a problem statement and come up with a solution at the end.
* Self-evaluate your understanding of the key aspects of writing basic SQL queries using the exercises at the end.

## Standard Transformations

Here are some of the transformations we typically perform on a regular basis.

In [2]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/Y8krusDetoQ?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* Projection of data
* Filtering data
* Performing Aggregations
* Joins
* Sorting
* Ranking (will be covered as part of advanced queries)

## Overview of Data Model

We will be using the retail data model for this section. It contains 6 tables.

In [3]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/g5HfliqD-a0?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* Table list
  * orders
  * order_items
  * products
  * categories
  * departments
  * customers
* **orders** and **order_items** are transactional tables.
* **products**, **categories** and **departments** are non-transactional tables which hold data related to the product catalog.
* **customers** is a non-transactional table which holds customer details.
* There is a one-to-many relationship between **orders** and **order_items**.
* There is a one-to-many relationship between **products** and **order_items**. Each order item has exactly one product, and a product can be part of many order items.
* There is a one-to-many relationship between **customers** and **orders**. A customer can place many orders over a period of time, but there cannot be more than one customer for a given order.
* There is a one-to-many relationship between **departments** and **categories**. There is also a one-to-many relationship between **categories** and **products**.
* There is a hierarchical relationship from departments to products: **departments** -> **categories** -> **products**

## Define Problem Statement – Daily Product Revenue

Let us try to get daily product revenue using retail tables.

In [4]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/fkFaPYWfjv4?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* Daily is derived from orders.order_date.
* Product has to be derived from products.product_name.
* Revenue has to be derived from order_items.order_item_subtotal.
* We need to join all 3 tables, then group by order_date, product_id and product_name, and sum order_item_subtotal to get the revenue.
* In short, we get daily product revenue using the products, orders and order_items tables.
* We have the following fields in **orders**.
  * order_id
  * order_date
  * order_customer_id
  * order_status
* We have the following fields in **order_items**.
  * order_item_id
  * order_item_order_id
  * order_item_product_id
  * order_item_quantity
  * order_item_subtotal
  * order_item_product_price
* We have the following fields in **products**.
  * product_id
  * product_category_id
  * product_name
  * product_description
  * product_price
  * product_image
* We have a one-to-many relationship between orders and order_items.
* **orders.order_id** is the **primary key** and **order_items.order_item_order_id** is a foreign key to **orders.order_id**.
* We have a one-to-many relationship between products and order_items.
* **products.product_id** is the **primary key** and **order_items.order_item_product_id** is a foreign key to **products.product_id**.
* By the end of this module we will explore all standard transformations and get daily product revenue using the following fields.
  * **orders.order_date**
  * **order_items.order_item_product_id**
  * **products.product_name**
  * **order_items.order_item_subtotal** (aggregated using date and product_id).
* We will consider only **COMPLETE** or **CLOSED** orders.
* As more than one product_id can share the same product name, we have to include product_id as part of the key by which we group the data.
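
The steps described above (join the 3 tables, keep only COMPLETE or CLOSED orders, group by date and product) can be previewed with a minimal sketch. It assumes Python's built-in `sqlite3` module as a stand-in for PostgreSQL and uses a handful of made-up rows with only the columns needed, so the numbers are purely illustrative and not taken from the retail data set.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, order_status TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    order_item_order_id INTEGER,
    order_item_product_id INTEGER,
    order_item_subtotal REAL
);
-- Made-up sample rows; products 10 and 11 deliberately share a name.
INSERT INTO orders VALUES
    (1, '2013-07-25', 'CLOSED'),
    (2, '2013-07-25', 'PENDING_PAYMENT'),
    (3, '2013-07-26', 'COMPLETE');
INSERT INTO products VALUES (10, 'Cleat'), (11, 'Cleat'), (12, 'Helmet');
INSERT INTO order_items VALUES
    (1, 1, 10, 100.0),
    (2, 1, 11, 50.0),
    (3, 1, 10, 25.0),
    (4, 2, 12, 999.0),
    (5, 3, 12, 200.0);
""")

# Join, filter on status, then aggregate per date and product.
daily_product_revenue = conn.execute("""
SELECT o.order_date,
    oi.order_item_product_id,
    p.product_name,
    round(sum(oi.order_item_subtotal), 2) AS revenue
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
    JOIN products p
    ON p.product_id = oi.order_item_product_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date, oi.order_item_product_id, p.product_name
ORDER BY o.order_date, oi.order_item_product_id
""").fetchall()

for row in daily_product_revenue:
    print(row)
```

Because product_id is part of the grouping key, the two products named Cleat are reported separately instead of being merged into one row.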

## Preparing Tables

Let us prepare retail tables to come up with the solution for the problem statement.

In [5]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/yeVIRyGyv7g?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* Ensure that we have the required database and user for retail data. We might provide the database as part of our labs. Here are the instructions to use `psql` for setting up the required tables.

```shell
psql -U postgres -h localhost -p 5432 -W
```

```sql
CREATE DATABASE itversity_retail_db;
CREATE USER itversity_retail_user WITH ENCRYPTED PASSWORD 'retail_password';
GRANT ALL ON DATABASE itversity_retail_db TO itversity_retail_user;
```

* Create the tables using the script provided. You can either use `psql` or **SQLAlchemy**. With `psql`, connect first and then run the `\i` meta-command inside the `psql` session.

```shell
psql -U itversity_retail_user \
  -h localhost \
  -p 5432 \
  -d itversity_retail_db \
  -W

\i /data/retail_db/create_db_tables_pg.sql
```

* Load the data using the script provided, also from within the `psql` session.

```shell
\i /data/retail_db/load_db_tables_pg.sql
```

* Run queries to validate that we have data in the 3 tables used for the problem statement (**orders**, **order_items** and **products**).

In [4]:
%load_ext sql

In [5]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db


In [6]:
%sql SELECT current_database()

1 rows affected.


current_database
itversity_retail_db


In [7]:
%%sql

SELECT * FROM information_schema.tables 
WHERE table_catalog = 'itversity_retail_db' 
    AND table_schema = 'public' 
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
6 rows affected.


table_catalog,table_schema,table_name,table_type,self_referencing_column_name,reference_generation,user_defined_type_catalog,user_defined_type_schema,user_defined_type_name,is_insertable_into,is_typed,commit_action
itversity_retail_db,public,categories,BASE TABLE,,,,,,YES,NO,
itversity_retail_db,public,departments,BASE TABLE,,,,,,YES,NO,
itversity_retail_db,public,products,BASE TABLE,,,,,,YES,NO,
itversity_retail_db,public,customers,BASE TABLE,,,,,,YES,NO,
itversity_retail_db,public,orders,BASE TABLE,,,,,,YES,NO,
itversity_retail_db,public,order_items,BASE TABLE,,,,,,YES,NO,


In [8]:
%sql SELECT * FROM orders LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
1,2013-07-25 00:00:00,11599,CLOSED
2,2013-07-25 00:00:00,256,PENDING_PAYMENT
3,2013-07-25 00:00:00,12111,COMPLETE
4,2013-07-25 00:00:00,8827,CLOSED
5,2013-07-25 00:00:00,11318,COMPLETE
6,2013-07-25 00:00:00,7130,COMPLETE
7,2013-07-25 00:00:00,4530,COMPLETE
8,2013-07-25 00:00:00,2911,PROCESSING
9,2013-07-25 00:00:00,5657,PENDING_PAYMENT
10,2013-07-25 00:00:00,5648,PENDING_PAYMENT


In [9]:
%sql SELECT * FROM order_items LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_item_id,order_item_order_id,order_item_product_id,order_item_quantity,order_item_subtotal,order_item_product_price
1,1,957,1,299.98,299.98
2,2,1073,1,199.99,199.99
3,2,502,5,250.0,50.0
4,2,403,1,129.99,129.99
5,4,897,2,49.98,24.99
6,4,365,5,299.95,59.99
7,4,502,3,150.0,50.0
8,4,1014,4,199.92,49.98
9,5,957,1,299.98,299.98
10,5,365,5,299.95,59.99


In [10]:
%sql SELECT * FROM products LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


product_id,product_category_id,product_name,product_description,product_price,product_image
1,2,Quest Q64 10 FT. x 10 FT. Slant Leg Instant U,,59.98,http://images.acmesports.sports/Quest+Q64+10+FT.+x+10+FT.+Slant+Leg+Instant+Up+Canopy
2,2,Under Armour Men's Highlight MC Football Clea,,129.99,http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat
3,2,Under Armour Men's Renegade D Mid Football Cl,,89.99,http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat
4,2,Under Armour Men's Renegade D Mid Football Cl,,89.99,http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat
5,2,Riddell Youth Revolution Speed Custom Footbal,,199.99,http://images.acmesports.sports/Riddell+Youth+Revolution+Speed+Custom+Football+Helmet
6,2,Jordan Men's VI Retro TD Football Cleat,,134.99,http://images.acmesports.sports/Jordan+Men%27s+VI+Retro+TD+Football+Cleat
7,2,Schutt Youth Recruit Hybrid Custom Football H,,99.99,http://images.acmesports.sports/Schutt+Youth+Recruit+Hybrid+Custom+Football+Helmet+2014
8,2,Nike Men's Vapor Carbon Elite TD Football Cle,,129.99,http://images.acmesports.sports/Nike+Men%27s+Vapor+Carbon+Elite+TD+Football+Cleat
9,2,Nike Adult Vapor Jet 3.0 Receiver Gloves,,50.0,http://images.acmesports.sports/Nike+Adult+Vapor+Jet+3.0+Receiver+Gloves
10,2,Under Armour Men's Highlight MC Football Clea,,129.99,http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat


In [11]:
%sql SELECT count(1) FROM orders

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
68883


In [12]:
%sql SELECT count(1) FROM order_items

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
172198


In [13]:
%sql SELECT count(1) FROM products

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
1345


## Selecting or Projecting Data

Let us understand different aspects of projecting data. We primarily use `SELECT` to project the data.

In [6]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/0eSWEBDf23A?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* We can project all columns using `*` or some columns using column names.
* We can provide aliases to a column or expression using `AS` in the `SELECT` clause.
* `DISTINCT` can be used to get the distinct records from the selected columns. We can also use `DISTINCT *` to get unique records using all the columns.
* As part of the `SELECT` clause we can have aggregate functions such as `count`, `sum` etc.
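
One point from the list above that is easy to miss is that `DISTINCT` works on the combination of all selected columns. Here is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL and a few made-up rows in a cut-down `orders` table.

```python
import sqlite3

# Made-up rows in an in-memory SQLite table; not the course data set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, order_customer_id INTEGER, order_status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 100, "CLOSED"), (2, 100, "CLOSED"), (3, 100, "COMPLETE"), (4, 200, "COMPLETE")],
)

# DISTINCT applies to the pair (order_customer_id, status), not to each column on its own.
distinct_pairs = conn.execute("""
SELECT DISTINCT order_customer_id, order_status AS status
FROM orders
ORDER BY order_customer_id, status
""").fetchall()

# Every row differs in order_id, so DISTINCT * keeps all of them.
distinct_rows = conn.execute("SELECT DISTINCT * FROM orders").fetchall()

print(distinct_pairs)
print(len(distinct_rows))
```

Customer 100 has two CLOSED orders, yet the pair appears only once in `distinct_pairs`.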

In [14]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [15]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db


In [16]:
%sql SELECT * FROM orders LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
1,2013-07-25 00:00:00,11599,CLOSED
2,2013-07-25 00:00:00,256,PENDING_PAYMENT
3,2013-07-25 00:00:00,12111,COMPLETE
4,2013-07-25 00:00:00,8827,CLOSED
5,2013-07-25 00:00:00,11318,COMPLETE
6,2013-07-25 00:00:00,7130,COMPLETE
7,2013-07-25 00:00:00,4530,COMPLETE
8,2013-07-25 00:00:00,2911,PROCESSING
9,2013-07-25 00:00:00,5657,PENDING_PAYMENT
10,2013-07-25 00:00:00,5648,PENDING_PAYMENT


In [17]:
%%sql 

SELECT * FROM information_schema.columns 
WHERE table_catalog = 'itversity_retail_db' 
    AND table_name = 'orders'

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
4 rows affected.


table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,numeric_precision,numeric_precision_radix,numeric_scale,datetime_precision,interval_type,interval_precision,character_set_catalog,character_set_schema,character_set_name,collation_catalog,collation_schema,collation_name,domain_catalog,domain_schema,domain_name,udt_catalog,udt_schema,udt_name,scope_catalog,scope_schema,scope_name,maximum_cardinality,dtd_identifier,is_self_referencing,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
itversity_retail_db,public,orders,order_id,1,,NO,integer,,,32.0,2.0,0.0,,,,,,,,,,,,,itversity_retail_db,pg_catalog,int4,,,,,1,NO,NO,,,,,,NO,NEVER,,YES
itversity_retail_db,public,orders,order_date,2,,NO,timestamp without time zone,,,,,,6.0,,,,,,,,,,,,itversity_retail_db,pg_catalog,timestamp,,,,,2,NO,NO,,,,,,NO,NEVER,,YES
itversity_retail_db,public,orders,order_customer_id,3,,NO,integer,,,32.0,2.0,0.0,,,,,,,,,,,,,itversity_retail_db,pg_catalog,int4,,,,,3,NO,NO,,,,,,NO,NEVER,,YES
itversity_retail_db,public,orders,order_status,4,,NO,character varying,45.0,180.0,,,,,,,,,,,,,,,,itversity_retail_db,pg_catalog,varchar,,,,,4,NO,NO,,,,,,NO,NEVER,,YES


In [18]:
%%sql 

SELECT order_customer_id, order_date, order_status 
FROM orders 
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_customer_id,order_date,order_status
11599,2013-07-25 00:00:00,CLOSED
256,2013-07-25 00:00:00,PENDING_PAYMENT
12111,2013-07-25 00:00:00,COMPLETE
8827,2013-07-25 00:00:00,CLOSED
11318,2013-07-25 00:00:00,COMPLETE
7130,2013-07-25 00:00:00,COMPLETE
4530,2013-07-25 00:00:00,COMPLETE
2911,2013-07-25 00:00:00,PROCESSING
5657,2013-07-25 00:00:00,PENDING_PAYMENT
5648,2013-07-25 00:00:00,PENDING_PAYMENT


In [19]:
%%sql 

SELECT order_customer_id, 
    to_char(order_date, 'yyyy-MM'), 
    order_status 
FROM orders 
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_customer_id,to_char,order_status
11599,2013-07,CLOSED
256,2013-07,PENDING_PAYMENT
12111,2013-07,COMPLETE
8827,2013-07,CLOSED
11318,2013-07,COMPLETE
7130,2013-07,COMPLETE
4530,2013-07,COMPLETE
2911,2013-07,PROCESSING
5657,2013-07,PENDING_PAYMENT
5648,2013-07,PENDING_PAYMENT


In [20]:
%%sql 

SELECT order_customer_id, 
    to_char(order_date, 'yyyy-MM') AS order_month, 
    order_status 
FROM orders 
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_customer_id,order_month,order_status
11599,2013-07,CLOSED
256,2013-07,PENDING_PAYMENT
12111,2013-07,COMPLETE
8827,2013-07,CLOSED
11318,2013-07,COMPLETE
7130,2013-07,COMPLETE
4530,2013-07,COMPLETE
2911,2013-07,PROCESSING
5657,2013-07,PENDING_PAYMENT
5648,2013-07,PENDING_PAYMENT


In [21]:
%%sql 

SELECT DISTINCT to_char(order_date, 'yyyy-MM') AS order_month 
FROM orders

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
13 rows affected.


order_month
2014-01
2014-05
2013-12
2013-11
2014-04
2014-07
2014-03
2013-08
2013-10
2013-07


In [22]:
%sql SELECT count(1) FROM orders

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
68883


In [23]:
%%sql 

SELECT count(DISTINCT to_char(order_date, 'yyyy-MM')) AS distinct_month_count 
FROM orders

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


distinct_month_count
13


## Filtering Data

Let us understand how we can filter the data as part of our queries.

In [7]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/2DrLVbXd0Jo?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* We use the `WHERE` clause to filter the data.
* Comparison operators such as `=`, `!=`, `>`, `<`, `<=` and `>=` can be used to compare a column, expression or literal with another column, expression or literal.
* We can use `LIKE` with the `%` wildcard, or `~` with regular expressions, for pattern matching.
* Boolean `OR` and `AND` can be used when we want to apply multiple conditions.
  * Get all orders with order_status equal to COMPLETE or CLOSED. We can also use the `IN` operator.
  * Get all orders placed in January 2014 with order_status equal to COMPLETE or CLOSED.
* We can also use `BETWEEN` along with `AND` to compare a column or expression against a range of values. Both ends of the range are inclusive.
* We need to use `IS NULL` and `IS NOT NULL` to compare against null values.

In [24]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [25]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db


In [26]:
%%sql 

SELECT * FROM orders 
WHERE order_status = 'COMPLETE' 
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
3,2013-07-25 00:00:00,12111,COMPLETE
5,2013-07-25 00:00:00,11318,COMPLETE
6,2013-07-25 00:00:00,7130,COMPLETE
7,2013-07-25 00:00:00,4530,COMPLETE
15,2013-07-25 00:00:00,2568,COMPLETE
17,2013-07-25 00:00:00,2667,COMPLETE
22,2013-07-25 00:00:00,333,COMPLETE
26,2013-07-25 00:00:00,7562,COMPLETE
28,2013-07-25 00:00:00,656,COMPLETE
32,2013-07-25 00:00:00,3960,COMPLETE


In [27]:
%sql SELECT count(1) FROM orders

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
68883


In [28]:
%%sql 

SELECT count(1) 
FROM orders
WHERE order_status = 'COMPLETE'

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
22899


In [29]:
%%sql 

SELECT DISTINCT order_status
FROM orders
WHERE order_status = 'COMPLETE'

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


order_status
COMPLETE


In [30]:
%%sql

SELECT DISTINCT order_status
FROM orders

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
9 rows affected.


order_status
COMPLETE
ON_HOLD
PENDING_PAYMENT
PENDING
CLOSED
CANCELED
PROCESSING
PAYMENT_REVIEW
SUSPECTED_FRAUD


In [31]:
%%sql 

SELECT * FROM orders 
WHERE order_status IN ('COMPLETE', 'CLOSED') 
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
1,2013-07-25 00:00:00,11599,CLOSED
3,2013-07-25 00:00:00,12111,COMPLETE
4,2013-07-25 00:00:00,8827,CLOSED
5,2013-07-25 00:00:00,11318,COMPLETE
6,2013-07-25 00:00:00,7130,COMPLETE
7,2013-07-25 00:00:00,4530,COMPLETE
12,2013-07-25 00:00:00,1837,CLOSED
15,2013-07-25 00:00:00,2568,COMPLETE
17,2013-07-25 00:00:00,2667,COMPLETE
18,2013-07-25 00:00:00,1205,CLOSED


In [32]:
%%sql

SELECT count(1) FROM orders 
WHERE order_status IN ('COMPLETE', 'CLOSED')

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
30455


In [33]:
%%sql 

SELECT count(1) FROM orders 
WHERE order_status = 'COMPLETE' OR order_status = 'CLOSED'

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
30455


In [34]:
%%sql

SELECT * FROM orders
WHERE order_date = '2014-01-01'
LIMIT 3

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
3 rows affected.


order_id,order_date,order_customer_id,order_status
25876,2014-01-01 00:00:00,3414,PENDING_PAYMENT
25877,2014-01-01 00:00:00,5549,PENDING_PAYMENT
25878,2014-01-01 00:00:00,9084,PENDING


```{note}
The following query fails because `LIKE` is not defined for columns of date or timestamp type. Convert the column to a string first, for example using `to_char`.
```

In [35]:
%%sql

SELECT * FROM orders
WHERE order_date LIKE '2014-01%'
LIMIT 3

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
(psycopg2.errors.UndefinedFunction) operator does not exist: timestamp without time zone ~~ unknown
LINE 2: WHERE order_date LIKE '2014-01%'
                         ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

[SQL: SELECT * FROM orders
WHERE order_date LIKE '2014-01%%'
LIMIT 3]
(Background on this error at: http://sqlalche.me/e/13/f405)


In [36]:
%%sql

SELECT * FROM orders 
WHERE order_status IN ('COMPLETE', 'CLOSED')
    AND to_char(order_date, 'yyyy-MM-dd') LIKE '2014-01%'
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
25882,2014-01-01 00:00:00,4598,COMPLETE
25888,2014-01-01 00:00:00,6735,COMPLETE
25889,2014-01-01 00:00:00,10045,COMPLETE
25891,2014-01-01 00:00:00,3037,CLOSED
25895,2014-01-01 00:00:00,1044,COMPLETE
25897,2014-01-01 00:00:00,6405,COMPLETE
25898,2014-01-01 00:00:00,3950,COMPLETE
25899,2014-01-01 00:00:00,8068,CLOSED
25900,2014-01-01 00:00:00,2382,CLOSED
25901,2014-01-01 00:00:00,3099,COMPLETE


In [37]:
%%sql

SELECT count(1) FROM orders 
WHERE order_status IN ('COMPLETE', 'CLOSED')
    AND to_char(order_date, 'yyyy-MM-dd') LIKE '2014-01%'

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
2544


In [38]:
%%sql

SELECT * FROM orders 
WHERE order_status IN ('COMPLETE', 'CLOSED')
    AND to_char(order_date, 'yyyy-MM') = '2014-01'
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
25882,2014-01-01 00:00:00,4598,COMPLETE
25888,2014-01-01 00:00:00,6735,COMPLETE
25889,2014-01-01 00:00:00,10045,COMPLETE
25891,2014-01-01 00:00:00,3037,CLOSED
25895,2014-01-01 00:00:00,1044,COMPLETE
25897,2014-01-01 00:00:00,6405,COMPLETE
25898,2014-01-01 00:00:00,3950,COMPLETE
25899,2014-01-01 00:00:00,8068,CLOSED
25900,2014-01-01 00:00:00,2382,CLOSED
25901,2014-01-01 00:00:00,3099,COMPLETE


In [39]:
%%sql

SELECT count(1) FROM orders 
WHERE order_status IN ('COMPLETE', 'CLOSED')
    AND to_char(order_date, 'yyyy-MM') = '2014-01'

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
2544


In [40]:
%%sql

SELECT count(1) FROM orders 
WHERE order_status IN ('COMPLETE', 'CLOSED')
    AND to_char(order_date, 'yyyy-MM-dd') ~ '2014-01'

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
2544
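
The month filters above all convert `order_date` to a string before comparing. An alternative is to compare the column itself against a half-open range (from the first day of the month up to, but not including, the first day of the next month). Here is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL, made-up rows, and SQLite's `strftime` in place of `to_char`.

```python
import sqlite3

# Made-up rows; timestamps are stored as ISO-8601 text, which SQLite compares correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, order_status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2013-12-31 00:00:00", "COMPLETE"),
        (2, "2014-01-01 00:00:00", "COMPLETE"),
        (3, "2014-01-31 00:00:00", "CLOSED"),
        (4, "2014-01-15 00:00:00", "PENDING"),
        (5, "2014-02-01 00:00:00", "CLOSED"),
    ],
)

# Format the date as year-month and compare strings
# (strftime is SQLite's counterpart of to_char for this purpose).
by_format = conn.execute("""
SELECT count(1) FROM orders
WHERE order_status IN ('COMPLETE', 'CLOSED')
    AND strftime('%Y-%m', order_date) = '2014-01'
""").fetchone()[0]

# Half-open range on the column itself: start inclusive, end exclusive.
by_range = conn.execute("""
SELECT count(1) FROM orders
WHERE order_status IN ('COMPLETE', 'CLOSED')
    AND order_date >= '2014-01-01'
    AND order_date < '2014-02-01'
""").fetchone()[0]

print(by_format, by_range)
```

Both filters select the same rows here. The range form leaves the column untouched, which generally makes it easier for a database to use an ordinary index on `order_date`; whether that matters depends on the table size and the indexes available.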


In [41]:
%%sql

SELECT count(1), min(order_date), max(order_date), count(DISTINCT order_date) 
FROM orders 
WHERE order_status IN ('COMPLETE', 'CLOSED')
    AND order_date BETWEEN '2014-01-01' AND '2014-03-31'

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count,min,max,count_1
7594,2014-01-01 00:00:00,2014-03-31 00:00:00,89


In [42]:
%%sql

SELECT DISTINCT order_date
FROM orders
WHERE to_char(order_date, 'yyyy-MM') LIKE '2014-03%'
ORDER BY order_date

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
30 rows affected.


order_date
2014-03-01 00:00:00
2014-03-02 00:00:00
2014-03-03 00:00:00
2014-03-04 00:00:00
2014-03-05 00:00:00
2014-03-06 00:00:00
2014-03-07 00:00:00
2014-03-08 00:00:00
2014-03-10 00:00:00
2014-03-11 00:00:00


In [43]:
%%sql

DROP TABLE IF EXISTS users

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.


[]

In [44]:
%%sql

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    user_first_name VARCHAR(30) NOT NULL,
    user_last_name VARCHAR(30) NOT NULL,
    user_email_id VARCHAR(50) NOT NULL,
    user_email_validated BOOLEAN DEFAULT FALSE,
    user_password VARCHAR(200),
    user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A
    is_active BOOLEAN DEFAULT FALSE,
    create_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.


[]

In [45]:
%%sql

INSERT INTO users (user_first_name, user_last_name, user_email_id)
VALUES ('Donald', 'Duck', 'donald@duck.com')

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


[]

In [46]:
%%sql

INSERT INTO users (user_first_name, user_last_name, user_email_id, user_role, is_active)
VALUES ('Mickey', 'Mouse', 'mickey@mouse.com', 'U', true)

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


[]

In [47]:
%%sql

INSERT INTO users 
    (user_first_name, user_last_name, user_email_id, user_password, user_role, is_active) 
VALUES 
    ('Gordan', 'Bradock', 'gbradock0@barnesandnoble.com', 'h9LAz7p7ub', 'U', true),
    ('Tobe', 'Lyness', 'tlyness1@paginegialle.it', 'oEofndp', 'U', true),
    ('Addie', 'Mesias', 'amesias2@twitpic.com', 'ih7Y69u56', 'U', true)

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
3 rows affected.


[]

In [48]:
%%sql

SELECT * FROM users

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
5 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,create_ts,last_updated_ts
1,Donald,Duck,donald@duck.com,False,,U,False,2020-11-14 15:38:53.352984,2020-11-14 15:38:53.352984
2,Mickey,Mouse,mickey@mouse.com,False,,U,True,2020-11-14 15:38:54.369402,2020-11-14 15:38:54.369402
3,Gordan,Bradock,gbradock0@barnesandnoble.com,False,h9LAz7p7ub,U,True,2020-11-14 15:38:55.260250,2020-11-14 15:38:55.260250
4,Tobe,Lyness,tlyness1@paginegialle.it,False,oEofndp,U,True,2020-11-14 15:38:55.260250,2020-11-14 15:38:55.260250
5,Addie,Mesias,amesias2@twitpic.com,False,ih7Y69u56,U,True,2020-11-14 15:38:55.260250,2020-11-14 15:38:55.260250


```{note}
The following query does not return any rows, as `=` is not the correct way to compare against NULL.
NULL is treated specially by databases and it is not the same as an empty string.
```

In [49]:
%%sql

SELECT * FROM users
WHERE user_password = NULL

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
0 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,create_ts,last_updated_ts


In [50]:
%%sql

SELECT * FROM users
WHERE user_password IS NULL

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
2 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,create_ts,last_updated_ts
1,Donald,Duck,donald@duck.com,False,,U,False,2020-11-14 15:38:53.352984,2020-11-14 15:38:53.352984
2,Mickey,Mouse,mickey@mouse.com,False,,U,True,2020-11-14 15:38:54.369402,2020-11-14 15:38:54.369402


In [51]:
%%sql

SELECT * FROM users
WHERE user_password IS NOT NULL

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
3 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,is_active,create_ts,last_updated_ts
3,Gordan,Bradock,gbradock0@barnesandnoble.com,False,h9LAz7p7ub,U,True,2020-11-14 15:38:55.260250,2020-11-14 15:38:55.260250
4,Tobe,Lyness,tlyness1@paginegialle.it,False,oEofndp,U,True,2020-11-14 15:38:55.260250,2020-11-14 15:38:55.260250
5,Addie,Mesias,amesias2@twitpic.com,False,ih7Y69u56,U,True,2020-11-14 15:38:55.260250,2020-11-14 15:38:55.260250
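
The reason `user_password = NULL` matches nothing is that any comparison with NULL using `=` evaluates to NULL (unknown) rather than true, and `WHERE` keeps only rows for which the condition is true. Here is a minimal sketch of that behaviour, assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL and a cut-down `users` table with made-up values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, user_password TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, None), (2, None), (3, "secret")],
)

# NULL = NULL is not true; it evaluates to NULL, which Python shows as None.
equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# The condition is NULL for every row, so no row passes the filter.
rows_equals = conn.execute(
    "SELECT user_id FROM users WHERE user_password = NULL"
).fetchall()

# IS NULL is the dedicated test and returns the rows without a password.
rows_is_null = conn.execute(
    "SELECT user_id FROM users WHERE user_password IS NULL ORDER BY user_id"
).fetchall()

print(equals_null, rows_equals, rows_is_null)
```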


## Joining Tables – Inner

Let us understand how to join data from multiple tables.

In [8]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/7Wg4zpfj02s?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* We will primarily focus on ANSI style join (**JOIN with ON**).
* There are different types of joins.
  * INNER JOIN - Get all the records from both the datasets that satisfy the JOIN condition.
  * OUTER JOIN - We will get into the details as part of the next topic.
* Example for INNER JOIN

```sql
SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_subtotal
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
LIMIT 10
```

* We can join more than 2 tables in one query. Here is what it looks like.

```sql
SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_subtotal
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
    JOIN products p
    ON p.product_id = oi.order_item_product_id
LIMIT 10
```

* If we have to apply additional filters, it is recommended to use the `WHERE` clause. The `ON` clause should only have join conditions.
* We can have non-equal join conditions as well, but they are not used that often.
* Here are some examples of INNER JOIN:
  * Get order id, date, status and item revenue for all order items.
  * Get order id, date, status and item revenue for all order items for all orders where order status is either COMPLETE or CLOSED.
  * Get order id, date, status and item revenue for all order items for all orders where order status is either COMPLETE or CLOSED, for the orders that are placed in January 2014.
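
Non-equal join conditions are mentioned above but not demonstrated, so here is a minimal sketch. It assumes Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and `revenue_bands` is a hypothetical lookup table that is not part of the retail data model. Each order item is matched to the band whose range contains its subtotal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_items (order_item_id INTEGER, order_item_subtotal REAL);
-- revenue_bands is a hypothetical lookup table, not part of the retail data model.
CREATE TABLE revenue_bands (band_name TEXT, min_amount REAL, max_amount REAL);
INSERT INTO order_items VALUES (1, 49.98), (2, 150.0), (3, 299.98);
INSERT INTO revenue_bands VALUES
    ('low', 0, 100),
    ('medium', 100, 250),
    ('high', 250, 1000);
""")

# The join condition uses >= and < instead of =, so it is a non-equal join.
banded_items = conn.execute("""
SELECT oi.order_item_id, oi.order_item_subtotal, rb.band_name
FROM order_items oi JOIN revenue_bands rb
    ON oi.order_item_subtotal >= rb.min_amount
    AND oi.order_item_subtotal < rb.max_amount
ORDER BY oi.order_item_id
""").fetchall()

for row in banded_items:
    print(row)
```

The bands are half-open (lower bound inclusive, upper bound exclusive) so that a subtotal on a boundary falls into exactly one band.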

In [52]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [53]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db


In [54]:
%%sql

SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_subtotal
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_status,order_item_subtotal
1,2013-07-25 00:00:00,CLOSED,299.98
2,2013-07-25 00:00:00,PENDING_PAYMENT,199.99
2,2013-07-25 00:00:00,PENDING_PAYMENT,250.0
2,2013-07-25 00:00:00,PENDING_PAYMENT,129.99
4,2013-07-25 00:00:00,CLOSED,49.98
4,2013-07-25 00:00:00,CLOSED,299.95
4,2013-07-25 00:00:00,CLOSED,150.0
4,2013-07-25 00:00:00,CLOSED,199.92
5,2013-07-25 00:00:00,COMPLETE,299.98
5,2013-07-25 00:00:00,COMPLETE,299.95


In [55]:
%sql SELECT count(1) FROM orders

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
68883


In [56]:
%sql SELECT count(1) FROM order_items

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
172198


In [57]:
%%sql

SELECT count(1)
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
172198


In [58]:
%%sql

SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_subtotal
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_status,order_item_subtotal
1,2013-07-25 00:00:00,CLOSED,299.98
4,2013-07-25 00:00:00,CLOSED,49.98
4,2013-07-25 00:00:00,CLOSED,299.95
4,2013-07-25 00:00:00,CLOSED,150.0
4,2013-07-25 00:00:00,CLOSED,199.92
5,2013-07-25 00:00:00,COMPLETE,299.98
5,2013-07-25 00:00:00,COMPLETE,299.95
5,2013-07-25 00:00:00,COMPLETE,99.96
5,2013-07-25 00:00:00,COMPLETE,299.98
5,2013-07-25 00:00:00,COMPLETE,129.99


In [59]:
%%sql

SELECT count(1)
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
75408


In [60]:
%%sql

SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_subtotal
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    AND to_char(order_date, 'yyyy-MM') = '2014-01'
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_status,order_item_subtotal
25882,2014-01-01 00:00:00,COMPLETE,299.97
25882,2014-01-01 00:00:00,COMPLETE,100.0
25882,2014-01-01 00:00:00,COMPLETE,79.98
25882,2014-01-01 00:00:00,COMPLETE,399.98
25888,2014-01-01 00:00:00,COMPLETE,299.98
25889,2014-01-01 00:00:00,COMPLETE,99.96
25889,2014-01-01 00:00:00,COMPLETE,19.99
25891,2014-01-01 00:00:00,CLOSED,150.0
25891,2014-01-01 00:00:00,CLOSED,50.0
25891,2014-01-01 00:00:00,CLOSED,119.97


In [61]:
%%sql

SELECT count(1)
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    AND to_char(order_date, 'yyyy-MM') = '2014-01'

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
6198


## Joining Tables - Outer

Let us understand how to perform outer joins using SQL. There are 3 different types of outer joins.

In [9]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/U050biNag4w?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* `LEFT OUTER JOIN` - Get all the records from both the datasets which satisfy the JOIN condition, along with those records which are in the left side table but not in the right side table. `LEFT JOIN` is shorthand for the same thing, whereas a bare `JOIN` is an inner join.
* `RIGHT OUTER JOIN` - Get all the records from both the datasets which satisfy the JOIN condition, along with those records which are in the right side table but not in the left side table.
* `FULL OUTER JOIN` - Get the matching records along with the unmatched records from both sides, i.e. the union of the left and right outer join results.
* When we perform an outer join (let's say a left outer join), we will see this.
  * We get the values from both the tables when the join condition is satisfied.
  * If there are rows in the left side table for which there are no corresponding rows in the right side table, all the projected column values from the right side table will be null.
* Here are some examples for outer joins.
  * Get all the orders where there are no corresponding order items.
  * Get all the order items where there are no corresponding orders.
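
The behaviour described above can be tried out without the retail database. The following is only a minimal sketch: it uses Python's built-in `sqlite3` module with an in-memory database instead of PostgreSQL, and the three orders and three order items are made-up toy rows (order 3 deliberately has no items). Table and column names follow the retail data model.

```python
import sqlite3

# Toy data (an assumption for illustration, not the real retail dataset).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_status TEXT);
    CREATE TABLE order_items (
        order_item_id INTEGER PRIMARY KEY,
        order_item_order_id INTEGER,
        order_item_subtotal REAL
    );
    INSERT INTO orders VALUES (1, 'CLOSED'), (2, 'PENDING_PAYMENT'), (3, 'COMPLETE');
    INSERT INTO order_items VALUES (1, 1, 299.98), (2, 2, 199.99), (3, 2, 250.0);
""")

# Every order is kept; right side columns are NULL (None) when nothing matches.
left_join_rows = conn.execute("""
    SELECT o.order_id, oi.order_item_order_id, oi.order_item_subtotal
    FROM orders o LEFT OUTER JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
    ORDER BY o.order_id, oi.order_item_id
""").fetchall()

# Filtering on a NULL right side key isolates orders without order items.
orders_without_items = conn.execute("""
    SELECT o.order_id
    FROM orders o LEFT OUTER JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
    WHERE oi.order_item_order_id IS NULL
    ORDER BY o.order_id
""").fetchall()

print(left_join_rows)
print(orders_without_items)
```

Only `LEFT OUTER JOIN` is used in this sketch because older SQLite versions do not support `RIGHT` or `FULL` outer joins; PostgreSQL supports all three.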

In [62]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [63]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db


In [64]:
%%sql

SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_order_id,
    oi.order_item_subtotal
FROM orders o LEFT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
ORDER BY o.order_id
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_status,order_item_order_id,order_item_subtotal
1,2013-07-25 00:00:00,CLOSED,1.0,299.98
2,2013-07-25 00:00:00,PENDING_PAYMENT,2.0,129.99
2,2013-07-25 00:00:00,PENDING_PAYMENT,2.0,250.0
2,2013-07-25 00:00:00,PENDING_PAYMENT,2.0,199.99
3,2013-07-25 00:00:00,COMPLETE,,
4,2013-07-25 00:00:00,CLOSED,4.0,199.92
4,2013-07-25 00:00:00,CLOSED,4.0,150.0
4,2013-07-25 00:00:00,CLOSED,4.0,299.95
4,2013-07-25 00:00:00,CLOSED,4.0,49.98
5,2013-07-25 00:00:00,COMPLETE,5.0,299.98


In [65]:
%%sql

SELECT count(1)
FROM orders o LEFT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
183650


In [66]:
%%sql

SELECT count(1)
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
172198


In [67]:
%%sql

SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_order_id,
    oi.order_item_subtotal
FROM orders o LEFT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE oi.order_item_order_id IS NULL
ORDER BY o.order_id
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_status,order_item_order_id,order_item_subtotal
3,2013-07-25 00:00:00,COMPLETE,,
6,2013-07-25 00:00:00,COMPLETE,,
22,2013-07-25 00:00:00,COMPLETE,,
26,2013-07-25 00:00:00,COMPLETE,,
32,2013-07-25 00:00:00,COMPLETE,,
40,2013-07-25 00:00:00,PENDING_PAYMENT,,
47,2013-07-25 00:00:00,PENDING_PAYMENT,,
53,2013-07-25 00:00:00,PROCESSING,,
54,2013-07-25 00:00:00,PENDING_PAYMENT,,
55,2013-07-25 00:00:00,PENDING,,


In [68]:
%%sql

SELECT count(1)
FROM orders o LEFT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE oi.order_item_order_id IS NULL

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
11452


In [69]:
%%sql

SELECT count(1)
FROM orders o LEFT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE oi.order_item_order_id IS NULL
    AND o.order_status IN ('COMPLETE', 'CLOSED')

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
5189


In [70]:
%%sql

SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_order_id,
    oi.order_item_subtotal
FROM orders o RIGHT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_status,order_item_order_id,order_item_subtotal
1,2013-07-25 00:00:00,CLOSED,1,299.98
2,2013-07-25 00:00:00,PENDING_PAYMENT,2,199.99
2,2013-07-25 00:00:00,PENDING_PAYMENT,2,250.0
2,2013-07-25 00:00:00,PENDING_PAYMENT,2,129.99
4,2013-07-25 00:00:00,CLOSED,4,49.98
4,2013-07-25 00:00:00,CLOSED,4,299.95
4,2013-07-25 00:00:00,CLOSED,4,150.0
4,2013-07-25 00:00:00,CLOSED,4,199.92
5,2013-07-25 00:00:00,COMPLETE,5,299.98
5,2013-07-25 00:00:00,COMPLETE,5,299.95


In [71]:
%%sql

SELECT count(1)
FROM orders o RIGHT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
172198


In [72]:
%%sql

SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_order_id,
    oi.order_item_subtotal
FROM orders o RIGHT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_id IS NULL
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
0 rows affected.


order_id,order_date,order_status,order_item_order_id,order_item_subtotal


## Performing Aggregations

Let us understand how to aggregate the data.

In [10]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/mIPF0ENiKiE?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* We can perform global aggregations as well as aggregations by key.
* Global Aggregations
  * Get total number of orders.
  * Get revenue for a given order id.
  * Get number of records with order_status either COMPLETE or CLOSED.
* Aggregations by key - using `GROUP BY`
  * Get number of orders by date or status.
  * Get revenue for each order_id.
  * Get daily product revenue (using order date and product id as keys).
* We can also use `HAVING` clause to apply filtering on top of aggregated data.
  * Get daily product revenue where revenue is greater than $500 (using order date and product id as keys).
* Rules while using `GROUP BY`.
  * The `SELECT` clause can contain the columns which are specified as part of `GROUP BY`.
  * On top of those, we can have derived columns using aggregate functions.
  * As a rule, we cannot have any other columns in `SELECT`: a column that is not part of `GROUP BY` can only appear inside an aggregate function.
  * We will not be able to use aggregate functions, or aliases defined in the `SELECT` clause, as part of the `WHERE` clause.
  * If we want to filter based on aggregated results, then we can leverage `HAVING` on top of `GROUP BY` (specifying `WHERE` is not an option, as it is applied before the aggregation).
* Typical query execution - FROM -> WHERE -> GROUP BY -> SELECT
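
To make the difference between `WHERE` and `HAVING` concrete, here is a small sketch. It is not run against the retail database: it assumes Python's built-in `sqlite3` module and five made-up rows in a simplified `orders` table, so the counts are illustrative only.

```python
import sqlite3

# Toy data (an assumption for illustration, not the real retail dataset).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, order_status TEXT);
    INSERT INTO orders VALUES
        (1, '2013-07-25', 'CLOSED'),
        (2, '2013-07-25', 'PENDING_PAYMENT'),
        (3, '2013-07-25', 'COMPLETE'),
        (4, '2013-07-26', 'COMPLETE'),
        (5, '2013-07-26', 'CANCELED');
""")

# Global aggregation: a single row for the whole (filtered) table.
(complete_or_closed,) = conn.execute("""
    SELECT count(1) FROM orders
    WHERE order_status IN ('COMPLETE', 'CLOSED')
""").fetchone()

# Aggregation by key: WHERE filters rows before grouping,
# HAVING filters groups after the aggregate has been computed.
busy_dates = conn.execute("""
    SELECT order_date, count(1) AS order_count
    FROM orders
    WHERE order_status IN ('COMPLETE', 'CLOSED')
    GROUP BY order_date
    HAVING count(1) >= 2
    ORDER BY order_date
""").fetchall()

# Using an aggregate function in WHERE is rejected by the database.
try:
    conn.execute(
        "SELECT order_date FROM orders WHERE count(1) >= 2 GROUP BY order_date"
    )
    aggregate_in_where_rejected = False
except sqlite3.Error:
    aggregate_in_where_rejected = True

print(complete_or_closed, busy_dates, aggregate_in_where_rejected)
```

The exact error text differs between SQLite and PostgreSQL, but the rule is the same in both: conditions on aggregated values belong in `HAVING`.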

In [73]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [74]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db


In [75]:
%sql SELECT count(order_id) FROM orders

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
68883


In [76]:
%sql SELECT count(DISTINCT order_date) FROM orders

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
364


In [77]:
%%sql

SELECT *
FROM order_items 
WHERE order_item_order_id = 2

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
3 rows affected.


order_item_id,order_item_order_id,order_item_product_id,order_item_quantity,order_item_subtotal,order_item_product_price
2,2,1073,1,199.99,199.99
3,2,502,5,250.0,50.0
4,2,403,1,129.99,129.99


In [78]:
%%sql

SELECT round(sum(order_item_subtotal::numeric), 2) AS order_revenue
FROM order_items 
WHERE order_item_order_id = 2

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


order_revenue
579.98


In [79]:
%%sql

SELECT count(1) 
FROM orders
WHERE order_status IN ('COMPLETE', 'CLOSED')

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
30455


In [80]:
%%sql

SELECT order_date,
    count(1)
FROM orders
GROUP BY order_date
ORDER BY order_date
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_date,count
2013-07-25 00:00:00,143
2013-07-26 00:00:00,269
2013-07-27 00:00:00,202
2013-07-28 00:00:00,187
2013-07-29 00:00:00,253
2013-07-30 00:00:00,227
2013-07-31 00:00:00,252
2013-08-01 00:00:00,246
2013-08-02 00:00:00,224
2013-08-03 00:00:00,183


In [81]:
%%sql

SELECT order_status,
    count(1) AS status_count
FROM orders
GROUP BY order_status
ORDER BY order_status
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
9 rows affected.


order_status,status_count
CANCELED,1428
CLOSED,7556
COMPLETE,22899
ON_HOLD,3798
PAYMENT_REVIEW,729
PENDING,7610
PENDING_PAYMENT,15030
PROCESSING,8275
SUSPECTED_FRAUD,1558


In [82]:
%%sql

SELECT order_item_order_id,
    sum(order_item_subtotal) AS order_revenue
FROM order_items
GROUP BY order_item_order_id 
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_item_order_id,order_revenue
44127,179.97
26264,334.96000000000004
37876,699.97
55864,600.94
31789,129.99
56903,479.97
40694,1129.75
48663,969.92
47216,1219.89
37922,1029.9


```{error}
This query using `round` will fail: `sum(order_item_subtotal)` returns `double precision`, while the two-argument form of `round` only accepts `numeric`. We have to convert the data type of `sum(order_item_subtotal)` to `numeric`.
```

In [83]:
%%sql

SELECT order_item_order_id,
    round(sum(order_item_subtotal), 2) AS order_revenue
FROM order_items
GROUP BY order_item_order_id 
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
(psycopg2.errors.UndefinedFunction) function round(double precision, integer) does not exist
LINE 1: SELECT order_item_order_id, round(sum(order_item_subtotal), ...
                                    ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

[SQL: SELECT order_item_order_id, round(sum(order_item_subtotal), 2) AS order_revenue
FROM order_items
GROUP BY order_item_order_id 
LIMIT 10]
(Background on this error at: http://sqlalche.me/e/13/f405)


In [84]:
%%sql

SELECT order_item_order_id,
    round(sum(order_item_subtotal)::numeric, 2) AS order_revenue
FROM order_items
GROUP BY order_item_order_id 
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_item_order_id,order_revenue
44127,179.97
26264,334.96
37876,699.97
55864,600.94
31789,129.99
56903,479.97
40694,1129.75
48663,969.92
47216,1219.89
37922,1029.9
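
Values such as `334.96000000000004` in the unrounded output are typical of binary floating point arithmetic, which is what `double precision` uses; `numeric` stores exact decimal values instead. The sketch below illustrates the same distinction in plain Python, assuming that `float` stands in for `double precision` and `decimal.Decimal` for `numeric`. The subtotals are those of order 2 shown earlier in this section.

```python
from decimal import Decimal

# Subtotals of order 2, kept as strings so both types parse the same text.
subtotals = ["199.99", "250.0", "129.99"]

# Binary floating point cannot represent most decimal fractions exactly,
# so a sum may carry a tiny representation error.
float_total = sum(float(s) for s in subtotals)

# Exact decimal arithmetic keeps every digit as written.
decimal_total = sum(Decimal(s) for s in subtotals)

print(repr(float_total))       # may or may not show trailing noise digits
print(decimal_total)           # exact decimal sum
print(round(float_total, 2))   # rounding hides the noise for display
```

Casting to `numeric` before rounding, as done in the query above, plays the same role as switching to `Decimal` here.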


In [85]:
%%sql

SELECT o.order_date,
    oi.order_item_product_id,
    round(sum(oi.order_item_subtotal::numeric), 2) AS revenue
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date,
    oi.order_item_product_id
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_date,order_item_product_id,revenue
2013-07-25 00:00:00,24,319.96
2013-07-25 00:00:00,93,74.97
2013-07-25 00:00:00,134,100.0
2013-07-25 00:00:00,191,5099.49
2013-07-25 00:00:00,226,599.99
2013-07-25 00:00:00,365,3359.44
2013-07-25 00:00:00,403,1949.85
2013-07-25 00:00:00,502,1650.0
2013-07-25 00:00:00,572,119.97
2013-07-25 00:00:00,625,199.99


```{note}
We cannot use aliases defined in the `SELECT` clause in `WHERE`, because `WHERE` is evaluated before `SELECT`. In this case **revenue** cannot be used in the `WHERE` clause.
```

In [86]:
%%sql

SELECT o.order_date,
    oi.order_item_product_id,
    round(sum(oi.order_item_subtotal::numeric), 2) AS revenue
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    AND revenue >= 500
GROUP BY o.order_date,
    oi.order_item_product_id
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
(psycopg2.errors.UndefinedColumn) column "revenue" does not exist
LINE 5:     AND revenue >= 500
                ^

[SQL: SELECT o.order_date, oi.order_item_product_id, round(sum(oi.order_item_subtotal::numeric), 2) AS revenue
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    AND revenue >= 500
GROUP BY o.order_date,
    oi.order_item_product_id
LIMIT 10]
(Background on this error at: http://sqlalche.me/e/13/f405)


```{note}
We cannot use aggregate functions in `WHERE` clause.
```

In [87]:
%%sql

SELECT o.order_date,
    oi.order_item_product_id,
    round(sum(oi.order_item_subtotal::numeric), 2) AS revenue
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    AND round(sum(oi.order_item_subtotal::numeric), 2) >= 500
GROUP BY o.order_date,
    oi.order_item_product_id
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
(psycopg2.errors.GroupingError) aggregate functions are not allowed in WHERE
LINE 5:     AND round(sum(oi.order_item_subtotal::numeric), 2) >= 50...
                      ^

[SQL: SELECT o.order_date, oi.order_item_product_id, round(sum(oi.order_item_subtotal::numeric), 2) AS revenue
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    AND round(sum(oi.order_item_subtotal::numeric), 2) >= 500
GROUP BY o.order_date,
    oi.order_item_product_id
LIMIT 10]
(Background on this error at: http://sqlalche.me/e/13/f405)


In [88]:
%%sql

SELECT o.order_date,
    oi.order_item_product_id,
    round(sum(oi.order_item_subtotal::numeric), 2) AS revenue
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date, 
    oi.order_item_product_id
HAVING round(sum(oi.order_item_subtotal::numeric), 2) >= 500
ORDER BY o.order_date, revenue DESC
LIMIT 25

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
25 rows affected.


order_date,order_item_product_id,revenue
2013-07-25 00:00:00,1004,5599.72
2013-07-25 00:00:00,191,5099.49
2013-07-25 00:00:00,957,4499.7
2013-07-25 00:00:00,365,3359.44
2013-07-25 00:00:00,1073,2999.85
2013-07-25 00:00:00,1014,2798.88
2013-07-25 00:00:00,403,1949.85
2013-07-25 00:00:00,502,1650.0
2013-07-25 00:00:00,627,1079.73
2013-07-25 00:00:00,226,599.99


In [89]:
%%sql

SELECT count(1) FROM (
    SELECT o.order_date,
        oi.order_item_product_id,
        round(sum(oi.order_item_subtotal::numeric), 2) AS revenue
    FROM orders o JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
    WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    GROUP BY o.order_date, 
        oi.order_item_product_id
) q

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
9120


In [90]:
%%sql

SELECT count(1) FROM (
    SELECT o.order_date,
        oi.order_item_product_id,
        round(sum(oi.order_item_subtotal::numeric), 2) AS revenue
    FROM orders o JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
    WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    GROUP BY o.order_date, 
        oi.order_item_product_id
    HAVING round(sum(oi.order_item_subtotal::numeric), 2) >= 500
) q

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
3339


## Sorting Data

Let us understand how to sort the data using **SQL**.

In [11]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/kI-y9WNhiv8?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* We typically perform sorting as the final step.
* Sorting can be done either by using one field or multiple fields. Sorting by multiple fields is also known as composite sorting.
* We can sort the data either in ascending order or descending order, using a column or an expression.
* By default, the sorting order is ascending and we can change it to descending by using `DESC`.
* As part of composite sorting, we can sort the data in ascending order on some fields and descending order on other fields.
* Typical query execution order
  1. `FROM`
  2. `WHERE`
  3. `GROUP BY` and `HAVING`
  4. `SELECT`
  5. `ORDER BY`

```sql
SELECT order_date, count(1) AS order_count
FROM orders
WHERE order_status IN ('COMPLETE', 'CLOSED')
GROUP BY order_date
HAVING count(1) > 50
ORDER BY order_count DESC
```
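
Composite sorting with mixed directions can be sketched as follows. This is an illustration under assumptions: it uses Python's built-in `sqlite3` module with four made-up rows in a simplified `orders` table rather than the PostgreSQL retail database.

```python
import sqlite3

# Toy data (an assumption for illustration, not the real retail dataset).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        order_date TEXT,
        order_customer_id INTEGER
    );
    INSERT INTO orders VALUES
        (10, '2013-12-13', 1),
        (11, '2014-02-18', 2),
        (12, '2013-08-02', 2),
        (13, '2013-11-30', 2);
""")

# Ascending by customer (the default order), then newest order first
# within each customer.
rows = conn.execute("""
    SELECT order_customer_id, order_date, order_id
    FROM orders
    ORDER BY order_customer_id ASC,
        order_date DESC
""").fetchall()

for row in rows:
    print(row)
```

`DESC` applies only to the field it follows, so `order_customer_id` is still sorted in ascending order here.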

In [91]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [92]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db


In [93]:
%%sql

SELECT * FROM orders LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
1,2013-07-25 00:00:00,11599,CLOSED
2,2013-07-25 00:00:00,256,PENDING_PAYMENT
3,2013-07-25 00:00:00,12111,COMPLETE
4,2013-07-25 00:00:00,8827,CLOSED
5,2013-07-25 00:00:00,11318,COMPLETE
6,2013-07-25 00:00:00,7130,COMPLETE
7,2013-07-25 00:00:00,4530,COMPLETE
8,2013-07-25 00:00:00,2911,PROCESSING
9,2013-07-25 00:00:00,5657,PENDING_PAYMENT
10,2013-07-25 00:00:00,5648,PENDING_PAYMENT


In [94]:
%%sql

SELECT * FROM orders
ORDER BY order_customer_id
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
22945,2013-12-13 00:00:00,1,COMPLETE
33865,2014-02-18 00:00:00,2,COMPLETE
67863,2013-11-30 00:00:00,2,COMPLETE
15192,2013-10-29 00:00:00,2,PENDING_PAYMENT
57963,2013-08-02 00:00:00,2,ON_HOLD
56178,2014-07-15 00:00:00,3,PENDING
57617,2014-07-24 00:00:00,3,COMPLETE
23662,2013-12-19 00:00:00,3,COMPLETE
22646,2013-12-11 00:00:00,3,COMPLETE
35158,2014-02-26 00:00:00,3,COMPLETE


In [95]:
%%sql

SELECT * FROM orders
ORDER BY order_customer_id ASC
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
22945,2013-12-13 00:00:00,1,COMPLETE
33865,2014-02-18 00:00:00,2,COMPLETE
67863,2013-11-30 00:00:00,2,COMPLETE
15192,2013-10-29 00:00:00,2,PENDING_PAYMENT
57963,2013-08-02 00:00:00,2,ON_HOLD
56178,2014-07-15 00:00:00,3,PENDING
57617,2014-07-24 00:00:00,3,COMPLETE
23662,2013-12-19 00:00:00,3,COMPLETE
22646,2013-12-11 00:00:00,3,COMPLETE
35158,2014-02-26 00:00:00,3,COMPLETE


In [96]:
%%sql

SELECT * FROM orders
ORDER BY order_customer_id,
    order_date
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
22945,2013-12-13 00:00:00,1,COMPLETE
57963,2013-08-02 00:00:00,2,ON_HOLD
15192,2013-10-29 00:00:00,2,PENDING_PAYMENT
67863,2013-11-30 00:00:00,2,COMPLETE
33865,2014-02-18 00:00:00,2,COMPLETE
22646,2013-12-11 00:00:00,3,COMPLETE
61453,2013-12-14 00:00:00,3,COMPLETE
23662,2013-12-19 00:00:00,3,COMPLETE
35158,2014-02-26 00:00:00,3,COMPLETE
46399,2014-05-09 00:00:00,3,PROCESSING


In [97]:
%%sql

SELECT * FROM orders
ORDER BY order_customer_id,
    order_date DESC
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
22945,2013-12-13 00:00:00,1,COMPLETE
33865,2014-02-18 00:00:00,2,COMPLETE
67863,2013-11-30 00:00:00,2,COMPLETE
15192,2013-10-29 00:00:00,2,PENDING_PAYMENT
57963,2013-08-02 00:00:00,2,ON_HOLD
57617,2014-07-24 00:00:00,3,COMPLETE
56178,2014-07-15 00:00:00,3,PENDING
46399,2014-05-09 00:00:00,3,PROCESSING
35158,2014-02-26 00:00:00,3,COMPLETE
23662,2013-12-19 00:00:00,3,COMPLETE


In [98]:
%%sql

SELECT o.order_date,
    oi.order_item_product_id,
    round(sum(oi.order_item_subtotal::numeric), 2) AS revenue
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date,
    oi.order_item_product_id
ORDER BY o.order_date,
    revenue DESC
LIMIT 25

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
25 rows affected.


order_date,order_item_product_id,revenue
2013-07-25 00:00:00,1004,5599.72
2013-07-25 00:00:00,191,5099.49
2013-07-25 00:00:00,957,4499.7
2013-07-25 00:00:00,365,3359.44
2013-07-25 00:00:00,1073,2999.85
2013-07-25 00:00:00,1014,2798.88
2013-07-25 00:00:00,403,1949.85
2013-07-25 00:00:00,502,1650.0
2013-07-25 00:00:00,627,1079.73
2013-07-25 00:00:00,226,599.99


In [99]:
%%sql

SELECT o.order_date,
    oi.order_item_product_id,
    round(sum(oi.order_item_subtotal::numeric), 2) AS revenue
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date,
    oi.order_item_product_id
HAVING round(sum(oi.order_item_subtotal::numeric), 2) >= 1000
ORDER BY o.order_date,
    revenue DESC
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_date,order_item_product_id,revenue
2013-07-25 00:00:00,1004,5599.72
2013-07-25 00:00:00,191,5099.49
2013-07-25 00:00:00,957,4499.7
2013-07-25 00:00:00,365,3359.44
2013-07-25 00:00:00,1073,2999.85
2013-07-25 00:00:00,1014,2798.88
2013-07-25 00:00:00,403,1949.85
2013-07-25 00:00:00,502,1650.0
2013-07-25 00:00:00,627,1079.73
2013-07-26 00:00:00,1004,10799.46


In [100]:
%%sql

DROP TABLE IF EXISTS users

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.


[]

In [101]:
%%sql

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    user_first_name VARCHAR(30) NOT NULL,
    user_last_name VARCHAR(30) NOT NULL,
    user_email_id VARCHAR(50) NOT NULL,
    user_email_validated BOOLEAN DEFAULT FALSE,
    user_password VARCHAR(200),
    user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A
    user_country VARCHAR(2),
    is_active BOOLEAN DEFAULT FALSE,
    create_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
Done.


[]

In [102]:
%%sql

INSERT INTO users (user_first_name, user_last_name, user_email_id, user_country)
VALUES ('Donald', 'Duck', 'donald@duck.com', 'IN')

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


[]

In [103]:
%%sql

INSERT INTO users (user_first_name, user_last_name, user_email_id, user_role, is_active, user_country)
VALUES ('Mickey', 'Mouse', 'mickey@mouse.com', 'U', true, 'US')

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


[]

In [104]:
%%sql

INSERT INTO users 
    (user_first_name, user_last_name, user_email_id, user_password, user_role, is_active, user_country) 
VALUES 
    ('Gordan', 'Bradock', 'gbradock0@barnesandnoble.com', 'h9LAz7p7ub', 'U', true, 'CA'),
    ('Tobe', 'Lyness', 'tlyness1@paginegialle.it', 'oEofndp', 'U', true, 'FR'),
    ('Addie', 'Mesias', 'amesias2@twitpic.com', 'ih7Y69u56', 'U', true, 'AU')

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
3 rows affected.


[]

In [105]:
%%sql

SELECT * FROM users
ORDER BY user_country

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
5 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_email_validated,user_password,user_role,user_country,is_active,create_ts,last_updated_ts
5,Addie,Mesias,amesias2@twitpic.com,False,ih7Y69u56,U,AU,True,2020-11-14 15:40:12.414932,2020-11-14 15:40:12.414932
3,Gordan,Bradock,gbradock0@barnesandnoble.com,False,h9LAz7p7ub,U,CA,True,2020-11-14 15:40:12.414932,2020-11-14 15:40:12.414932
4,Tobe,Lyness,tlyness1@paginegialle.it,False,oEofndp,U,FR,True,2020-11-14 15:40:12.414932,2020-11-14 15:40:12.414932
1,Donald,Duck,donald@duck.com,False,,U,IN,False,2020-11-14 15:40:10.878908,2020-11-14 15:40:10.878908
2,Mickey,Mouse,mickey@mouse.com,False,,U,US,True,2020-11-14 15:40:11.683887,2020-11-14 15:40:11.683887


In [106]:
%%sql

SELECT user_id,
    user_first_name,
    user_last_name,
    user_email_id,
    user_country
FROM users
ORDER BY 
    CASE WHEN user_country = 'US' THEN 0
        ELSE 1
    END, user_country

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
5 rows affected.


user_id,user_first_name,user_last_name,user_email_id,user_country
2,Mickey,Mouse,mickey@mouse.com,US
5,Addie,Mesias,amesias2@twitpic.com,AU
3,Gordan,Bradock,gbradock0@barnesandnoble.com,CA
4,Tobe,Lyness,tlyness1@paginegialle.it,FR
1,Donald,Duck,donald@duck.com,IN


## Solution – Daily Product Revenue

Let us review the Final Solution for our problem statement **daily_product_revenue**.

In [12]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/Gx56dPQX4C4?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* Prepare tables
  * Create tables
  * Load the data into tables
* We need to project the fields which we are interested in. We need **product_id** as well as **product_name**, as there can be different products with the same name; grouping by name alone would merge them and result in incorrect output.
  * order_date
  * order_item_product_id
  * product_name
  * product_revenue
* As we have fields from multiple tables, we need to perform join after which we have to filter for COMPLETE or CLOSED orders.
* We have to group the data by order_date and order_item_product_id, then we have to perform aggregation on order_item_subtotal to get product_revenue.
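
The steps above (join, filter, group, aggregate, sort) can be traced end to end on a handful of rows. The following is a sketch under assumptions: it uses Python's built-in `sqlite3` module with an in-memory database, and the products, orders and order items are made up so that the expected revenue is easy to verify by hand.

```python
import sqlite3

# Toy data (an assumption for illustration, not the real retail dataset).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, order_status TEXT);
    CREATE TABLE order_items (
        order_item_id INTEGER PRIMARY KEY,
        order_item_order_id INTEGER,
        order_item_product_id INTEGER,
        order_item_subtotal REAL
    );
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);

    INSERT INTO products VALUES (1, 'Kayak'), (2, 'Golf Polo');
    INSERT INTO orders VALUES
        (1, '2013-07-25', 'CLOSED'),
        (2, '2013-07-25', 'PENDING_PAYMENT'),
        (3, '2013-07-25', 'COMPLETE'),
        (4, '2013-07-26', 'COMPLETE');
    INSERT INTO order_items VALUES
        (1, 1, 1, 200.0),
        (2, 1, 2, 50.0),
        (3, 2, 1, 200.0),
        (4, 3, 2, 100.0),
        (5, 4, 1, 400.0);
""")

# SQLite's round accepts floating point values, so the ::numeric cast
# needed in PostgreSQL is omitted in this sketch.
daily_product_revenue = conn.execute("""
    SELECT o.order_date,
        oi.order_item_product_id,
        p.product_name,
        round(sum(oi.order_item_subtotal), 2) AS product_revenue
    FROM orders o
        JOIN order_items oi
            ON o.order_id = oi.order_item_order_id
        JOIN products p
            ON p.product_id = oi.order_item_product_id
    WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    GROUP BY o.order_date,
        oi.order_item_product_id,
        p.product_name
    ORDER BY o.order_date,
        product_revenue DESC
""").fetchall()

for row in daily_product_revenue:
    print(row)
```

Order 2 is `PENDING_PAYMENT`, so its item does not contribute to the revenue of product 1 on 2013-07-25.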

In [107]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [108]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5432/itversity_retail_db


In [109]:
%%sql

SELECT o.order_date,
    oi.order_item_product_id,
    p.product_name,
    round(sum(oi.order_item_subtotal::numeric), 2) AS product_revenue
FROM orders o 
    JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
    JOIN products p
        ON p.product_id = oi.order_item_product_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date,
    oi.order_item_product_id,
    p.product_name
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_date,order_item_product_id,product_name,product_revenue
2013-07-25 00:00:00,24,Elevation Training Mask 2.0,319.96
2013-07-25 00:00:00,93,Under Armour Men's Tech II T-Shirt,74.97
2013-07-25 00:00:00,134,Nike Women's Legend V-Neck T-Shirt,100.0
2013-07-25 00:00:00,191,Nike Men's Free 5.0+ Running Shoe,5099.49
2013-07-25 00:00:00,226,Bowflex SelectTech 1090 Dumbbells,599.99
2013-07-25 00:00:00,365,Perfect Fitness Perfect Rip Deck,3359.44
2013-07-25 00:00:00,403,Nike Men's CJ Elite 2 TD Football Cleat,1949.85
2013-07-25 00:00:00,502,Nike Men's Dri-FIT Victory Golf Polo,1650.0
2013-07-25 00:00:00,572,TYR Boys' Team Digi Jammer,119.97
2013-07-25 00:00:00,625,Nike Men's Kobe IX Elite Low Basketball Shoe,199.99


In [110]:
%%sql

SELECT o.order_date,
    oi.order_item_product_id,
    p.product_name,
    round(sum(oi.order_item_subtotal::numeric), 2) AS product_revenue
FROM orders o 
    JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
    JOIN products p
        ON p.product_id = oi.order_item_product_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date,
    oi.order_item_product_id,
    p.product_name
ORDER BY o.order_date,
    product_revenue DESC
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
10 rows affected.


order_date,order_item_product_id,product_name,product_revenue
2013-07-25 00:00:00,1004,Field & Stream Sportsman 16 Gun Fire Safe,5599.72
2013-07-25 00:00:00,191,Nike Men's Free 5.0+ Running Shoe,5099.49
2013-07-25 00:00:00,957,Diamondback Women's Serene Classic Comfort Bi,4499.7
2013-07-25 00:00:00,365,Perfect Fitness Perfect Rip Deck,3359.44
2013-07-25 00:00:00,1073,Pelican Sunstream 100 Kayak,2999.85
2013-07-25 00:00:00,1014,O'Brien Men's Neoprene Life Vest,2798.88
2013-07-25 00:00:00,403,Nike Men's CJ Elite 2 TD Football Cleat,1949.85
2013-07-25 00:00:00,502,Nike Men's Dri-FIT Victory Golf Polo,1650.0
2013-07-25 00:00:00,627,Under Armour Girls' Toddler Spine Surge Runni,1079.73
2013-07-25 00:00:00,226,Bowflex SelectTech 1090 Dumbbells,599.99


In [111]:
%%sql

SELECT count(1) FROM (
    SELECT o.order_date,
        oi.order_item_product_id,
        p.product_name,
        round(sum(oi.order_item_subtotal::numeric), 2) AS product_revenue
    FROM orders o 
        JOIN order_items oi
            ON o.order_id = oi.order_item_order_id
        JOIN products p
            ON p.product_id = oi.order_item_product_id
    WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    GROUP BY o.order_date,
        oi.order_item_product_id,
        p.product_name
) q

 * postgresql://itversity_retail_user:***@localhost:5432/itversity_retail_db
1 rows affected.


count
9120


## Exercises - Basic SQL Queries

Here are some of the exercises for which you can write SQL queries to self evaluate.

In [13]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/auRIHsKXV6o?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* Ensure that we have the required database and user for retail data. **We might provide the database as part of our labs.** Here are the instructions to use `psql` for setting up the required tables.

```shell
psql -U postgres -h localhost -p 5432 -W
```

```sql
CREATE DATABASE itversity_retail_db;
CREATE USER itversity_retail_user WITH ENCRYPTED PASSWORD 'retail_password';
GRANT ALL ON DATABASE itversity_retail_db TO itversity_retail_user;
```

* Create Tables using the script provided. You can either use `psql` or **SQL Workbench**.

```shell
psql -U itversity_retail_user \
  -h localhost \
  -p 5432 \
  -d itversity_retail_db \
  -W
```

* If the tables already exist, drop them first. Child tables are dropped before the tables they reference.

```sql
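-- IF EXISTS avoids an error when a table has not been created yet
DROP TABLE IF EXISTS order_items;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS customers;
DROP TABLE IF EXISTS products;
DROP TABLE IF EXISTS categories;
DROP TABLE IF EXISTS departments;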
```

* Once the tables are dropped, run the script below to create the tables for the exercises.

```sql
\i /data/retail_db/create_db_tables_pg.sql
```

* Load the data using the script provided.

```sql
\i /data/retail_db/load_db_tables_pg.sql
```

* Run queries to validate that all 6 tables contain data.
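
One way to run this validation is a single query that reports the row count of every table in the retail data model. This is a minimal sketch: it assumes the tables were created with the names used in the `DROP TABLE` statements above, and the column aliases `table_name` and `row_count` are chosen here only for illustration.

```sql
-- Report the number of rows in each retail table.
-- Every row_count should be greater than 0 after a successful load.
SELECT 'departments' AS table_name, count(1) AS row_count FROM departments
UNION ALL
SELECT 'categories', count(1) FROM categories
UNION ALL
SELECT 'products', count(1) FROM products
UNION ALL
SELECT 'customers', count(1) FROM customers
UNION ALL
SELECT 'orders', count(1) FROM orders
UNION ALL
SELECT 'order_items', count(1) FROM order_items;
```

If any table reports 0 rows, rerun the load script and check its output for errors.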

### Exercise 1 - Customer order count

Get the order count per customer for January 2014.
* Tables - orders and customers
* Data should be sorted in descending order by customer_order_count and then in ascending order by customer_id.
* Output should contain customer_id, customer_first_name, customer_last_name and customer_order_count.
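
Exercises 1 to 4 all restrict the orders to January 2014. As a hint rather than a solution, the sketch below shows one way to express that filter by counting all the orders placed in that month. It assumes `orders.order_date` is a date or timestamp column, as the `2013-07-25 00:00:00` values in the earlier output suggest; the alias `order_count` is illustrative.

```sql
-- Count all orders placed in January 2014 (a hint, not an exercise solution).
-- The half-open range covers every timestamp on 2014-01-31
-- and excludes 2014-02-01 00:00:00.
SELECT count(1) AS order_count
FROM orders
WHERE order_date >= '2014-01-01'
    AND order_date < '2014-02-01';
```

Under the same assumption about the column type, `to_char(order_date, 'yyyy-MM') = '2014-01'` is an alternative way to write the condition.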

### Exercise 2 - Dormant Customers

Get the details of the customers who did not place any order in January 2014.
* Tables - orders and customers
* Data should be sorted in ascending order by customer_id.
* Output should contain all the fields from customers.

### Exercise 3 - Revenue Per Customer

Get the revenue generated by each customer for January 2014.
* Tables - orders, order_items and customers
* Data should be sorted in descending order by customer_revenue and then in ascending order by customer_id.
* Output should contain customer_id, customer_first_name, customer_last_name and customer_revenue.
* If a customer has not placed any orders, the revenue for that customer should be 0.
* Consider only COMPLETE and CLOSED orders.

### Exercise 4 - Revenue Per Category

Get the revenue generated by each category for January 2014.
* Tables - orders, order_items, products and categories
* Data should be sorted in ascending order by category_id.
* Output should contain all the fields from categories along with the revenue as category_revenue.
* Consider only COMPLETE and CLOSED orders.

### Exercise 5 - Product Count Per Department

Get the product count for each department.
* Tables - departments, categories and products
* Data should be sorted in ascending order by department_id.
* Output should contain all the fields from departments along with the product count as product_count.