# Writing Basic SQL Queries

In this section we will focus on writing basic SQL queries.

* Standard Transformations
* Overview of Data Model
* Define Problem Statement – Daily Product Revenue
* Preparing Tables
* Selecting or Projecting Data
* Filtering Data
* Joining Tables – Inner
* Joining Tables – Outer
* Performing Aggregations
* Sorting Data
* Solution – Daily Product Revenue

## Standard Transformations

Here are some of the transformations we typically perform on a regular basis.
* Projection of data
* Filtering data
* Performing Aggregations
* Joins
* Sorting
* Ranking (will be covered as part of advanced queries)

## Overview of Data Model

We will be using the retail data model for this section. It contains 6 tables.
* Table list
  * orders
  * order_items
  * products
  * categories
  * departments
  * customers
* **orders** and **order_items** are transactional tables.
* **products**, **categories** and **departments** are non-transactional tables that hold the product catalog.
* **customers** is a non-transactional table that holds customer details.
* There is a one-to-many relationship between **orders** and **order_items**.
* There is a one-to-many relationship between **products** and **order_items**. Each order item has exactly one product, and a product can be part of many order items.
* There is a one-to-many relationship between **customers** and **orders**. A customer can place many orders over a period of time, but an order cannot have more than one customer.
* There is a one-to-many relationship between **departments** and **categories**, and another between **categories** and **products**.
* There is a hierarchical relationship from departments to products: **departments** -> **categories** -> **products**.
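
The one-to-many relationship between **orders** and **order_items** can be sketched with a foreign key. This is a minimal illustration only: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for PostgreSQL, the column list is trimmed, and whether the provided retail scripts actually declare such constraints is an assumption.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Trimmed-down versions of the two transactional tables.
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    order_date TEXT NOT NULL,
    order_customer_id INTEGER NOT NULL,
    order_status TEXT NOT NULL
);
CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    order_item_order_id INTEGER NOT NULL REFERENCES orders (order_id),
    order_item_subtotal REAL NOT NULL
);
""")

# One order can have many order items.
conn.execute("INSERT INTO orders VALUES (4, '2013-07-25 00:00:00', 8827, 'CLOSED')")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(5, 4, 49.98), (6, 4, 299.95)],
)
items_for_order_4 = conn.execute(
    "SELECT count(*) FROM order_items WHERE order_item_order_id = 4"
).fetchone()[0]

# An order item that points to a non-existent order is rejected.
try:
    conn.execute("INSERT INTO order_items VALUES (99, 999, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(items_for_order_4, orphan_rejected)
```

The same idea applies to the other relationships in the list: the "many" side carries a column that refers to the primary key of the "one" side.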

## Define Problem Statement – Daily Product Revenue

Let us try to get daily product revenue using retail tables.
* daily is derived from orders.order_date.
* product has to be derived from products.product_name.
* revenue has to be derived from order_items.order_item_subtotal.
* We need to join all the 3 tables, then group by order_date, product_id as well as product_name to get revenue using order_item_subtotal.
* Get Daily Product Revenue using products, orders and order_items data set.
* We have the following fields in **orders**.
  * order_id
  * order_date
  * order_customer_id
  * order_status
* We have the following fields in **order_items**.
  * order_item_id
  * order_item_order_id
  * order_item_product_id
  * order_item_quantity
  * order_item_subtotal
  * order_item_product_price
* We have the following fields in **products**.
  * product_id
  * product_category_id
  * product_name
  * product_description
  * product_price
  * product_image
* We have a one-to-many relationship between orders and order_items.
* **orders.order_id** is the **primary key** and **order_items.order_item_order_id** is a foreign key to **orders.order_id**.
* We have a one-to-many relationship between products and order_items.
* **products.product_id** is the **primary key** and **order_items.order_item_product_id** is a foreign key to **products.product_id**.
* By the end of this module we will explore all the standard transformations and get daily product revenue using the following fields.
  * **orders.order_date**
  * **order_items.order_item_product_id**
  * **products.product_name**
  * **order_items.order_item_subtotal** (aggregated using date and product_id).
* We will consider only **COMPLETE** or **CLOSED** orders.
* As the same product name can appear under more than one product_id, we have to include product_id in the key we use to group the data.
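
The problem statement above can be sketched end to end as follows. This is an illustration under stated assumptions, not the section's own solution: it runs against an in-memory SQLite database (via Python's `sqlite3`) instead of PostgreSQL, only the columns needed for the query are created, and the product names are placeholders because the real names for these ids are not shown in this text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, order_date TEXT, order_status TEXT);
CREATE TABLE order_items (
    order_item_order_id INTEGER,
    order_item_product_id INTEGER,
    order_item_subtotal REAL
);
CREATE TABLE products (product_id INTEGER, product_name TEXT);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "2013-07-25 00:00:00", "CLOSED"),
    (2, "2013-07-25 00:00:00", "PENDING_PAYMENT"),
    (4, "2013-07-25 00:00:00", "CLOSED"),
    (5, "2013-07-25 00:00:00", "COMPLETE"),
])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)", [
    (1, 957, 299.98),
    (2, 1073, 199.99),
    (4, 365, 299.95),
    (5, 957, 299.98),
    (5, 365, 299.95),
])
# Placeholder product names (hypothetical).
conn.executemany("INSERT INTO products VALUES (?, ?)", [
    (365, "Product 365"),
    (957, "Product 957"),
    (1073, "Product 1073"),
])

# Join the 3 tables, keep COMPLETE or CLOSED orders, then group by
# date, product id and product name to total the subtotals.
daily_product_revenue = conn.execute("""
SELECT o.order_date,
    oi.order_item_product_id,
    p.product_name,
    round(sum(oi.order_item_subtotal), 2) AS revenue
FROM orders o
    JOIN order_items oi
        ON o.order_id = oi.order_item_order_id
    JOIN products p
        ON p.product_id = oi.order_item_product_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date, oi.order_item_product_id, p.product_name
ORDER BY o.order_date, revenue DESC
""").fetchall()

for row in daily_product_revenue:
    print(row)
```

Product 1073 does not appear in the result because its only order item belongs to an order in PENDING_PAYMENT status.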

## Preparing Tables

Let us ensure that all the tables are ready so that we can come up with the solution for the problem statement.
* Ensure that we have required database and user for retail data. We might provide the database as part of our labs.

```
psql -U postgres -h localhost -p 5433 -W

CREATE DATABASE itversity_retail_db;
CREATE USER itversity_retail_user WITH ENCRYPTED PASSWORD 'retail_password';
GRANT ALL ON DATABASE itversity_retail_db TO itversity_retail_user;
```

* Create the tables using the script provided. You can use either `psql` or **SQLAlchemy**.

```
psql -U itversity_retail_user \
  -h localhost \
  -p 5433 \
  -d itversity_retail_db \
  -W

\i retail_db/create_db_tables_pg.sql
```

* Load the data using the script provided.

```
\i retail_db/load_db_tables_pg.sql
```

* Run queries to validate we have data in all the 3 tables.

In [1]:
%load_ext sql

In [2]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@localhost:5433/itversity_retail_db


In [3]:
%sql SELECT current_database()

1 rows affected.


current_database
itversity_retail_db


In [4]:
%%sql result_set <<

SELECT * FROM information_schema.tables 
WHERE table_catalog = 'itversity_retail_db' 
    AND table_schema = 'public' 
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
6 rows affected.
Returning data to local variable result_set


In [5]:
display(result_set)

table_catalog,table_schema,table_name,table_type,self_referencing_column_name,reference_generation,user_defined_type_catalog,user_defined_type_schema,user_defined_type_name,is_insertable_into,is_typed,commit_action
itversity_retail_db,public,categories,BASE TABLE,,,,,,YES,NO,
itversity_retail_db,public,customers,BASE TABLE,,,,,,YES,NO,
itversity_retail_db,public,departments,BASE TABLE,,,,,,YES,NO,
itversity_retail_db,public,order_items,BASE TABLE,,,,,,YES,NO,
itversity_retail_db,public,orders,BASE TABLE,,,,,,YES,NO,
itversity_retail_db,public,products,BASE TABLE,,,,,,YES,NO,


In [6]:
%sql SELECT * FROM orders LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
1,2013-07-25 00:00:00,11599,CLOSED
2,2013-07-25 00:00:00,256,PENDING_PAYMENT
3,2013-07-25 00:00:00,12111,COMPLETE
4,2013-07-25 00:00:00,8827,CLOSED
5,2013-07-25 00:00:00,11318,COMPLETE
6,2013-07-25 00:00:00,7130,COMPLETE
7,2013-07-25 00:00:00,4530,COMPLETE
8,2013-07-25 00:00:00,2911,PROCESSING
9,2013-07-25 00:00:00,5657,PENDING_PAYMENT
10,2013-07-25 00:00:00,5648,PENDING_PAYMENT


In [7]:
%sql SELECT * FROM order_items LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
10 rows affected.


order_item_id,order_item_order_id,order_item_product_id,order_item_quantity,order_item_subtotal,order_item_product_price
1,1,957,1,299.98,299.98
2,2,1073,1,199.99,199.99
3,2,502,5,250.0,50.0
4,2,403,1,129.99,129.99
5,4,897,2,49.98,24.99
6,4,365,5,299.95,59.99
7,4,502,3,150.0,50.0
8,4,1014,4,199.92,49.98
9,5,957,1,299.98,299.98
10,5,365,5,299.95,59.99


In [8]:
%sql SELECT * FROM products LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
10 rows affected.


product_id,product_category_id,product_name,product_description,product_price,product_image
1,2,Quest Q64 10 FT. x 10 FT. Slant Leg Instant U,,59.98,http://images.acmesports.sports/Quest+Q64+10+FT.+x+10+FT.+Slant+Leg+Instant+Up+Canopy
2,2,Under Armour Men's Highlight MC Football Clea,,129.99,http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat
3,2,Under Armour Men's Renegade D Mid Football Cl,,89.99,http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat
4,2,Under Armour Men's Renegade D Mid Football Cl,,89.99,http://images.acmesports.sports/Under+Armour+Men%27s+Renegade+D+Mid+Football+Cleat
5,2,Riddell Youth Revolution Speed Custom Footbal,,199.99,http://images.acmesports.sports/Riddell+Youth+Revolution+Speed+Custom+Football+Helmet
6,2,Jordan Men's VI Retro TD Football Cleat,,134.99,http://images.acmesports.sports/Jordan+Men%27s+VI+Retro+TD+Football+Cleat
7,2,Schutt Youth Recruit Hybrid Custom Football H,,99.99,http://images.acmesports.sports/Schutt+Youth+Recruit+Hybrid+Custom+Football+Helmet+2014
8,2,Nike Men's Vapor Carbon Elite TD Football Cle,,129.99,http://images.acmesports.sports/Nike+Men%27s+Vapor+Carbon+Elite+TD+Football+Cleat
9,2,Nike Adult Vapor Jet 3.0 Receiver Gloves,,50.0,http://images.acmesports.sports/Nike+Adult+Vapor+Jet+3.0+Receiver+Gloves
10,2,Under Armour Men's Highlight MC Football Clea,,129.99,http://images.acmesports.sports/Under+Armour+Men%27s+Highlight+MC+Football+Cleat


In [9]:
%sql SELECT count(1) FROM orders

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
1 rows affected.


count
68883


In [10]:
%sql SELECT count(1) FROM order_items

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
1 rows affected.


count
172198


In [11]:
%sql SELECT count(1) FROM products

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
1 rows affected.


count
1345


## Selecting or Projecting Data

Let us understand different aspects of projecting data. We primarily use `SELECT` to project the data.
* We can project all columns using `*` or some columns using column names.
* We can provide aliases to a column or expression using `AS` in `SELECT` clause.
* `DISTINCT` can be used to get the distinct records from selected columns. We can also use `DISTINCT *` to get unique records using all the columns.
* As part of `SELECT` clause we can have aggregate functions such as `count`, `sum` etc.
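
The points above can be tried in a self-contained way as sketched below. The sketch assumes an in-memory SQLite database (Python's `sqlite3`) rather than PostgreSQL, so SQLite's `strftime` is swapped in where the PostgreSQL queries in this section use `to_char`. The rows are copied from the sample output of **orders**.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.execute("""
CREATE TABLE orders (
    order_id INTEGER,
    order_date TEXT,
    order_customer_id INTEGER,
    order_status TEXT
)
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "2013-07-25 00:00:00", 11599, "CLOSED"),
    (2, "2013-07-25 00:00:00", 256, "PENDING_PAYMENT"),
    (3, "2013-07-25 00:00:00", 12111, "COMPLETE"),
    (25882, "2014-01-01 00:00:00", 4598, "COMPLETE"),
])

# Project some columns and give an expression an alias using AS.
cursor = conn.execute("""
SELECT order_customer_id,
    strftime('%Y-%m', order_date) AS order_month
FROM orders
""")
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()

# DISTINCT keeps one record per unique value of the projected expression.
distinct_months = conn.execute("""
SELECT DISTINCT strftime('%Y-%m', order_date) AS order_month
FROM orders
ORDER BY order_month
""").fetchall()

# Aggregate functions such as count can be part of the SELECT clause.
distinct_month_count = conn.execute("""
SELECT count(DISTINCT strftime('%Y-%m', order_date)) AS distinct_month_count
FROM orders
""").fetchone()[0]

print(column_names)
print(distinct_months, distinct_month_count)
```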

In [14]:
%sql SELECT * FROM orders LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
1,2013-07-25 00:00:00,11599,CLOSED
2,2013-07-25 00:00:00,256,PENDING_PAYMENT
3,2013-07-25 00:00:00,12111,COMPLETE
4,2013-07-25 00:00:00,8827,CLOSED
5,2013-07-25 00:00:00,11318,COMPLETE
6,2013-07-25 00:00:00,7130,COMPLETE
7,2013-07-25 00:00:00,4530,COMPLETE
8,2013-07-25 00:00:00,2911,PROCESSING
9,2013-07-25 00:00:00,5657,PENDING_PAYMENT
10,2013-07-25 00:00:00,5648,PENDING_PAYMENT


In [15]:
%sql SELECT * FROM information_schema.columns WHERE table_catalog = 'itversity_retail_db' AND table_name = 'orders'

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
4 rows affected.


table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,numeric_precision,numeric_precision_radix,numeric_scale,datetime_precision,interval_type,interval_precision,character_set_catalog,character_set_schema,character_set_name,collation_catalog,collation_schema,collation_name,domain_catalog,domain_schema,domain_name,udt_catalog,udt_schema,udt_name,scope_catalog,scope_schema,scope_name,maximum_cardinality,dtd_identifier,is_self_referencing,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
itversity_retail_db,public,orders,order_id,1,,NO,integer,,,32.0,2.0,0.0,,,,,,,,,,,,,itversity_retail_db,pg_catalog,int4,,,,,1,NO,NO,,,,,,NO,NEVER,,YES
itversity_retail_db,public,orders,order_date,2,,NO,timestamp without time zone,,,,,,6.0,,,,,,,,,,,,itversity_retail_db,pg_catalog,timestamp,,,,,2,NO,NO,,,,,,NO,NEVER,,YES
itversity_retail_db,public,orders,order_customer_id,3,,NO,integer,,,32.0,2.0,0.0,,,,,,,,,,,,,itversity_retail_db,pg_catalog,int4,,,,,3,NO,NO,,,,,,NO,NEVER,,YES
itversity_retail_db,public,orders,order_status,4,,NO,character varying,45.0,180.0,,,,,,,,,,,,,,,,itversity_retail_db,pg_catalog,varchar,,,,,4,NO,NO,,,,,,NO,NEVER,,YES


In [16]:
%sql SELECT order_customer_id, order_date, order_status FROM orders LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
10 rows affected.


order_customer_id,order_date,order_status
11599,2013-07-25 00:00:00,CLOSED
256,2013-07-25 00:00:00,PENDING_PAYMENT
12111,2013-07-25 00:00:00,COMPLETE
8827,2013-07-25 00:00:00,CLOSED
11318,2013-07-25 00:00:00,COMPLETE
7130,2013-07-25 00:00:00,COMPLETE
4530,2013-07-25 00:00:00,COMPLETE
2911,2013-07-25 00:00:00,PROCESSING
5657,2013-07-25 00:00:00,PENDING_PAYMENT
5648,2013-07-25 00:00:00,PENDING_PAYMENT


In [17]:
%sql SELECT order_customer_id, to_char(order_date, 'yyyy-MM'), order_status FROM orders LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
10 rows affected.


order_customer_id,to_char,order_status
11599,2013-07,CLOSED
256,2013-07,PENDING_PAYMENT
12111,2013-07,COMPLETE
8827,2013-07,CLOSED
11318,2013-07,COMPLETE
7130,2013-07,COMPLETE
4530,2013-07,COMPLETE
2911,2013-07,PROCESSING
5657,2013-07,PENDING_PAYMENT
5648,2013-07,PENDING_PAYMENT


In [18]:
%sql SELECT order_customer_id, to_char(order_date, 'yyyy-MM') AS order_month, order_status FROM orders LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
10 rows affected.


order_customer_id,order_month,order_status
11599,2013-07,CLOSED
256,2013-07,PENDING_PAYMENT
12111,2013-07,COMPLETE
8827,2013-07,CLOSED
11318,2013-07,COMPLETE
7130,2013-07,COMPLETE
4530,2013-07,COMPLETE
2911,2013-07,PROCESSING
5657,2013-07,PENDING_PAYMENT
5648,2013-07,PENDING_PAYMENT


In [19]:
%sql SELECT DISTINCT to_char(order_date, 'yyyy-MM') AS order_month FROM orders

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


13 rows affected.


order_month
2014-01
2014-05
2013-12
2013-11
2014-04
2014-07
2014-03
2013-08
2013-10
2013-07


In [20]:
%sql SELECT count(1) FROM orders

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


1 rows affected.


count
68883


In [21]:
%sql SELECT count(DISTINCT to_char(order_date, 'yyyy-MM')) AS distinct_month_count FROM orders

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


1 rows affected.


distinct_month_count
13


## Filtering Data

Let us understand how we can filter the data as part of our queries.
* We use `WHERE` clause to filter the data.
* All comparison operators such as `=`, `!=`, `>`, `<`, etc. can be used to compare a column, expression or literal with another column, expression or literal.
* We can use `LIKE` with `%` for pattern matching. For regular expressions, PostgreSQL provides the `~` operator as well as functions such as `regexp_matches`.
* Boolean `OR` and `AND` can be used when we want to apply multiple conditions.
  * Get all orders with order_status equal to COMPLETE or CLOSED. We can also use the `IN` operator.
  * Get all orders placed in January 2014 with order_status equal to COMPLETE or CLOSED.
* We need to use `IS NULL` and `IS NOT NULL` to compare against null values.
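
Here is a small sketch of these filters, including why `IS NULL` is needed. It assumes an in-memory SQLite database (Python's `sqlite3`) in place of PostgreSQL, uses `strftime` where PostgreSQL would use `to_char`, and adds one hypothetical row with a null status purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, order_date TEXT, order_status TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "2013-07-25 00:00:00", "CLOSED"),
    (2, "2013-07-25 00:00:00", "PENDING_PAYMENT"),
    (3, "2013-07-25 00:00:00", "COMPLETE"),
    (8, "2013-07-25 00:00:00", "PROCESSING"),
    (25882, "2014-01-01 00:00:00", "COMPLETE"),
    (25891, "2014-01-01 00:00:00", "CLOSED"),
    (99999, "2014-01-02 00:00:00", None),  # hypothetical row with a null status
])

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# IN and a chain of OR conditions are equivalent here.
in_count = scalar(
    "SELECT count(*) FROM orders WHERE order_status IN ('COMPLETE', 'CLOSED')"
)
or_count = scalar(
    "SELECT count(*) FROM orders "
    "WHERE order_status = 'COMPLETE' OR order_status = 'CLOSED'"
)

# AND combines the status filter with a filter on the order month.
january_ids = [row[0] for row in conn.execute("""
SELECT order_id FROM orders
WHERE order_status IN ('COMPLETE', 'CLOSED')
    AND strftime('%Y-%m', order_date) = '2014-01'
ORDER BY order_id
""")]

# LIKE with % matches any sequence of characters after the prefix.
pending_count = scalar(
    "SELECT count(*) FROM orders WHERE order_status LIKE 'PENDING%'"
)

# Comparing with = NULL never matches; IS NULL does.
equals_null_count = scalar(
    "SELECT count(*) FROM orders WHERE order_status = NULL"
)
is_null_count = scalar(
    "SELECT count(*) FROM orders WHERE order_status IS NULL"
)

print(in_count, or_count, january_ids)
print(pending_count, equals_null_count, is_null_count)
```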

In [22]:
%sql SELECT * FROM orders WHERE order_status = 'COMPLETE' LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
3,2013-07-25 00:00:00,12111,COMPLETE
5,2013-07-25 00:00:00,11318,COMPLETE
6,2013-07-25 00:00:00,7130,COMPLETE
7,2013-07-25 00:00:00,4530,COMPLETE
15,2013-07-25 00:00:00,2568,COMPLETE
17,2013-07-25 00:00:00,2667,COMPLETE
22,2013-07-25 00:00:00,333,COMPLETE
26,2013-07-25 00:00:00,7562,COMPLETE
28,2013-07-25 00:00:00,656,COMPLETE
32,2013-07-25 00:00:00,3960,COMPLETE


In [23]:
%sql SELECT count(1) FROM orders

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
1 rows affected.


count
68883


In [24]:
%sql SELECT count(1) FROM orders WHERE order_status = 'COMPLETE'

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
1 rows affected.


count
22899


In [25]:
%sql SELECT * FROM orders WHERE order_status IN ('COMPLETE', 'CLOSED') LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
10 rows affected.


order_id,order_date,order_customer_id,order_status
1,2013-07-25 00:00:00,11599,CLOSED
3,2013-07-25 00:00:00,12111,COMPLETE
4,2013-07-25 00:00:00,8827,CLOSED
5,2013-07-25 00:00:00,11318,COMPLETE
6,2013-07-25 00:00:00,7130,COMPLETE
7,2013-07-25 00:00:00,4530,COMPLETE
12,2013-07-25 00:00:00,1837,CLOSED
15,2013-07-25 00:00:00,2568,COMPLETE
17,2013-07-25 00:00:00,2667,COMPLETE
18,2013-07-25 00:00:00,1205,CLOSED


In [26]:
%sql SELECT count(1) FROM orders WHERE order_status IN ('COMPLETE', 'CLOSED')

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


1 rows affected.


count
30455


In [27]:
%sql SELECT count(1) FROM orders WHERE order_status = 'COMPLETE' OR order_status = 'CLOSED'

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
1 rows affected.


count
30455


In [28]:
%%sql result_set <<

SELECT * FROM orders 
WHERE order_status IN ('COMPLETE', 'CLOSED')
    AND to_char(order_date, 'yyyy-MM-dd') LIKE '2014-01%'
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
10 rows affected.
Returning data to local variable result_set


In [29]:
display(result_set)

order_id,order_date,order_customer_id,order_status
25882,2014-01-01 00:00:00,4598,COMPLETE
25888,2014-01-01 00:00:00,6735,COMPLETE
25889,2014-01-01 00:00:00,10045,COMPLETE
25891,2014-01-01 00:00:00,3037,CLOSED
25895,2014-01-01 00:00:00,1044,COMPLETE
25897,2014-01-01 00:00:00,6405,COMPLETE
25898,2014-01-01 00:00:00,3950,COMPLETE
25899,2014-01-01 00:00:00,8068,CLOSED
25900,2014-01-01 00:00:00,2382,CLOSED
25901,2014-01-01 00:00:00,3099,COMPLETE


In [30]:
%%sql result_set <<

SELECT * FROM orders 
WHERE order_status IN ('COMPLETE', 'CLOSED')
    AND to_char(order_date, 'yyyy-MM') = '2014-01'
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


10 rows affected.
Returning data to local variable result_set


In [31]:
display(result_set)

order_id,order_date,order_customer_id,order_status
25882,2014-01-01 00:00:00,4598,COMPLETE
25888,2014-01-01 00:00:00,6735,COMPLETE
25889,2014-01-01 00:00:00,10045,COMPLETE
25891,2014-01-01 00:00:00,3037,CLOSED
25895,2014-01-01 00:00:00,1044,COMPLETE
25897,2014-01-01 00:00:00,6405,COMPLETE
25898,2014-01-01 00:00:00,3950,COMPLETE
25899,2014-01-01 00:00:00,8068,CLOSED
25900,2014-01-01 00:00:00,2382,CLOSED
25901,2014-01-01 00:00:00,3099,COMPLETE


In [32]:
%%sql result_set <<

SELECT count(1) FROM orders 
WHERE order_status IN ('COMPLETE', 'CLOSED')
    AND to_char(order_date, 'yyyy-MM-dd') LIKE '2014-01%'

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


1 rows affected.
Returning data to local variable result_set


In [33]:
display(result_set)

count
2544


In [34]:
%%sql result_set <<

SELECT count(1) FROM orders 
WHERE order_status IN ('COMPLETE', 'CLOSED')
    AND to_char(order_date, 'yyyy-MM') = '2014-01'

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


1 rows affected.
Returning data to local variable result_set


In [35]:
display(result_set)

count
2544


## Joining Tables – Inner

Let us understand how to join data from multiple tables.

* We will primarily focus on ANSI style joins (**JOIN with ON**).
* There are different types of joins.
  * INNER JOIN - Get all the records from both the datasets that satisfy the JOIN condition.
  * OUTER JOIN - We will get into the details as part of the next topic
* Example for INNER JOIN

```
SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_subtotal
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
LIMIT 10
```

* We can join more than 2 tables in one query. Here is what it looks like.

```
SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_subtotal
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
    JOIN products p
    ON p.product_id = oi.order_item_product_id
LIMIT 10
```

* If we have to apply additional filters, it is recommended to use WHERE clause. ON clause should only have join conditions.
* We can have non equal join conditions as well, but they are not used that often.
* Here are some of the examples for INNER JOIN:
  * Get order id, date, status and item revenue for all order items.
  * Get order id, date, status and item revenue for all order items for all orders where order status is either COMPLETE or CLOSED.
  * Get order id, date, status and item revenue for all order items of orders placed in January 2014 whose order status is either COMPLETE or CLOSED.
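
The effect of an inner join and of an additional filter can be checked on a handful of rows, as in the sketch below. It assumes an in-memory SQLite database (Python's `sqlite3`) as a stand-in for PostgreSQL and uses the first few rows shown in the sample output of **orders** and **order_items**, with only the columns needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, order_date TEXT, order_status TEXT);
CREATE TABLE order_items (
    order_item_id INTEGER,
    order_item_order_id INTEGER,
    order_item_subtotal REAL
);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "2013-07-25 00:00:00", "CLOSED"),
    (2, "2013-07-25 00:00:00", "PENDING_PAYMENT"),
    (3, "2013-07-25 00:00:00", "COMPLETE"),
    (4, "2013-07-25 00:00:00", "CLOSED"),
    (5, "2013-07-25 00:00:00", "COMPLETE"),
])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)", [
    (1, 1, 299.98), (2, 2, 199.99), (3, 2, 250.0), (4, 2, 129.99),
    (5, 4, 49.98), (6, 4, 299.95), (7, 4, 150.0), (8, 4, 199.92),
    (9, 5, 299.98), (10, 5, 299.95),
])

# Inner join: one output row per order item that has a matching order.
joined_count = conn.execute("""
SELECT count(*)
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
""").fetchone()[0]

# Recommended style: join condition in ON, additional filter in WHERE.
filtered_count = conn.execute("""
SELECT count(*)
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
""").fetchone()[0]

# For an inner join, placing the filter in ON returns the same rows,
# but it mixes filtering with the join condition.
on_filtered_count = conn.execute("""
SELECT count(*)
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
        AND o.order_status IN ('COMPLETE', 'CLOSED')
""").fetchone()[0]

print(joined_count, filtered_count, on_filtered_count)
```

Order 3 has no order items in this sample, so it contributes nothing to the inner join even though its status is COMPLETE.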

In [36]:
%%sql result_set <<

SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_subtotal
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
10 rows affected.
Returning data to local variable result_set


In [37]:
display(result_set)

order_id,order_date,order_status,order_item_subtotal
1,2013-07-25 00:00:00,CLOSED,299.98
2,2013-07-25 00:00:00,PENDING_PAYMENT,199.99
2,2013-07-25 00:00:00,PENDING_PAYMENT,250.0
2,2013-07-25 00:00:00,PENDING_PAYMENT,129.99
4,2013-07-25 00:00:00,CLOSED,49.98
4,2013-07-25 00:00:00,CLOSED,299.95
4,2013-07-25 00:00:00,CLOSED,150.0
4,2013-07-25 00:00:00,CLOSED,199.92
5,2013-07-25 00:00:00,COMPLETE,299.98
5,2013-07-25 00:00:00,COMPLETE,299.95


In [38]:
%sql SELECT count(1) FROM orders

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
1 rows affected.


count
68883


In [39]:
%sql SELECT count(1) FROM order_items

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


1 rows affected.


count
172198


In [40]:
%%sql result_set <<

SELECT count(1)
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


1 rows affected.
Returning data to local variable result_set


In [41]:
display(result_set)

count
172198


In [42]:
%%sql result_set <<

SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_subtotal
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
10 rows affected.
Returning data to local variable result_set


In [43]:
display(result_set)

order_id,order_date,order_status,order_item_subtotal
1,2013-07-25 00:00:00,CLOSED,299.98
4,2013-07-25 00:00:00,CLOSED,49.98
4,2013-07-25 00:00:00,CLOSED,299.95
4,2013-07-25 00:00:00,CLOSED,150.0
4,2013-07-25 00:00:00,CLOSED,199.92
5,2013-07-25 00:00:00,COMPLETE,299.98
5,2013-07-25 00:00:00,COMPLETE,299.95
5,2013-07-25 00:00:00,COMPLETE,99.96
5,2013-07-25 00:00:00,COMPLETE,299.98
5,2013-07-25 00:00:00,COMPLETE,129.99


In [44]:
%%sql result_set <<

SELECT count(1)
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


1 rows affected.
Returning data to local variable result_set


In [45]:
display(result_set)

count
75408


In [46]:
%%sql result_set <<

SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_subtotal
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    AND to_char(order_date, 'yyyy-MM') = '2014-01'
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


10 rows affected.
Returning data to local variable result_set


In [47]:
display(result_set)

order_id,order_date,order_status,order_item_subtotal
25882,2014-01-01 00:00:00,COMPLETE,299.97
25882,2014-01-01 00:00:00,COMPLETE,100.0
25882,2014-01-01 00:00:00,COMPLETE,79.98
25882,2014-01-01 00:00:00,COMPLETE,399.98
25888,2014-01-01 00:00:00,COMPLETE,299.98
25889,2014-01-01 00:00:00,COMPLETE,99.96
25889,2014-01-01 00:00:00,COMPLETE,19.99
25891,2014-01-01 00:00:00,CLOSED,150.0
25891,2014-01-01 00:00:00,CLOSED,50.0
25891,2014-01-01 00:00:00,CLOSED,119.97


In [48]:
%%sql result_set <<

SELECT count(1)
FROM orders o JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    AND to_char(order_date, 'yyyy-MM') = '2014-01'
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


1 rows affected.
Returning data to local variable result_set


In [49]:
display(result_set)

count
6198


## Joining Tables – Outer

Let us understand how to perform outer joins using SQL. There are 3 different types of outer joins.
* `LEFT OUTER JOIN` (can also be written as `LEFT JOIN`) - Get all the records from both the datasets that satisfy the JOIN condition, along with those records which are in the left side table but not in the right side table.
* `RIGHT OUTER JOIN` - Get all the records from both the datasets that satisfy the JOIN condition, along with those records which are in the right side table but not in the left side table.
* `FULL OUTER JOIN` - Get the matching records along with the unmatched records from both the left and the right side tables.
* When we perform an outer join (let's say a left outer join), we will see the following.
  * We get all the values from both the tables when the join condition is satisfied.
  * If there are rows in the left side table for which there are no corresponding rows in the right side table, all the projected column values from the right side table will be null.
* Here are some examples of outer joins.
  * Get all the orders where there are no corresponding order items.
  * Get all the order items where there are no corresponding orders.
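
The null-filling behaviour of a left outer join can be seen on a few rows in the sketch below. It assumes an in-memory SQLite database (Python's `sqlite3`) as a stand-in for PostgreSQL. Because older SQLite versions lack `RIGHT OUTER JOIN`, the second example swaps the table order and uses a left outer join, which gives the same result as a right outer join with the original order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, order_status TEXT);
CREATE TABLE order_items (
    order_item_id INTEGER,
    order_item_order_id INTEGER,
    order_item_subtotal REAL
);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, "CLOSED"), (2, "PENDING_PAYMENT"), (3, "COMPLETE"),
    (4, "CLOSED"), (5, "COMPLETE"),
])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)", [
    (1, 1, 299.98), (2, 2, 199.99), (3, 2, 250.0), (4, 2, 129.99),
    (5, 4, 49.98), (6, 4, 299.95), (7, 4, 150.0), (8, 4, 199.92),
    (9, 5, 299.98), (10, 5, 299.95),
])

# Left outer join: every order appears at least once, even without items.
left_join_rows = conn.execute("""
SELECT o.order_id, o.order_status, oi.order_item_order_id, oi.order_item_subtotal
FROM orders o LEFT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
ORDER BY o.order_id, oi.order_item_id
""").fetchall()

# Orders with no corresponding order items: right side columns are null.
orders_without_items = conn.execute("""
SELECT o.order_id, o.order_status, oi.order_item_order_id, oi.order_item_subtotal
FROM orders o LEFT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE oi.order_item_order_id IS NULL
""").fetchall()

# Order items with no corresponding orders (tables swapped, see lead-in).
items_without_orders = conn.execute("""
SELECT oi.order_item_id
FROM order_items oi LEFT OUTER JOIN orders o
    ON o.order_id = oi.order_item_order_id
WHERE o.order_id IS NULL
""").fetchall()

print(len(left_join_rows))
print(orders_without_items)
print(items_without_orders)
```

Python shows SQL nulls as `None`, so the single order without items comes back with `None` in both order item columns.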

In [50]:
%%sql result_set <<

SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_order_id,
    oi.order_item_subtotal
FROM orders o LEFT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
10 rows affected.
Returning data to local variable result_set


In [51]:
display(result_set)

order_id,order_date,order_status,order_item_order_id,order_item_subtotal
2,2013-07-25 00:00:00,PENDING_PAYMENT,2,199.99
2,2013-07-25 00:00:00,PENDING_PAYMENT,2,250.0
2,2013-07-25 00:00:00,PENDING_PAYMENT,2,129.99
4,2013-07-25 00:00:00,CLOSED,4,49.98
4,2013-07-25 00:00:00,CLOSED,4,299.95
4,2013-07-25 00:00:00,CLOSED,4,150.0
4,2013-07-25 00:00:00,CLOSED,4,199.92
7,2013-07-25 00:00:00,COMPLETE,7,199.99
7,2013-07-25 00:00:00,COMPLETE,7,299.98
7,2013-07-25 00:00:00,COMPLETE,7,79.95


In [52]:
%%sql result_set <<

SELECT count(1)
FROM orders o LEFT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


1 rows affected.
Returning data to local variable result_set


In [53]:
display(result_set)

count
183650


In [54]:
%%sql result_set <<

SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_order_id,
    oi.order_item_subtotal
FROM orders o LEFT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE oi.order_item_order_id IS NULL
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


10 rows affected.
Returning data to local variable result_set


In [55]:
display(result_set)

order_id,order_date,order_status,order_item_order_id,order_item_subtotal
47,2013-07-25 00:00:00,PENDING_PAYMENT,,
55,2013-07-25 00:00:00,PENDING,,
79,2013-07-25 00:00:00,PENDING_PAYMENT,,
82,2013-07-25 00:00:00,PENDING_PAYMENT,,
108,2013-07-26 00:00:00,PROCESSING,,
109,2013-07-26 00:00:00,PENDING_PAYMENT,,
126,2013-07-26 00:00:00,COMPLETE,,
176,2013-07-26 00:00:00,PENDING_PAYMENT,,
199,2013-07-26 00:00:00,ON_HOLD,,
218,2013-07-26 00:00:00,COMPLETE,,


In [56]:
%%sql result_set <<

SELECT count(1)
FROM orders o LEFT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE oi.order_item_order_id IS NULL

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


1 rows affected.
Returning data to local variable result_set


In [57]:
display(result_set)

count
11452


In [58]:
%%sql result_set <<

SELECT count(1)
FROM orders o LEFT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE oi.order_item_order_id IS NULL
    AND o.order_status IN ('COMPLETE', 'CLOSED')

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


1 rows affected.
Returning data to local variable result_set


In [59]:
display(result_set)

count
5189


In [60]:
%%sql result_set <<

SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_order_id,
    oi.order_item_subtotal
FROM orders o RIGHT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


10 rows affected.
Returning data to local variable result_set


In [61]:
display(result_set)

order_id,order_date,order_status,order_item_order_id,order_item_subtotal
1,2013-07-25 00:00:00,CLOSED,1,299.98
2,2013-07-25 00:00:00,PENDING_PAYMENT,2,199.99
2,2013-07-25 00:00:00,PENDING_PAYMENT,2,250.0
2,2013-07-25 00:00:00,PENDING_PAYMENT,2,129.99
4,2013-07-25 00:00:00,CLOSED,4,49.98
4,2013-07-25 00:00:00,CLOSED,4,299.95
4,2013-07-25 00:00:00,CLOSED,4,150.0
4,2013-07-25 00:00:00,CLOSED,4,199.92
5,2013-07-25 00:00:00,COMPLETE,5,299.98
5,2013-07-25 00:00:00,COMPLETE,5,299.95


In [62]:
%%sql result_set <<

SELECT count(1)
FROM orders o RIGHT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db
1 rows affected.
Returning data to local variable result_set


In [63]:
display(result_set)

count
172198


In [64]:
%%sql result_set <<

SELECT o.order_id,
    o.order_date,
    o.order_status,
    oi.order_item_order_id,
    oi.order_item_subtotal
FROM orders o RIGHT OUTER JOIN order_items oi
    ON o.order_id = oi.order_item_order_id
WHERE o.order_id IS NULL
LIMIT 10

 * postgresql://itversity_retail_user:***@localhost:5433/itversity_retail_db


0 rows affected.
Returning data to local variable result_set


In [65]:
display(result_set)

order_id,order_date,order_status,order_item_order_id,order_item_subtotal


## Performing Aggregations

## Sorting Data

## Solution – Daily Product Revenue