# 05 Join tables together

Before we talk about joins, we need to understand the relationships between tables. There are three types of relationships:
- one-to-one: A record in one table is related to one record in another table.
- one-to-many: A record in one table is related to many records in another table.
- many-to-many: Multiple records in one table are related to multiple records in another table.

**A one-to-one or a one-to-many relationship can be handled by adding the primary key of one table to the other table as a foreign key.** To bring the two tables together, we join them on the **primary key and the foreign key**.
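As a minimal sketch of the primary key / foreign key idea, the snippet below builds two tiny tables in an in-memory SQLite database (an assumption made so the example is self-contained; the notebook itself uses PostgreSQL). The table layout is simplified and the customer `NOBODY` and order `99999` are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used only for illustration (not the PostgreSQL server of this notebook).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is enabled
conn.executescript("""
CREATE TABLE customers (
    customer_id TEXT PRIMARY KEY,
    phone       TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id TEXT REFERENCES customers (customer_id)  -- foreign key to the "one" side
);
INSERT INTO customers VALUES ('VINET', '26.47.15.10');
INSERT INTO orders VALUES (10248, 'VINET'), (10274, 'VINET');
""")

# One customer can own many orders (one-to-many).
orders_for_vinet = conn.execute(
    "SELECT count(*) FROM orders WHERE customer_id = 'VINET'"
).fetchone()[0]

# An order that points at a non-existent customer violates the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (99999, 'NOBODY')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(orders_for_vinet, fk_rejected)
```

The foreign key lives on the "many" side (`orders`), which is why a single `customers` row can be referenced by any number of orders.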

Let's show the columns of the **CUSTOMERS** and **ORDERS** tables. They form a one-to-many relationship, because one customer can have many orders.


In [1]:
%load_ext sql
%config SqlMagic.autocommit=False
%config SqlMagic.autolimit=20
%config SqlMagic.displaylimit=20
%sql postgresql://user-pengfei:gv8eba5xmsw4kt2uk1mn@postgresql-124499/test

Notice that in the `customers` table, `customer_id` is the primary key. In the `orders` table, `customer_id` is the foreign key.

In [2]:
%%sql
select * from orders limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
5 rows affected.


order_id,customer_id,employee_id,order_date,required_date,shipped_date,ship_via,freight,ship_name,ship_address,ship_city,ship_region,ship_postal_code,ship_country
10248,VINET,5,1996-07-04,1996-08-01,1996-07-16,3,32.38,Vins et alcools Chevalier,59 rue de l'Abbaye,Reims,,51100,France
10249,TOMSP,6,1996-07-05,1996-08-16,1996-07-10,1,11.61,Toms Spezialitäten,Luisenstr. 48,Münster,,44087,Germany
10250,HANAR,4,1996-07-08,1996-08-05,1996-07-12,2,65.83,Hanari Carnes,"Rua do Paço, 67",Rio de Janeiro,RJ,05454-876,Brazil
10251,VICTE,3,1996-07-08,1996-08-05,1996-07-15,1,41.34,Victuailles en stock,"2, rue du Commerce",Lyon,,69004,France
10252,SUPRD,4,1996-07-09,1996-08-06,1996-07-11,2,51.3,Suprêmes délices,"Boulevard Tirou, 255",Charleroi,,B-6000,Belgium


In [3]:
%%sql
select * from customers limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
5 rows affected.


customer_id,company_name,contact_name,contact_title,address,city,region,postal_code,country,phone,fax
ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,030-0076545
ANATR,Ana Trujillo Emparedados y helados,Ana Trujillo,Owner,Avda. de la Constitución 2222,México D.F.,,05021,Mexico,(5) 555-4729,(5) 555-3745
ANTON,Antonio Moreno Taquería,Antonio Moreno,Owner,Mataderos 2312,México D.F.,,05023,Mexico,(5) 555-3932,
AROUT,Around the Horn,Thomas Hardy,Sales Representative,120 Hanover Sq.,London,,WA1 1DP,UK,(171) 555-7788,(171) 555-6750
BERGS,Berglunds snabbköp,Christina Berglund,Order Administrator,Berguvsvägen 8,Luleå,,S-958 22,Sweden,0921-12 34 65,0921-12 34 67


## Join types

There are seven common join types, shown in the figure below. We will discuss them one by one.
![sql_join_type_chart](https://raw.githubusercontent.com/pengfei99/LearningSQL/main/Oreilly_getting_started_with_sql/img/sql_join_type_chart.png) 

## Why do we normalize data?
You may ask why we separate data into tables and then merge them back together.

- Through normalization, we store data efficiently: no duplicates means fewer errors and easier maintenance.
- Merging tables on common fields creates more descriptive views of the data, which makes analysis easier.

## Table relationship overview
In this section, we will learn how to merge different tables, so the relationships between tables become important. The figure below shows the relationships between the tables in our test database.
![northwind_schema](https://raw.githubusercontent.com/pengfei99/LearningSQL/main/SQL_practice_problems/img/northwind_schema.PNG)

## 5.1 Inner join

**The INNER JOIN allows us to merge two tables together**. But to merge tables, we need to define a commonality between the two in order to line up records from both tables. In other words, we need to **identify one or more common columns between the two tables**.

In general, in a one-to-many relationship, **the common columns are the primary key of the "one" side and the foreign key of the "many" side**. In our case, for the `customers` and `orders` tables, the common column is **customer_id**.


Imagine that we want to contact by phone the customer who placed an order. The `orders` table has no phone information, so we need to join the `customers` table with the `orders` table to get the phone number. The query below is an example of an inner join.


In [6]:
%%sql

select order_id, customers.customer_id,
order_date,
phone
from customers
inner join orders
on customers.customer_id= orders.customer_id
limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
5 rows affected.


order_id,customer_id,order_date,phone
10248,VINET,1996-07-04,26.47.15.10
10249,TOMSP,1996-07-05,0251-031259
10250,HANAR,1996-07-08,(21) 555-0091
10251,VICTE,1996-07-08,78.32.54.86
10252,SUPRD,1996-07-09,(071) 23 67 22 20


In the above SQL query, the first thing to notice is that we must use the explicit syntax **customers.customer_id** or **orders.customer_id**, because this column exists in both tables. If we don't specify the table name, the database server can't figure out which table to use. For a column name that exists in only one table (e.g. phone, order_date), we don't need to specify the table name.
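To see what happens when the table name is left out, here is a small sketch against an in-memory SQLite database (an assumption for self-containment; the tables are cut down to two columns each). SQLite reports the problem as an "ambiguous column name" error, and PostgreSQL rejects the same query with a similar ambiguity error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY, phone TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT);
INSERT INTO customers VALUES ('VINET', '26.47.15.10');
INSERT INTO orders VALUES (10248, 'VINET');
""")

# customer_id exists in both tables, so the unqualified name cannot be resolved.
try:
    conn.execute(
        "SELECT order_id, customer_id FROM customers "
        "INNER JOIN orders ON customers.customer_id = orders.customer_id"
    )
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Qualifying the shared column fixes it; phone and order_id need no prefix.
qualified = conn.execute(
    "SELECT order_id, customers.customer_id, phone FROM customers "
    "INNER JOIN orders ON customers.customer_id = orders.customer_id"
).fetchall()

print(error_message)
print(qualified)
```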

The **FROM clause is where we execute our INNER JOIN**. We specify that we are pulling from the `CUSTOMERS` table and inner joining it with `ORDERS`, and that the commonality between the two tables is the `CUSTOMER_ID` column.


Important note: **with INNER JOIN, any records that do not have a common joined value in both tables will be excluded**. If a customer does not have any orders, that customer will be excluded from the merged table.

If we want to include all records from the CUSTOMERS table, we need to use a LEFT JOIN.
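The difference is easy to check on toy data. The sketch below assumes an in-memory SQLite database with two customers, of which only `VINET` has an order; the phone number for `PARIS` is a placeholder, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY, phone TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT);
INSERT INTO customers VALUES ('VINET', '26.47.15.10'), ('PARIS', '000-0000');
INSERT INTO orders VALUES (10248, 'VINET');
""")

# INNER JOIN keeps only customers that have a matching order.
inner_rows = conn.execute("""
    SELECT c.customer_id, o.order_id
    FROM customers c
    INNER JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id
""").fetchall()

# LEFT JOIN keeps every customer; order columns are NULL (None) when there is no match.
left_rows = conn.execute("""
    SELECT c.customer_id, o.order_id
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id
""").fetchall()

print(inner_rows)  # PARIS is missing
print(left_rows)   # PARIS is present with order_id None
```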

### 5.1.1 Table name alias

Writing the explicit table name can be very annoying if the table name is long. In that case, we can use an alias in place of the full table name. In the query below, we use `c` as the alias of the table `customers`, and `o` as the alias of the table `orders`. As a result, we can write `c.customer_id` instead of `customers.customer_id`.

In [19]:
%%sql

select order_id, c.customer_id,
order_date,
phone
from customers c
inner join orders o
on c.customer_id= o.customer_id
limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
5 rows affected.


order_id,customer_id,order_date,phone
10248,VINET,1996-07-04,26.47.15.10
10249,TOMSP,1996-07-05,0251-031259
10250,HANAR,1996-07-08,(21) 555-0091
10251,VICTE,1996-07-08,78.32.54.86
10252,SUPRD,1996-07-09,(071) 23 67 22 20


### 5.1.2 Alias with wildcard

We can use `*` after a table name alias as a wildcard to select all columns of a table. In the query below, we use `o.*` to select all columns of the table `ORDERS` and `c.*` to select all columns of the table `CUSTOMERS`.

In [25]:
%%sql

select o.*, c.*
from customers c
inner join orders o
on c.customer_id= o.customer_id
limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
5 rows affected.


order_id,customer_id,employee_id,order_date,required_date,shipped_date,ship_via,freight,ship_name,ship_address,ship_city,ship_region,ship_postal_code,ship_country,customer_id_1,company_name,contact_name,contact_title,address,city,region,postal_code,country,phone,fax
10248,VINET,5,1996-07-04,1996-08-01,1996-07-16,3,32.38,Vins et alcools Chevalier,59 rue de l'Abbaye,Reims,,51100,France,VINET,Vins et alcools Chevalier,Paul Henriot,Accounting Manager,59 rue de l'Abbaye,Reims,,51100,France,26.47.15.10,26.47.15.11
10249,TOMSP,6,1996-07-05,1996-08-16,1996-07-10,1,11.61,Toms Spezialitäten,Luisenstr. 48,Münster,,44087,Germany,TOMSP,Toms Spezialitäten,Karin Josephs,Marketing Manager,Luisenstr. 48,Münster,,44087,Germany,0251-031259,0251-035695
10250,HANAR,4,1996-07-08,1996-08-05,1996-07-12,2,65.83,Hanari Carnes,"Rua do Paço, 67",Rio de Janeiro,RJ,05454-876,Brazil,HANAR,Hanari Carnes,Mario Pontes,Accounting Manager,"Rua do Paço, 67",Rio de Janeiro,RJ,05454-876,Brazil,(21) 555-0091,(21) 555-8765
10251,VICTE,3,1996-07-08,1996-08-05,1996-07-15,1,41.34,Victuailles en stock,"2, rue du Commerce",Lyon,,69004,France,VICTE,Victuailles en stock,Mary Saveley,Sales Agent,"2, rue du Commerce",Lyon,,69004,France,78.32.54.86,78.32.54.87
10252,SUPRD,4,1996-07-09,1996-08-06,1996-07-11,2,51.3,Suprêmes délices,"Boulevard Tirou, 255",Charleroi,,B-6000,Belgium,SUPRD,Suprêmes délices,Pascale Cartrain,Accounting Manager,"Boulevard Tirou, 255",Charleroi,,B-6000,Belgium,(071) 23 67 22 20,(071) 23 67 22 21


## 5.2 Left (outer) join

In some cases, we may want to join the `customers` and `orders` tables and see all customer rows, even if some customers never placed an order. We can accomplish this with a LEFT JOIN.

### 5.2.1 Left (outer) inclusive join

By default, the left (outer) join is inclusive, which means all rows of the left table are included in the result table. Columns that come from the right table are null for left rows that have no match.


In [26]:
%%sql

select order_id, c.customer_id,
order_date,
phone
from customers c
left join orders o
on c.customer_id= o.customer_id
limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
5 rows affected.


order_id,customer_id,order_date,phone
10248,VINET,1996-07-04,26.47.15.10
10249,TOMSP,1996-07-05,0251-031259
10250,HANAR,1996-07-08,(21) 555-0091
10251,VICTE,1996-07-08,78.32.54.86
10252,SUPRD,1996-07-09,(071) 23 67 22 20


### 5.2.2 Left (outer) exclusive join
 
It is also common to use LEFT JOIN to check for “orphaned” child records that have no parent, or conversely a parent that has no children (e.g., orders that have no customers, or customers that have no orders).

The query below shows all customers that have no orders, by using the filter **order_id is null**.

In the returned table, notice that all the fields (columns) that come from the table `ORDERS` are null. The two customers (PARIS and FISSA) have never placed an order, so there were no corresponding rows in the table `ORDERS` to join to.


In [28]:
%%sql

select c.customer_id, o.*
from customers c
left join orders o
on c.customer_id=o.customer_id
where order_id is null

 * postgresql://user-pengfei:***@postgresql-124499/test
2 rows affected.


customer_id,order_id,customer_id_1,employee_id,order_date,required_date,shipped_date,ship_via,freight,ship_name,ship_address,ship_city,ship_region,ship_postal_code,ship_country
PARIS,,,,,,,,,,,,,,
FISSA,,,,,,,,,,,,,,


## 5.3 Right join

A right join is similar to a left join, but it keeps all rows of the right table instead. If you swap the table order, a left join produces the same rows as a right join.

However, the RIGHT JOIN is rarely used and should be avoided. You should stick to convention and prefer left outer joins with LEFT JOIN, and put the “all records” table on the left side of the join operator.
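The following sketch shows how the choice of left table decides which rows are always kept. It assumes an in-memory SQLite database with toy data: `PARIS` is a customer without orders, and order `99999` with customer `GHOST` is a made-up orphan order. The second query returns the rows that `customers RIGHT JOIN orders` would return, written as a LEFT JOIN with the tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT);
INSERT INTO customers VALUES ('VINET'), ('PARIS');
INSERT INTO orders VALUES (10248, 'VINET'), (99999, 'GHOST');
""")

# customers on the left: every customer is kept, the orphan order is dropped.
all_customers = conn.execute("""
    SELECT c.customer_id, o.order_id
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id
""").fetchall()

# orders on the left: every order is kept, the customer without orders is dropped.
all_orders = conn.execute("""
    SELECT o.order_id, c.customer_id
    FROM orders o
    LEFT JOIN customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

print(all_customers)
print(all_orders)
```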

## 5.4 Full outer join

There is also a **full outer join** operator, written `FULL OUTER JOIN` in PostgreSQL, that includes all records from both tables. It **does a LEFT JOIN and a RIGHT JOIN simultaneously, and can produce null values on either side**. It can be helpful for finding orphaned records in both directions in a single query, but it is also seldom used.
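To illustrate what a full outer join returns, the sketch below emulates one in an in-memory SQLite database, because older SQLite versions lack native support for it. The emulation (a left join plus the unmatched rows of the other table, combined with `UNION ALL`) is a swapped-in technique for this demo, not how you would write it in PostgreSQL. The data is made up: `PARIS` has no order, and order `99999` has no customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT);
INSERT INTO customers VALUES ('VINET'), ('PARIS');
INSERT INTO orders VALUES (10248, 'VINET'), (99999, 'GHOST');
""")

full_rows = conn.execute("""
    -- all customers, with their orders when they exist
    SELECT c.customer_id, o.order_id
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    UNION ALL
    -- plus the orders that have no matching customer
    SELECT c.customer_id, o.order_id
    FROM orders o
    LEFT JOIN customers c ON c.customer_id = o.customer_id
    WHERE c.customer_id IS NULL
""").fetchall()

# Orphans in both directions show up as None on one side.
print(full_rows)
```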


## 5.5 Joining Multiple Tables

Tables can have relationships with more than one table. For instance, a given table can be the child of more than one parent table, and a table can be the parent of one table but a child of another.

Looking at our database, we can observe the relationship between **order_details and orders**. We can also include another table, **products**, which makes order_details more meaningful. Notice that the `order_details` table has an `order_id` column, which corresponds to an order in the `orders` table, and a `product_id` column, which corresponds to a product in the `products` table.

The query below joins the three tables together (i.e. order_details, orders, products). Note that after the FROM clause we have two inner join operators; their order can be changed, and the result is the same.
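The claim that the order of the two inner joins does not matter can be checked with a small sketch. It assumes an in-memory SQLite database with cut-down versions of the three tables; the results are sorted before comparison because neither query has an ORDER BY, so row order is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE order_details (order_id INTEGER, product_id INTEGER, quantity INTEGER);
INSERT INTO orders VALUES (10248, 'VINET'), (10249, 'TOMSP');
INSERT INTO products VALUES (11, 'Queso Cabrales'), (14, 'Tofu');
INSERT INTO order_details VALUES (10248, 11, 12), (10249, 14, 9);
""")

columns = "od.order_id, o.customer_id, p.product_name, od.quantity"

# Join orders first, then products.
orders_first = conn.execute(f"""
    SELECT {columns}
    FROM order_details AS od
    INNER JOIN orders AS o ON od.order_id = o.order_id
    INNER JOIN products AS p ON od.product_id = p.product_id
""").fetchall()

# Join products first, then orders.
products_first = conn.execute(f"""
    SELECT {columns}
    FROM order_details AS od
    INNER JOIN products AS p ON od.product_id = p.product_id
    INNER JOIN orders AS o ON od.order_id = o.order_id
""").fetchall()

same_rows = sorted(orders_first) == sorted(products_first)
print(same_rows)
```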


In [36]:
%%sql

select od.*, o.*, p.*
FROM order_details as od
inner join orders as o on od.order_id=o.order_id
inner join products as p on od.product_id=p.product_id 
limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
5 rows affected.


order_id,product_id,unit_price,quantity,discount,order_id_1,customer_id,employee_id,order_date,required_date,shipped_date,ship_via,freight,ship_name,ship_address,ship_city,ship_region,ship_postal_code,ship_country,product_id_1,product_name,supplier_id,category_id,quantity_per_unit,unit_price_1,units_in_stock,units_on_order,reorder_level,discontinued
10248,11,14.0,12,0.0,10248,VINET,5,1996-07-04,1996-08-01,1996-07-16,3,32.38,Vins et alcools Chevalier,59 rue de l'Abbaye,Reims,,51100,France,11,Queso Cabrales,5,4,1 kg pkg.,21.0,22,30,30,0
10248,42,9.8,10,0.0,10248,VINET,5,1996-07-04,1996-08-01,1996-07-16,3,32.38,Vins et alcools Chevalier,59 rue de l'Abbaye,Reims,,51100,France,42,Singaporean Hokkien Fried Mee,20,5,32 - 1 kg pkgs.,14.0,26,0,0,1
10248,72,34.8,5,0.0,10248,VINET,5,1996-07-04,1996-08-01,1996-07-16,3,32.38,Vins et alcools Chevalier,59 rue de l'Abbaye,Reims,,51100,France,72,Mozzarella di Giovanni,14,4,24 - 200 g pkgs.,34.8,14,0,0,0
10249,14,18.6,9,0.0,10249,TOMSP,6,1996-07-05,1996-08-16,1996-07-10,1,11.61,Toms Spezialitäten,Luisenstr. 48,Münster,,44087,Germany,14,Tofu,6,7,40 - 100 g pkgs.,23.25,35,0,0,0
10249,51,42.4,40,0.0,10249,TOMSP,6,1996-07-05,1996-08-16,1996-07-10,1,11.61,Toms Spezialitäten,Luisenstr. 48,Münster,,44087,Germany,51,Manjimup Dried Apples,24,7,50 - 300 g pkgs.,53.0,20,0,10,0


Now that we’ve merged these three tables, we can use fields from all three to create expressions. If we want to find the revenue for each order line, we can multiply the `quantity` column from `order_details` by the `unit_price` column from the `products` table, even though those fields exist in two separate tables.


In [39]:
%%sql

select od.order_id, p.product_id, od.quantity,  p.unit_price, od.quantity*p.unit_price as total_revenue
FROM order_details as od
inner join orders as o on od.order_id=o.order_id
inner join products as p on od.product_id=p.product_id 
limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
5 rows affected.


order_id,product_id,quantity,unit_price,total_revenue
10248,11,12,21.0,252.0
10248,42,10,14.0,140.0
10248,72,5,34.8,173.9999961853027
10249,14,9,23.25,209.25
10249,51,40,53.0,2120.0


## 5.6 Grouping joins

In the above section, we used joins to calculate the revenue for each order line. Now suppose we want to find the total revenue by customer. Based on previous experience, we know we need to do the following steps:
1. Join the three tables (`customers`, `orders`, `order_details`) and calculate the revenue for each order line.
2. Group by customer and sum all the revenue that belongs to each customer.


In [42]:
%%sql

select c.customer_id,
contact_name AS customer_name,
sum(quantity*unit_price) as total_revenue
from orders as o
inner join customers as c on c.customer_id=o.customer_id
inner join order_details as od on od.order_id=o.order_id
group by c.customer_id, customer_name
limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
5 rows affected.


customer_id,customer_name,total_revenue
ALFKI,Maria Anders,4596.200004577637
ANATR,Ana Trujillo,1402.949990272522
ANTON,Antonio Moreno,7515.349945068359
AROUT,Thomas Hardy,13806.49998140335
BERGS,Christina Berglund,26968.149930477142


Now we want to see the revenue of all customers, even customers that have no orders.

In [51]:
%%sql

select c.customer_id,
contact_name AS customer_name,
sum(quantity*unit_price) as total_revenue
from customers as c
left join orders as o on c.customer_id=o.customer_id
left join order_details as od on od.order_id=o.order_id
group by c.customer_id, customer_name
limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
5 rows affected.


customer_id,customer_name,total_revenue
ALFKI,Maria Anders,4596.200004577637
ANATR,Ana Trujillo,1402.949990272522
ANTON,Antonio Moreno,7515.349945068359
AROUT,Thomas Hardy,13806.49998140335
BERGS,Christina Berglund,26968.149930477142


Note that we have replaced both inner joins with left joins. If we only changed the first one into a left join, the second inner join would exclude the customers with no orders, even though that inner join does not happen directly between the `customers` and `order_details` tables. This is because the null `order_id` values produced by the first left join cannot match any row in the table `order_details`. **Null join keys never match in an inner join, so those rows are filtered out. A LEFT JOIN tolerates null values.**
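Here is a minimal sketch of that effect, assuming an in-memory SQLite database with toy data (the quantities and prices are made up, and `PARIS` has no order). The same query is run twice, changing only the second join from LEFT to INNER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT);
CREATE TABLE order_details (order_id INTEGER, quantity INTEGER, unit_price REAL);
INSERT INTO customers VALUES ('VINET'), ('PARIS');
INSERT INTO orders VALUES (10248, 'VINET');
INSERT INTO order_details VALUES (10248, 12, 14.0), (10248, 10, 9.5);
""")

# {second_join} is filled in with LEFT or INNER below.
query = """
    SELECT c.customer_id, sum(od.quantity * od.unit_price) AS total_revenue
    FROM customers AS c
    LEFT JOIN orders AS o ON c.customer_id = o.customer_id
    {second_join} JOIN order_details AS od ON od.order_id = o.order_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
"""

left_then_left = conn.execute(query.format(second_join="LEFT")).fetchall()
left_then_inner = conn.execute(query.format(second_join="INNER")).fetchall()

print(left_then_left)   # PARIS is kept, with a NULL (None) revenue
print(left_then_inner)  # PARIS is gone: its NULL order_id matches nothing
```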

We can keep only the customers who have not placed an order by adding `where o.order_id is null`.

In [55]:
%%sql

select c.customer_id,
contact_name AS customer_name,
sum(quantity*unit_price) as total_revenue
from customers as c
left join orders as o on c.customer_id=o.customer_id
left join order_details as od on od.order_id=o.order_id
where o.order_id is null
group by c.customer_id, customer_name
limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
2 rows affected.


customer_id,customer_name,total_revenue
FISSA,Diego Roel,
PARIS,Marie Bertrand,


Try changing the second left join to an inner join in the above query, and check the output.

### 5.6.1 Replace null 

We may want the default value of `total_revenue` to be 0 instead of null when there are no sales. We can accomplish this with the coalesce() function, which we learned before.

In [58]:
%%sql

select c.customer_id,
contact_name AS customer_name,
coalesce(sum(quantity*unit_price),0) as total_revenue
from customers as c
left join orders as o on c.customer_id=o.customer_id
left join order_details as od on od.order_id=o.order_id
where o.order_id is null
group by c.customer_id, customer_name
limit 5;

 * postgresql://user-pengfei:***@postgresql-124499/test
2 rows affected.


customer_id,customer_name,total_revenue
FISSA,Diego Roel,0.0
PARIS,Marie Bertrand,0.0
