# Joins

Joins in SQL combine rows from two or more tables, based on a related column between those tables. They are predominantly used when extracting data from tables that have one-to-many or many-to-many relationships between them.

There are four main types of joins. 

1. Full Join
    1. We will join two different tables over one column that is the same in both
    1. Also, we will do a few queries to see the data  
    1. Syntax/Examples of full joins - https://www.tutorialspoint.com/sql/sql-full-joins.htm
1. Inner Join
    1. We will discuss what inner join is and how it can be used
    1. Syntax/Examples of inner join - http://www.tutorialspoint.com/sql/sql-inner-joins.htm
1. Left Join
    1. We will discuss what left join is and how it can be used 
    1. Syntax/Examples of left join -  http://www.tutorialspoint.com/sql/sql-left-joins.htm  
1. Right Join
    1. We will discuss what right join is and how it can be used 
    1. Syntax/Examples of right join - http://www.tutorialspoint.com/sql/sql-right-joins.htm   
    

## Connecting to PostgreSQL 

We will connect to the PostgreSQL database the same way we did in the last lab, using the notebook commands:


In [None]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dsa_ro

## Full JOIN

Now we will do our first basic join of the us_second_order_divisions table and the util_us_states table.

We must first view both tables to see where their data overlaps and which column we can join them on.

If we use the terminal, the following two `\d` commands display the column metadata, which shows where the data overlaps:

```SQL
dsa_ro=> \d us_second_order_divisions
        Table "public.us_second_order_divisions"
       Column       |          Type          | Modifiers 
--------------------+------------------------+-----------
 state_number_code  | smallint               | not null
 county_number_code | character varying(5)   | not null
 county_name        | character varying(100) | 
Indexes:
    "us_second_order_divisions_pkey" PRIMARY KEY, btree (state_number_code, county_number_code)

dsa_ro=> \d util_us_states
             Table "public.util_us_states"
      Column       |         Type          | Modifiers 
-------------------+-----------------------+-----------
 state_alpha_code  | character(2)          | not null
 state_number_code | smallint              | 
 state_name        | character varying(50) | 
Indexes:
    "util_us_states_pkey" PRIMARY KEY, btree (state_alpha_code)
    "util_us_states_state_number_code" btree (state_number_code)


```   

We can see the column names for the two tables by running the following two SELECT statements using Jupyter's sql extension:

In [None]:
%sql SELECT * FROM us_second_order_divisions LIMIT 0;

In [None]:
%sql SELECT * FROM util_us_states LIMIT 0;

If we look at both these tables, we will see that they both contain a **`state_number_code`**!

This means that we can join these two tables over that **`state_number_code`**. 
To join the tables means to take the data from both tables and connect them into one output, linking the rows based on the common columns.

```SQL
SELECT * 
FROM us_second_order_divisions AS sod 
FULL JOIN util_us_states AS uus 
  ON sod.state_number_code = uus.state_number_code;
```

**Note:** In the above statement, we have utilized table aliases, e.g., `util_us_states AS uus`. Without the aliases, it would look like this:

```SQL
SELECT * 
FROM us_second_order_divisions 
FULL JOIN util_us_states 
  ON us_second_order_divisions.state_number_code = util_us_states.state_number_code;
```


**Note:** The query below returns 3295 rows.
Click under the resulting `Out[#]` to turn on _cell scrolling_.

In [None]:
%%sql 
SELECT * 
FROM us_second_order_divisions AS sod 
FULL JOIN util_us_states AS uus 
  ON sod.state_number_code = uus.state_number_code;

### Explain and Query Plan Preview
Later in this module we will cover the **Query Plan** concept in more detail.
You can examine the estimated cost and rows by using `EXPLAIN` in front of the SQL command.
The cell below shows an example of explaining a query, using the query above.

Compare the two explain commands below. One is a proper join, the other is an accidental cross-product (explained below).

In [None]:
%%sql 
EXPLAIN
SELECT * 
FROM us_second_order_divisions AS sod 
FULL JOIN util_us_states AS uus 
  ON sod.state_number_code = uus.state_number_code;

In [None]:
%%sql 
EXPLAIN
SELECT * 
FROM us_second_order_divisions AS sod,
     util_us_states AS uus;

Note that in the first plan the total cost is low and the estimated row count is <span style='background:yellow'>rows=3295</span>.
```
Hash Join (cost=2.35..99.61 rows=3295 width=29)
```

However, in the second plan the estimated rows explode to 197,700: <span style='background:yellow'>rows=197700</span>
```
Nested Loop (cost=0.00..2524.95 rows=197700 width=29)
```

--- 

**<span style="background:yellow">WARNING:</span>**  
When combining data from multiple tables, you must give your query a _join condition_.
Otherwise, the tables will be combined in a cartesian product, also known as a _cross join_.
A cross join pairs every row of one table with every row of the other, whether or not the values in the overlapping columns match.



In [None]:
%%sql
SELECT count(*) 
FROM us_second_order_divisions AS sod, util_us_states AS uus;

**Notice that in the cross join the number of rows explodes to 197,700, instead of the fewer than 4,000 rows of the proper join.**

We execute only a count because the full result would be too large for this notebook learning environment.
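
The row explosion follows directly from how a cross join is built: its row count is the product of the two tables' row counts. A minimal sketch, assuming an in-memory SQLite database and hypothetical toy tables (`counties`, `states`) in place of the course tables:

```python
import sqlite3

# Hypothetical toy tables standing in for the course tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (state_number_code INTEGER, state_name TEXT);
CREATE TABLE counties (state_number_code INTEGER, county_name TEXT);
INSERT INTO states VALUES (1, 'Alabama'), (2, 'Alaska'), (4, 'Arizona');
INSERT INTO counties VALUES
    (1, 'Autauga'), (1, 'Baldwin'), (2, 'Anchorage'), (4, 'Apache');
""")

def count(sql):
    """Run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]

n_states = count("SELECT count(*) FROM states")
n_counties = count("SELECT count(*) FROM counties")

# No join condition: every county is paired with every state.
n_cross = count("SELECT count(*) FROM counties AS c, states AS s")

# With a join condition: each county is paired only with its own state.
n_joined = count("""
    SELECT count(*)
    FROM counties AS c
    JOIN states AS s ON c.state_number_code = s.state_number_code
""")

print(n_states, n_counties, n_cross, n_joined)
```

With 3 states and 4 counties, the cross join yields 3 × 4 = 12 rows, while the proper join yields one row per county.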


## Viewing New Data Sets

For this next part we will be using two new tables in the database. 
We will be using the `customers` table and the `orders` table.

Again, we examine these tables to determine where they overlap and which attributes we can use to join them.

Describing the tables in the PostgreSQL database:

```SQL
dsa_ro=> \d orders
       Table "public.orders"
   Column    |  Type   | Modifiers 
-------------+---------+-----------
 order_id    | integer | 
 customer_id | integer | 
 employee_id | integer | 
 order_price | integer | 

dsa_ro=> \d customers
             Table "public.customers"
    Column     |         Type          | Modifiers 
---------------+-----------------------+-----------
 customer_id   | integer               | 
 customer_name | character varying(32) | 
 contact_name  | character varying(32) | 
 address       | character varying(64) | 
 city          | character varying(16) | 
 zipcode       | integer               | 
 country       | character varying(16) | 

```
        


Both tables have a `customer_id` column, which likely holds the same identifier in each.  
In many cases, database definitions will declare such a relationship explicitly with a foreign key reference.
An example of this is provided below.

---
#### Example database table with three FKs

We will discuss this more during the database design modules of the course.


```SQL
                                    Table "atc.task"
  Column   |            Type             |                  Modifiers                   
-----------+-----------------------------+----------------------------------------------
 jobid     | bigint                      | not null
 taskid    | integer                     | not null
 tasktype  | character varying(50)       | not null
 starttime | timestamp without time zone | 
 status    | character varying(10)       | not null default 'QUEUED'::character varying
 statusmsg | text                        | 
 stoptime  | timestamp without time zone | 
 priority  | integer                     | not null default 500
 descr     | text                        | 
Indexes:
    "pk_task" PRIMARY KEY, btree (jobid, taskid)
Foreign-key constraints:
    "fk_task_job" FOREIGN KEY (jobid) REFERENCES atc.job(jobid) ON UPDATE CASCADE ON DELETE CASCADE
    "fk_task_status" FOREIGN KEY (status) REFERENCES atc.task_status_type(status)
    "fk_task_tasktype" FOREIGN KEY (tasktype) REFERENCES atc.task_type_tbl(tasktype) ON UPDATE CASCADE ON DELETE CASCADE
```
---
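
To see what a foreign key reference does in practice, here is a small sketch under stated assumptions: it uses an in-memory SQLite database (where enforcement must be switched on with `PRAGMA foreign_keys = ON`) and hypothetical, simplified `customers` and `orders` tables, not the course tables described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical simplified tables (assumption): orders.customer_id
# must refer to an existing customers.customer_id.
conn.executescript("""
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id)
);
INSERT INTO customers VALUES (1, 'Acme');
INSERT INTO orders VALUES (100, 1);
""")

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

A declared foreign key documents which columns are meant to be joined and prevents rows that would have no match on the referenced side.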

We now continue by looking at some JOIN SQL examples.


## Inner Join 

To INNER JOIN the `orders` and `customers` tables, we will use the join condition `orders.customer_id = customers.customer_id`.

#### Example: List the Order ID and Shipping Address for all orders
 1. Traditional SQL with an explicit INNER JOIN
 1. Plain JOIN, which defaults to INNER
 1. NATURAL JOIN, a shortcut available in some databases, such as PostgreSQL. It implicitly joins on all like-named columns between the tables; in this case, `customer_id`.
 1. Comma-separated tables with the _join condition_ in the WHERE clause
 1. USING shortcut, which takes a list of like-named columns between the tables; in this case, `customer_id`.
 1. Switched join order
 1. Table aliases

In [None]:
%%sql

SELECT order_id, address, city, zipcode, country
FROM orders 
INNER JOIN customers 
ON orders.customer_id = customers.customer_id;


In [None]:
%%sql

SELECT order_id, address, city, zipcode, country
FROM orders 
JOIN customers 
ON orders.customer_id = customers.customer_id;

In [None]:
%%sql

SELECT order_id, address, city, zipcode, country
FROM orders 
NATURAL JOIN customers;

In [None]:
%%sql

SELECT order_id, address, city, zipcode, country
FROM orders, customers 
WHERE orders.customer_id = customers.customer_id;


In [None]:
%%sql

SELECT order_id, address, city, zipcode, country
FROM orders 
INNER JOIN customers 
USING (customer_id);


In [None]:
%%sql

SELECT order_id, address, city, zipcode, country
FROM customers
JOIN  orders 
ON orders.customer_id = customers.customer_id;


In [None]:
%%sql

SELECT order_id, address, city, zipcode, country
FROM orders AS o
JOIN  customers AS c
ON o.customer_id = c.customer_id;
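
The seven spellings above are interchangeable for this query. As a rough check, the sketch below runs all of them against hypothetical toy `orders` and `customers` tables in an in-memory SQLite database (an assumption; the course tables live in PostgreSQL and have more columns) and compares the results.

```python
import sqlite3

# Hypothetical toy data (assumption): order 13 has no customer,
# and customer 3 has no orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, city TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Berlin'), (2, 'Lyon'), (3, 'Oslo');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2), (13, 99);
""")

# One FROM clause per syntax variant listed above.
variants = {
    "INNER JOIN": "FROM orders INNER JOIN customers "
                  "ON orders.customer_id = customers.customer_id",
    "JOIN": "FROM orders JOIN customers "
            "ON orders.customer_id = customers.customer_id",
    "NATURAL JOIN": "FROM orders NATURAL JOIN customers",
    "WHERE": "FROM orders, customers "
             "WHERE orders.customer_id = customers.customer_id",
    "USING": "FROM orders INNER JOIN customers USING (customer_id)",
    "switched": "FROM customers JOIN orders "
                "ON orders.customer_id = customers.customer_id",
    "aliases": "FROM orders AS o JOIN customers AS c "
               "ON o.customer_id = c.customer_id",
}

results = {
    name: conn.execute(
        f"SELECT order_id, city {from_clause} ORDER BY order_id"
    ).fetchall()
    for name, from_clause in variants.items()
}

for name, rows in results.items():
    print(name, rows)
```

Every variant returns only the orders whose `customer_id` also exists in `customers`; the unmatched order and the customer without orders are absent from all seven results.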


## Left Join

In the INNER JOIN results above, not all the customers or orders are included; only the rows whose `customer_id` appears in both tables are returned.

To ensure we see all the `customers`, even those without orders, we need to use a `LEFT JOIN` with `customers` as the left table.


In [None]:
%%sql

SELECT order_id, address, city, zipcode, country
FROM customers 
LEFT JOIN orders 
ON orders.customer_id = customers.customer_id;


**Notice** we have retrieved two additional rows, 9 in total.

We see the last two rows are rendered with `None` in the `order_id` column.  
That is because `order_id` comes from the right table (`orders`), which has no matching rows to supply data for those customers.

Next, we look at `orders` as the left table.

In [None]:
%%sql

SELECT order_id, address, city, zipcode, country
FROM orders 
LEFT JOIN customers 
ON orders.customer_id = customers.customer_id;


You will see in the results that we have 3 orders with no related customer.

We still see the data in our results because we used a Left Join and the left table in our query was the `orders` table.
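
Which table sits on the left decides whose unmatched rows are kept. A minimal sketch of both directions, assuming hypothetical toy tables in an in-memory SQLite database rather than the course data (Python shows SQL `NULL` as `None`, just as the notebook output does):

```python
import sqlite3

# Hypothetical toy data (assumption): customer 3 has no orders,
# and order 12 refers to a customer that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, city TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Berlin'), (2, 'Lyon'), (3, 'Oslo');
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);
""")

# customers on the left: every customer is kept.
customers_left = conn.execute("""
    SELECT o.order_id, c.city
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()

# orders on the left: every order is kept.
orders_left = conn.execute("""
    SELECT o.order_id, c.city
    FROM orders AS o
    LEFT JOIN customers AS c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

print(customers_left)
print(orders_left)
```

In the first result the customer without orders appears with `None` for `order_id`; in the second, the order without a customer appears with `None` for `city`.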

## Right Join

Now it's time for Right Join!

This join is the mirror image of the Left Join.

It returns every row from the right table in our query, `customers`, and provides `None` values in the left table's columns for the rows that do not match.

In [None]:
%%sql

SELECT order_id, address, city, zipcode, country
FROM orders 
RIGHT JOIN customers 
ON orders.customer_id = customers.customer_id;


        
You will see again that we have 2 customers who have never placed an order with us. They still appear in the results because the Right Join returns all of the rows from the right table (`customers`).
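
Because a Right Join mirrors a Left Join, `orders RIGHT JOIN customers` can always be rewritten as `customers LEFT JOIN orders`. The sketch below illustrates this with hypothetical toy tables in an in-memory SQLite database (an assumption, not the course setup). SQLite only gained `RIGHT JOIN` in version 3.39, so the right-join form is executed only when the installed library supports it.

```python
import sqlite3

# Hypothetical toy data (assumption): customer 3 has no orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, city TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Berlin'), (2, 'Lyon'), (3, 'Oslo');
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);
""")

# Left join with the tables swapped: keeps every customer.
mirrored = conn.execute("""
    SELECT o.order_id, c.city
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()

# Stays None when this SQLite build has no RIGHT JOIN support.
right_join_matches = None
if sqlite3.sqlite_version_info >= (3, 39, 0):
    right = conn.execute("""
        SELECT o.order_id, c.city
        FROM orders AS o
        RIGHT JOIN customers AS c ON o.customer_id = c.customer_id
        ORDER BY c.customer_id
    """).fetchall()
    right_join_matches = (right == mirrored)

print(mirrored)
print(right_join_matches)
```

Choosing between the two forms is mostly a matter of readability: many authors keep the "keep everything" table on the left and use only Left Joins.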

## Outer Join

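Recall that a FULL OUTER JOIN produces the combined rows of the RIGHT JOIN and the LEFT JOIN, with each matched pair of rows appearing only once. It is the same join as the FULL JOIN used earlier in this notebook; the OUTER keyword is optional.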

The syntax for PostgreSQL is 
```SQL
FROM 
    TableA FULL OUTER JOIN TableB
    ... 
```

In [None]:
%%sql

SELECT order_id, address, city, zipcode, country
FROM orders 
FULL OUTER JOIN customers 
ON orders.customer_id = customers.customer_id;


The results of the FULL OUTER JOIN are composed of three sets of rows:
  1. We see the first seven rows, 1 - 7, are the matched rows that the INNER JOIN also returns.
  1. Then we see the four rows contributed by the LEFT JOIN with `None` in address fields, where the `order_id` was supplied by the left table (`orders`).
  1. Finally, we see the two rows contributed by the RIGHT JOIN with `None` in `order_id` fields, where the address fields were supplied by the right table (`customers`).
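
The three-part composition can be reproduced by hand. The sketch below is an emulation under stated assumptions: it uses hypothetical toy tables in an in-memory SQLite database, and it builds the full outer result as a `LEFT JOIN` plus the unmatched right-table rows via `UNION ALL`, a common workaround where `FULL OUTER JOIN` is unavailable (PostgreSQL supports it directly).

```python
import sqlite3

# Hypothetical toy data (assumption): order 12 has no customer,
# and customer 3 has no orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, city TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Berlin'), (2, 'Lyon'), (3, 'Oslo');
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);
""")

# All orders (matched or not), plus the customers that matched no order.
full_outer = conn.execute("""
    SELECT o.order_id, c.city
    FROM orders AS o
    LEFT JOIN customers AS c ON o.customer_id = c.customer_id
    UNION ALL
    SELECT o.order_id, c.city
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE o.order_id IS NULL
""").fetchall()

# Split the result into the three sets of rows.
matched = [row for row in full_outer if None not in row]
orders_only = [row for row in full_outer if row[1] is None]
customers_only = [row for row in full_outer if row[0] is None]

print(len(matched), len(orders_only), len(customers_only))
```

The matched rows correspond to the inner join, `orders_only` to the rows only the left table could supply, and `customers_only` to the rows only the right table could supply.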

Here is a quick image of the different types of joins.

 ![Joins image](Joins.jpg)


# Preview 

The true power of the relational model begins to emerge with the JOIN.
As you continue through the course, you will learn how to use aggregation and statistics and see more advanced concepts.
Almost always, these other concepts leverage JOIN to integrate the data of multiple tables.


# Save your Notebook, then `File > Close and Halt`

---