# 08 - Joining Tables

When a database is designed with good principles such as normalization, related information is split across separate tables. To retrieve data from several of those tables in a single SELECT query, we use joins. A join operator combines two tables, and its result can in turn be joined with further tables. The tables being joined normally share a column (or columns with comparable values) on which their rows can be matched.

To connect tables in a query, we use a JOIN ... ON clause. There are different types of SQL joins:

- INNER JOIN (or sometimes called simple join)
- LEFT OUTER JOIN (or sometimes called LEFT JOIN)
- CROSS JOIN
- RIGHT OUTER JOIN (or sometimes called RIGHT JOIN; supported in MySQL)
- FULL OUTER JOIN (not supported in MySQL)
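
MySQL does not provide FULL OUTER JOIN, but its result can usually be approximated by combining two LEFT JOINs with UNION. The sketch below is illustrative only: it assumes Python's built-in sqlite3 module with an in-memory database and two hypothetical toy tables (customer and payment, with made-up rows) instead of the Sakila schema, so that it runs without a server.

```python
import sqlite3

# In-memory SQLite database with two hypothetical toy tables (not the Sakila schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payment  (payment_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'MARY'), (2, 'PATRICIA');
    INSERT INTO payment  VALUES (10, 1), (11, 3);
""")

# The first LEFT JOIN keeps every customer; the second one, with the tables
# swapped, keeps every payment. UNION merges both results and removes the
# matching rows that would otherwise appear twice.
full_outer_rows = conn.execute("""
    SELECT c.customer_id, c.name, p.payment_id
    FROM customer c
    LEFT JOIN payment p ON c.customer_id = p.customer_id
    UNION
    SELECT c.customer_id, c.name, p.payment_id
    FROM payment p
    LEFT JOIN customer c ON c.customer_id = p.customer_id
""").fetchall()

for row in full_outer_rows:
    print(row)
conn.close()
```

Note that UNION also collapses rows that are genuinely identical in the data, so this pattern is only an approximation when the selected columns do not identify a row uniquely.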

In [1]:
import mysql.connector as sql
import pandas as pd

In [2]:
connection = sql.connect(
    host="localhost",
    user="root",
    password="12345"
)

cursor = connection.cursor()

For this notebook, let's see which tables the Sakila database contains.
(Sakila is a sample database. It is intended to provide a standard schema that can be used for examples in books, tutorials, articles, and so forth.)

In [4]:
pd.read_sql_query("""
    SHOW TABLES
    FROM sakila
    """,
    connection)

Unnamed: 0,Tables_in_sakila
0,actor
1,actor_info
2,address
3,category
4,city
5,country
6,customer
7,customer_list
8,film
9,film_actor


Let's check which tables we can use for our join practice.

In [21]:
pd.read_sql_query("""
    DESCRIBE sakila.customer
    """,
    connection)

Unnamed: 0,Field,Type,Null,Key,Default,Extra
0,customer_id,b'smallint unsigned',NO,PRI,,auto_increment
1,store_id,b'tinyint unsigned',NO,MUL,,
2,first_name,b'varchar(45)',NO,,,
3,last_name,b'varchar(45)',NO,MUL,,
4,email,b'varchar(50)',YES,,,
5,address_id,b'smallint unsigned',NO,MUL,,
6,active,b'tinyint(1)',NO,,b'1',
7,create_date,b'datetime',NO,,,
8,last_update,b'timestamp',YES,,b'CURRENT_TIMESTAMP',DEFAULT_GENERATED on update CURRENT_TIMESTAMP


In [12]:
pd.read_sql_query("""
    DESCRIBE sakila.customer_list
    """,
    connection)

Unnamed: 0,Field,Type,Null,Key,Default,Extra
0,ID,b'smallint unsigned',NO,,b'0',
1,name,b'varchar(91)',YES,,,
2,address,b'varchar(50)',NO,,,
3,zip code,b'varchar(10)',YES,,,
4,phone,b'varchar(20)',NO,,,
5,city,b'varchar(50)',NO,,,
6,country,b'varchar(50)',NO,,,
7,notes,b'varchar(6)',NO,,b'',
8,SID,b'tinyint unsigned',NO,,,


The customer ID uniquely identifies a customer in both tables (customer_id in customer, ID in customer_list), so we can use these two tables to practice joins.
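
Before joining on a key, it can help to confirm that the key really is unique, because duplicate keys multiply rows in the join result. A minimal sketch of such a check follows. It assumes an in-memory SQLite database and a hypothetical stand-in for customer_list whose rows are made up and contain a deliberate duplicate; it is not the Sakila data.

```python
import sqlite3

# Hypothetical stand-in for customer_list with one deliberately duplicated ID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_list (ID INTEGER, name TEXT);
    INSERT INTO customer_list VALUES
        (1, 'MARY SMITH'),
        (2, 'PATRICIA JOHNSON'),
        (2, 'PATRICIA JOHNSON');
""")

# Any ID that appears more than once would produce extra rows in a join.
duplicate_ids = conn.execute("""
    SELECT ID, COUNT(*) AS n
    FROM customer_list
    GROUP BY ID
    HAVING COUNT(*) > 1
""").fetchall()

# An empty list would mean that ID is unique in this table.
print(duplicate_ids)
conn.close()
```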

## 1. INNER JOIN

The syntax for the INNER JOIN in SQL is:
```
SELECT columns
FROM table1
INNER JOIN table2
ON table1.column = table2.column;
```
*Note*: When selecting columns that exist in both tables, we have to qualify them with the table name. If column names or table names are too long, we can use aliases to give them short names.
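
The note about qualifying shared column names can be illustrated with a small sketch. It assumes an in-memory SQLite database and two hypothetical toy tables (customer and store, with made-up rows) rather than the Sakila tables. The error shown is SQLite's; other DBMSes report ambiguity with their own wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, store_id INTEGER, first_name TEXT);
    CREATE TABLE store    (store_id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO customer VALUES (1, 1, 'MARY'), (2, 2, 'BARBARA'), (3, 9, 'LINDA');
    INSERT INTO store    VALUES (1, 'Lethbridge'), (2, 'Woodridge');
""")

# store_id exists in both tables, so an unqualified reference is ambiguous.
try:
    conn.execute("""
        SELECT store_id
        FROM customer
        INNER JOIN store ON customer.store_id = store.store_id
    """)
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)
print(ambiguous_error)

# Qualifying the column through short table aliases resolves the ambiguity.
# LINDA refers to a store that does not exist, so INNER JOIN drops her row.
inner_rows = conn.execute("""
    SELECT c.first_name, s.store_id, s.city
    FROM customer AS c
    INNER JOIN store AS s ON c.store_id = s.store_id
    ORDER BY c.customer_id
""").fetchall()
print(inner_rows)
conn.close()
```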

In [24]:
pd.read_sql_query("""
    SELECT 
        c.*,
        cl.*
    FROM sakila.customer c
    INNER JOIN sakila.customer_list cl
        ON c.customer_id=cl.ID
    LIMIT 5
    """,
    connection)

Unnamed: 0,customer_id,store_id,first_name,last_name,email,address_id,active,create_date,last_update,ID,name,address,zip code,phone,city,country,notes,SID
0,218,1,VERA,MCCOY,VERA.MCCOY@sakilacustomer.org,222,1,2006-02-14 22:04:36,2006-02-15 04:57:20,218,VERA MCCOY,1168 Najafabad Parkway,40301,886649065861,Kabul,Afghanistan,active,1
1,441,1,MARIO,CHEATHAM,MARIO.CHEATHAM@sakilacustomer.org,446,1,2006-02-14 22:04:37,2006-02-15 04:57:20,441,MARIO CHEATHAM,1924 Shimonoseki Drive,52625,406784385440,Batna,Algeria,active,1
2,69,2,JUDY,GRAY,JUDY.GRAY@sakilacustomer.org,73,1,2006-02-14 22:04:36,2006-02-15 04:57:20,69,JUDY GRAY,1031 Daugavpils Parkway,59025,107137400143,Bchar,Algeria,active,2
3,176,1,JUNE,CARROLL,JUNE.CARROLL@sakilacustomer.org,180,1,2006-02-14 22:04:36,2006-02-15 04:57:20,176,JUNE CARROLL,757 Rustenburg Avenue,89668,506134035434,Skikda,Algeria,active,1
4,320,2,ANTHONY,SCHWAB,ANTHONY.SCHWAB@sakilacustomer.org,325,1,2006-02-14 22:04:37,2006-02-15 04:57:20,320,ANTHONY SCHWAB,1892 Nabereznyje Telny Lane,28396,478229987054,Tafuna,American Samoa,active,2


## 2. LEFT JOIN

Like the INNER JOIN clause, the LEFT JOIN clause is an optional clause of the SELECT statement, used to query data from multiple related tables. This type of join returns all rows from the left-hand table (the one named in the FROM clause) and only those rows from the right-hand table that satisfy the join condition. For left-hand rows without a match, the right-hand columns are filled with NULL.

The syntax for the SQL LEFT OUTER JOIN is:
```
SELECT columns
FROM table1
LEFT [OUTER] JOIN table2
ON table1.column = table2.column
```
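
To see what happens to left-hand rows that have no match, here is a minimal sketch. It assumes an in-memory SQLite database and two hypothetical toy tables (customer and rental, with made-up rows), not the Sakila data. The second query shows a common use of this behavior: filtering on NULL to find left-hand rows without any match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE rental   (rental_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'MARY'), (2, 'PATRICIA'), (3, 'LINDA');
    INSERT INTO rental   VALUES (100, 1), (101, 1), (102, 3);
""")

# Every customer appears at least once; PATRICIA has no rental,
# so her rental_id comes back as NULL (None in Python).
left_rows = conn.execute("""
    SELECT c.first_name, r.rental_id
    FROM customer c
    LEFT JOIN rental r ON c.customer_id = r.customer_id
    ORDER BY c.customer_id, r.rental_id
""").fetchall()
print(left_rows)

# Filtering on NULL in the right-hand key keeps only the unmatched left rows.
no_rentals = conn.execute("""
    SELECT c.first_name
    FROM customer c
    LEFT JOIN rental r ON c.customer_id = r.customer_id
    WHERE r.rental_id IS NULL
""").fetchall()
print(no_rentals)
conn.close()
```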

In [28]:
pd.read_sql_query("""
    SELECT 
        c.*,
        cl.*
    FROM sakila.customer c
    LEFT JOIN sakila.customer_list cl
        ON c.customer_id=cl.ID
    LIMIT 5
    """,
    connection)

Unnamed: 0,customer_id,store_id,first_name,last_name,email,address_id,active,create_date,last_update,ID,name,address,zip code,phone,city,country,notes,SID
0,1,1,MARY,SMITH,MARY.SMITH@sakilacustomer.org,5,1,2006-02-14 22:04:36,2006-02-15 04:57:20,1,MARY SMITH,1913 Hanoi Way,35200,28303384290,Sasebo,Japan,active,1
1,2,1,PATRICIA,JOHNSON,PATRICIA.JOHNSON@sakilacustomer.org,6,1,2006-02-14 22:04:36,2006-02-15 04:57:20,2,PATRICIA JOHNSON,1121 Loja Avenue,17886,838635286649,San Bernardino,United States,active,1
2,3,1,LINDA,WILLIAMS,LINDA.WILLIAMS@sakilacustomer.org,7,1,2006-02-14 22:04:36,2006-02-15 04:57:20,3,LINDA WILLIAMS,692 Joliet Street,83579,448477190408,Athenai,Greece,active,1
3,4,2,BARBARA,JONES,BARBARA.JONES@sakilacustomer.org,8,1,2006-02-14 22:04:36,2006-02-15 04:57:20,4,BARBARA JONES,1566 Inegl Manor,53561,705814003527,Myingyan,Myanmar,active,2
4,5,1,ELIZABETH,BROWN,ELIZABETH.BROWN@sakilacustomer.org,9,1,2006-02-14 22:04:36,2006-02-15 04:57:20,5,ELIZABETH BROWN,53 Idfu Parkway,42399,10655648674,Nantou,Taiwan,active,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
594,595,1,TERRENCE,GUNDERSON,TERRENCE.GUNDERSON@sakilacustomer.org,601,1,2006-02-14 22:04:37,2006-02-15 04:57:20,595,TERRENCE GUNDERSON,844 Bucuresti Place,36603,935952366111,Jinzhou,China,active,1
595,596,1,ENRIQUE,FORSYTHE,ENRIQUE.FORSYTHE@sakilacustomer.org,602,1,2006-02-14 22:04:37,2006-02-15 04:57:20,596,ENRIQUE FORSYTHE,1101 Bucuresti Boulevard,97661,199514580428,Patras,Greece,active,1
596,597,1,FREDDIE,DUGGAN,FREDDIE.DUGGAN@sakilacustomer.org,603,1,2006-02-14 22:04:37,2006-02-15 04:57:20,597,FREDDIE DUGGAN,1103 Quilmes Boulevard,52137,644021380889,Sullana,Peru,active,1
597,598,1,WADE,DELVALLE,WADE.DELVALLE@sakilacustomer.org,604,1,2006-02-14 22:04:37,2006-02-15 04:57:20,598,WADE DELVALLE,1331 Usak Boulevard,61960,145308717464,Lausanne,Switzerland,active,1


## 3. CROSS JOIN
Another type of join is the SQL CROSS JOIN. It returns a combined result set in which every row from the first table is paired with every row from the second table. This is also called a Cartesian product.

The syntax for the CROSS JOIN is:
```
SELECT columns
FROM table1
CROSS JOIN table2
``` 

A cross join does not make much sense for our example, but let's run one anyway.

In [27]:
pd.read_sql_query("""
    SELECT
         c.*
        ,cl.*
    FROM sakila.customer c
    CROSS JOIN sakila.customer_list cl
    """,
    connection)

Unnamed: 0,customer_id,store_id,first_name,last_name,email,address_id,active,create_date,last_update,ID,name,address,zip code,phone,city,country,notes,SID
0,1,1,MARY,SMITH,MARY.SMITH@sakilacustomer.org,5,1,2006-02-14 22:04:36,2006-02-15 04:57:20,438,BARRY LOVELACE,1836 Korla Parkway,55405,689681677428,Kitwe,Zambia,active,1
1,1,1,MARY,SMITH,MARY.SMITH@sakilacustomer.org,5,1,2006-02-14 22:04:36,2006-02-15 04:57:20,553,MAX PITT,1917 Kumbakonam Parkway,11892,698182547686,Novi Sad,Yugoslavia,active,1
2,1,1,MARY,SMITH,MARY.SMITH@sakilacustomer.org,5,1,2006-02-14 22:04:36,2006-02-15 04:57:20,7,MARIA MILLER,900 Santiago de Compostela Parkway,93896,716571220373,Kragujevac,Yugoslavia,active,1
3,1,1,MARY,SMITH,MARY.SMITH@sakilacustomer.org,5,1,2006-02-14 22:04:36,2006-02-15 04:57:20,213,GINA WILLIAMSON,1001 Miyakonojo Lane,67924,584316724815,Taizz,Yemen,active,1
4,1,1,MARY,SMITH,MARY.SMITH@sakilacustomer.org,5,1,2006-02-14 22:04:36,2006-02-15 04:57:20,303,WILLIAM SATTERFIELD,687 Alessandria Parkway,57587,407218522294,Sanaa,Yemen,active,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
358796,599,2,AUSTIN,CINTRON,AUSTIN.CINTRON@sakilacustomer.org,605,1,2006-02-14 22:04:37,2006-02-15 04:57:20,320,ANTHONY SCHWAB,1892 Nabereznyje Telny Lane,28396,478229987054,Tafuna,American Samoa,active,2
358797,599,2,AUSTIN,CINTRON,AUSTIN.CINTRON@sakilacustomer.org,605,1,2006-02-14 22:04:37,2006-02-15 04:57:20,176,JUNE CARROLL,757 Rustenburg Avenue,89668,506134035434,Skikda,Algeria,active,1
358798,599,2,AUSTIN,CINTRON,AUSTIN.CINTRON@sakilacustomer.org,605,1,2006-02-14 22:04:37,2006-02-15 04:57:20,69,JUDY GRAY,1031 Daugavpils Parkway,59025,107137400143,Bchar,Algeria,active,2
358799,599,2,AUSTIN,CINTRON,AUSTIN.CINTRON@sakilacustomer.org,605,1,2006-02-14 22:04:37,2006-02-15 04:57:20,441,MARIO CHEATHAM,1924 Shimonoseki Drive,52625,406784385440,Batna,Algeria,active,1
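
Because a cross join pairs every row with every row, its result size is the product of the two table sizes, which is why the output above is so large. A small sketch of a case where this is actually useful, assuming an in-memory SQLite database and two hypothetical toy tables (size and color) that are not part of Sakila:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE size  (label TEXT);
    CREATE TABLE color (name TEXT);
    INSERT INTO size  VALUES ('S'), ('M'), ('L');
    INSERT INTO color VALUES ('red'), ('blue');
""")

n_size = conn.execute("SELECT COUNT(*) FROM size").fetchone()[0]
n_color = conn.execute("SELECT COUNT(*) FROM color").fetchone()[0]

# Every size is paired with every color: all possible product variants.
pairs = conn.execute("""
    SELECT s.label, c.name
    FROM size s
    CROSS JOIN color c
""").fetchall()

# The number of pairs equals the product of the two row counts.
print(len(pairs), n_size * n_color)
conn.close()
```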


## 4. Querying multiple tables using joins
Relational databases can be fairly complex in terms of relationships between tables. Sometimes we need information from more than two tables.

We can use the following syntax to join multiple tables:

```
SELECT columns
FROM table1
INNER JOIN table2 ON table1.column = table2.column
INNER JOIN table3 ON table1.column = table3.column
...
INNER JOIN tablen ON table1.column = tablen.column;
```

SQL itself places no limit on the number of tables you can join. However, most DBMSes have their own limits (MySQL, for example, allows at most 61 tables in a single join), so check your DBMS's documentation in practical applications. In addition, queries tend to slow down as more tables are joined, especially when the join columns are not indexed.
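
As a rough illustration of a multi-table join, the sketch below chains four tables. It assumes an in-memory SQLite database with simplified, hypothetical stand-ins for the customer, address, city and country tables; only a few columns are modeled and the handful of rows are typed in by hand. Unlike the generic syntax above, each join here links to the previously joined table rather than always to the first one; both forms are valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country  (country_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE city     (city_id INTEGER PRIMARY KEY, city TEXT, country_id INTEGER);
    CREATE TABLE address  (address_id INTEGER PRIMARY KEY, address TEXT, city_id INTEGER);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, address_id INTEGER);
    INSERT INTO country  VALUES (1, 'Japan'), (2, 'Greece');
    INSERT INTO city     VALUES (1, 'Sasebo', 1), (2, 'Athenai', 2);
    INSERT INTO address  VALUES (5, '1913 Hanoi Way', 1), (7, '692 Joliet Street', 2);
    INSERT INTO customer VALUES (1, 'MARY', 5), (3, 'LINDA', 7);
""")

# customer -> address -> city -> country, one INNER JOIN per hop.
rows = conn.execute("""
    SELECT cu.first_name, a.address, ci.city, co.country
    FROM customer cu
    INNER JOIN address a  ON cu.address_id = a.address_id
    INNER JOIN city ci    ON a.city_id = ci.city_id
    INNER JOIN country co ON ci.country_id = co.country_id
    ORDER BY cu.customer_id
""").fetchall()

for row in rows:
    print(row)
conn.close()
```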

## Summary
In this notebook, we practiced three major join types in SQL: INNER, LEFT and CROSS joins. Joins allow us to take data scattered across multiple tables and stitch it together into something more meaningful and descriptive. We can take two or more tables and join them into a larger result that has more context. Moreover, aliases let us give columns and tables shorter names on the fly.

## References
- [Chonghua Yin notebook](https://github.com/royalosyin/Practice-SQL-with-SQLite-and-Jupyter-Notebook/blob/master/ex08-Joining%20Tables.ipynb)