<h1>Table of Contents<span class="tocSkip"></span></h1>
<span><a href="#Joins" data-toc-modified-id="Joins-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Joins</a></span><ul class="toc-item"><li><span><a href="#INNER-JOIN" data-toc-modified-id="INNER-JOIN-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span><code>INNER JOIN</code></a></span><ul class="toc-item"><li><span><a href="#Code-Example-for-Inner-Joins" data-toc-modified-id="Code-Example-for-Inner-Joins-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Code Example for Inner Joins</a></span><ul class="toc-item"><li><span><a href="#Inner-Join-Routes-&amp;-Airline-Data" data-toc-modified-id="Inner-Join-Routes-&amp;-Airline-Data-1.1.1.1"><span class="toc-item-num">1.1.1.1&nbsp;&nbsp;</span>Inner Join Routes &amp; Airline Data</a></span></li><li><span><a href="#Note:-Losing-Data-with-Inner-Joins" data-toc-modified-id="Note:-Losing-Data-with-Inner-Joins-1.1.1.2"><span class="toc-item-num">1.1.1.2&nbsp;&nbsp;</span>Note: Losing Data with Inner Joins</a></span></li></ul></li></ul></li><li><span><a href="#LEFT-JOIN" data-toc-modified-id="LEFT-JOIN-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span><code>LEFT JOIN</code></a></span><ul class="toc-item"><li><span><a href="#Code-Example-for-Left-Join" data-toc-modified-id="Code-Example-for-Left-Join-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Code Example for Left Join</a></span></li></ul></li><li><span><a href="#Exercise:-Joins" data-toc-modified-id="Exercise:-Joins-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Exercise: Joins</a></span><ul class="toc-item"><li><span><a href="#Possible-Solution" data-toc-modified-id="Possible-Solution-1.3.1"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Possible Solution</a></span></li></ul></li></ul></li>

![sql](img/sql-logo.jpg)

# Joins

The biggest advantage of using a relational database (as we have been doing with SQL) is that you can create **joins**.

> By using **`JOIN`** in our query, we can connect different tables using their _relationships_ to other tables.
>
> Usually we use a key (a _foreign key_) to tell us how the two tables are related.

There are different types of joins, and each has its own use case.

## Using JOIN
The SQL `JOIN` clause combines records from two or more tables in a database. A `JOIN` combines fields from two tables by using values common to each.

```sql
SELECT column-names
  FROM table-name1 JOIN table-name2
    ON column-name1 = column-name2
 WHERE condition
```

Writing `INNER JOIN` explicitly is equivalent, since a plain `JOIN` is an inner join by default:

```sql
SELECT column-names
  FROM table-name1 INNER JOIN table-name2
    ON column-name1 = column-name2
 WHERE condition
```
 
### Types of JOINS
- INNER JOIN − returns rows when there is a match in both tables.

- LEFT JOIN − returns all rows from the left table, even if there are no matches in the right table.

- RIGHT JOIN − returns all rows from the right table, even if there are no matches in the left table.

- FULL JOIN − returns all rows from both tables, pairing them where there is a match and filling in `NULL` where there is none.

![](https://www.dofactory.com/img/sql/sql-joins.png)

_The difference between an inner join and a full join: an inner join returns only the rows that match in both tables and drops the non-matching rows, while a full join (or full outer join) returns all rows from both the left and right tables, including the non-matching ones._
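The four join types can be compared on a pair of tiny tables. The following is a minimal sketch, assuming two made-up tables (`left_t` and `right_t`, not part of `flights.db`) in an in-memory SQLite database. Because older SQLite versions do not support `RIGHT JOIN` or `FULL JOIN`, the sketch emulates them with `LEFT JOIN` and `UNION`; that emulation is a workaround chosen for this example, not the only way to write these joins.

```python
import sqlite3

# Hypothetical toy tables: key 3 exists only on the left, key 4 only on the right
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE left_t (key INTEGER, left_val TEXT);
    CREATE TABLE right_t (key INTEGER, right_val TEXT);
    INSERT INTO left_t VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO right_t VALUES (1, 'x'), (2, 'y'), (4, 'z');
""")

# INNER JOIN: only keys present in both tables
inner_rows = demo.execute("""
    SELECT left_t.key, left_val, right_val
    FROM left_t INNER JOIN right_t ON left_t.key = right_t.key
    ORDER BY left_t.key
""").fetchall()

# LEFT JOIN: every row of left_t, with NULL (None) where right_t has no match
left_rows = demo.execute("""
    SELECT left_t.key, left_val, right_val
    FROM left_t LEFT JOIN right_t ON left_t.key = right_t.key
    ORDER BY left_t.key
""").fetchall()

# RIGHT JOIN, emulated by swapping the tables in a LEFT JOIN
right_rows = demo.execute("""
    SELECT right_t.key, left_val, right_val
    FROM right_t LEFT JOIN left_t ON left_t.key = right_t.key
    ORDER BY right_t.key
""").fetchall()

# FULL JOIN, emulated as the UNION of both LEFT JOINs
# (UNION drops duplicate rows, which is fine for this toy data)
full_rows = demo.execute("""
    SELECT left_t.key, left_val, right_val
    FROM left_t LEFT JOIN right_t ON left_t.key = right_t.key
    UNION
    SELECT right_t.key, left_val, right_val
    FROM right_t LEFT JOIN left_t ON left_t.key = right_t.key
    ORDER BY 1
""").fetchall()

for label, rows in [("inner", inner_rows), ("left", left_rows),
                    ("right", right_rows), ("full", full_rows)]:
    print(label, len(rows), rows)
```

With this data the inner join keeps 2 rows, the left and right joins keep 3 rows each, and the full join keeps 4.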

## `INNER JOIN`

> An **inner join** will join two tables together and keep a row only if its _key is in both tables_.

![](img/inner_join.png)

Example of an inner join:

```sql
SELECT
    table1.column_name,
    table2.different_column_name
FROM
    table1
    INNER JOIN table2
        ON table1.shared_column_name = table2.shared_column_name
```

### Code Example for Inner Joins: Continuing with `flights.db`

Let's say we want to look at the different airplane routes.

In [None]:
import sqlite3
import pandas as pd

In [None]:
conn = sqlite3.connect('flights.db')

In [None]:
cursor = conn.cursor()

In [None]:
pd.read_sql('''
    SELECT 
        *
    FROM
        routes 
''', conn)

This is great, but notice the `airline_id` column. It would be nice to have some information about the airline for each route.

In [None]:
pd.read_sql('''
SELECT * 
FROM airlines
''',conn)

We can do an **inner join** to get this information!

#### Inner Join Routes & Airline Data

In [None]:
pd.read_sql('''
    SELECT 
        *
    FROM
        routes
        INNER JOIN airlines
            ON routes.airline_id = airlines.id
''', conn)

We can also keep only certain columns by listing them in the `SELECT` clause:

In [None]:
pd.read_sql('''
    SELECT 
        routes.source AS departing
        ,routes.dest AS destination
        ,routes.stops AS stops_before_destination
        ,airlines.name AS airline
    FROM
        routes
        INNER JOIN airlines
            ON routes.airline_id = airlines.id
''', conn)
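The aliases in the `SELECT` clause also determine the column names that come back from the query. Here is a minimal sketch of that, assuming toy in-memory stand-ins for `routes` and `airlines` (the real `flights.db` tables have more columns, and the single row below is made up):

```python
import sqlite3

# Hypothetical, simplified versions of the two tables
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE routes (airline_id INTEGER, source TEXT, dest TEXT);
    CREATE TABLE airlines (id INTEGER, name TEXT);
    INSERT INTO routes VALUES (1, 'JFK', 'LAX');
    INSERT INTO airlines VALUES (1, 'Example Air');
""")

cur = demo.execute("""
    SELECT
        routes.source AS departing
        ,routes.dest AS destination
        ,airlines.name AS airline
    FROM
        routes
        INNER JOIN airlines
            ON routes.airline_id = airlines.id
""")

# cursor.description holds one entry per result column; item 0 is its name
column_names = [col[0] for col in cur.description]
rows = cur.fetchall()
print(column_names)
print(rows)
```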

#### Note: Losing Data with Inner Joins

Since rows are kept only if _both_ tables have the key, some data can be lost.

In [None]:
df_all_routes = pd.read_sql('''
    SELECT 
        *
    FROM
        routes
''', conn)

df_routes_after_join = pd.read_sql('''
    SELECT 
        *
    FROM
        routes
        INNER JOIN airlines
            ON routes.airline_id = airlines.id
''', conn)

In [None]:
# Look at how the number of rows differs
df_all_routes.shape, df_routes_after_join.shape 

If you want to keep your data from at least one of your tables, you should use a left join instead of an inner join.

## `LEFT JOIN`

> A **left join** will join two tables together but will keep all data from the first (left) table, whether or not its key has a match in the second table.

![](img/left_join.png)

Example of a left join:

```sql
SELECT
    table1.column_name,
    table2.different_column_name
FROM
    table1
    LEFT JOIN table2
        ON table1.shared_column_name = table2.shared_column_name
```

### Code Example for Left Join

Recall our example using an inner join and how it lost some data since the key wasn't in both the `routes` _and_ `airlines` tables. 

In [None]:
df_all_routes = pd.read_sql('''
    SELECT 
        *
    FROM
        routes
''', conn)

# This will lose some data (some routes not included)
df_routes_after_inner_join = pd.read_sql('''
    SELECT 
        *
    FROM
        routes
        INNER JOIN airlines
            ON routes.airline_id = airlines.id
''', conn)

# The number of rows is different
df_all_routes.shape, df_routes_after_inner_join.shape

If we wanted to ensure we always had every route, even if the key was not found in `airlines`, we could replace our `INNER JOIN` with a `LEFT JOIN`:

In [None]:
# This will include all the data from routes
df_routes_after_left_join = pd.read_sql('''
    SELECT 
        *
    FROM
        routes
        LEFT JOIN airlines
            ON routes.airline_id = airlines.id
''', conn)

df_routes_after_left_join.shape
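A left join also makes it possible to list exactly which rows the inner join would have dropped: after the join, unmatched rows have `NULL` in every column from the right table. The sketch below filters on that. It assumes toy in-memory stand-ins for `routes` and `airlines` with made-up rows, where `airline_id` 99 deliberately has no matching airline.

```python
import sqlite3

# Hypothetical, simplified versions of the two tables
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE routes (airline_id INTEGER, source TEXT, dest TEXT);
    CREATE TABLE airlines (id INTEGER, name TEXT);
    INSERT INTO routes VALUES
        (1, 'JFK', 'LAX'), (1, 'LAX', 'SFO'), (99, 'SEA', 'PDX');
    INSERT INTO airlines VALUES (1, 'Example Air'), (2, 'Sample Airways');
""")

# Keep only the routes whose airline_id found no partner in airlines
unmatched_routes = demo.execute("""
    SELECT
        routes.airline_id,
        routes.source,
        routes.dest
    FROM
        routes
        LEFT JOIN airlines
            ON routes.airline_id = airlines.id
    WHERE
        airlines.id IS NULL
""").fetchall()

print(unmatched_routes)
```

Running the same `WHERE airlines.id IS NULL` filter against the real `flights.db` should show the routes that account for the difference in row counts above.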

## Exercise: Joins

Which airline has the most routes listed in our database?

In [None]:
# Your code here


### Possible solution

```sql 
SELECT
    airlines.name AS airline,
    COUNT(*) AS number_of_routes
-- We first need to get all the relevant info via a join
FROM
    routes
    -- LEFT JOIN since we want all routes (even if airline id is unknown)
    LEFT JOIN airlines
        ON routes.airline_id = airlines.id
-- We need to group by airline's ID
GROUP BY
    airlines.id
ORDER BY
    number_of_routes DESC
```
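To see how this query treats routes with an unknown airline, here is a minimal sketch that runs it on toy in-memory stand-ins for `routes` and `airlines`. The table layouts are simplified and every row is made up; `airline_id` 99 has no matching airline on purpose.

```python
import sqlite3

# Hypothetical, simplified versions of the two tables
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE routes (airline_id INTEGER, source TEXT, dest TEXT);
    CREATE TABLE airlines (id INTEGER, name TEXT);
    INSERT INTO routes VALUES
        (1, 'JFK', 'LAX'), (1, 'LAX', 'SFO'), (1, 'SFO', 'SEA'),
        (2, 'JFK', 'BOS'),
        (99, 'SEA', 'PDX'), (99, 'PDX', 'SEA');
    INSERT INTO airlines VALUES (1, 'Example Air'), (2, 'Sample Airways');
""")

route_counts = demo.execute("""
    SELECT
        airlines.name AS airline,
        COUNT(*) AS number_of_routes
    FROM
        routes
        LEFT JOIN airlines
            ON routes.airline_id = airlines.id
    GROUP BY
        airlines.id
    ORDER BY
        number_of_routes DESC
""").fetchall()

print(route_counts)
```

The routes without a matching airline all have a `NULL` `airlines.id`, so they are counted together in a single group whose airline name comes back as `None`.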

# Let's try some more joins with another database.

In [None]:
conn = sqlite3.connect('pokedex.db')
cur = conn.cursor()

### Let's explore the schema first so we know what our tables are and what data they hold.

In [None]:
pd.read_sql('''
SELECT *
FROM sqlite_master''',conn)['sql']
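If only the table and column names are needed, the schema can also be read more selectively. The sketch below is one way to do that, shown on a made-up in-memory table named `types` rather than on `pokedex.db` itself. It filters `sqlite_master` to tables and then uses SQLite's `PRAGMA table_info` to list the columns.

```python
import sqlite3

# Hypothetical one-table database, used only to illustrate the lookups
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE types (id INTEGER PRIMARY KEY, identifier TEXT)")

# sqlite_master has one row per schema object; keep only the tables
table_names = [row[0] for row in demo.execute("""
    SELECT name
    FROM sqlite_master
    WHERE type = 'table'
    ORDER BY name
""")]

# PRAGMA table_info returns one row per column; index 1 is the column name
columns = {
    name: [col[1] for col in demo.execute(f"PRAGMA table_info({name})")]
    for name in table_names
}

print(table_names)
print(columns)
```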

### Create a table of move names, type, and type ID

In [None]:
df = pd.read_sql('''SELECT move, identifier, id
                FROM learned_moves
                JOIN types
                ON type_id=id''',conn)

df.head()

### Find the two Pokemon types with the fewest weaknesses

In [None]:
df = pd.read_sql('''
            SELECT identifier as type, COUNT(attacking_type) AS num_weaknesses
            FROM weaknesses
            JOIN types
            ON defending_type=id
            WHERE damage_factor=200
            GROUP BY defending_type
            ORDER BY num_weaknesses
            
            ''',conn)
df

### Find the top 5 Pokemon having the highest variety of move types.

In [None]:
df = pd.read_sql('''
            SELECT name, COUNT(DISTINCT type_id) AS num_move_types
            FROM pokemon
            JOIN learned_moves
            ON id=pokemon_id
            GROUP BY name
            ORDER BY num_move_types DESC
            LIMIT 5
            ''',conn)

df

### Get the names of all Pokemon who learn a super effective move against Water type Pokemon. Also include the name of one of these moves the Pokemon learns.

First, try using a subquery to get types super effective against water.

In [None]:
df = pd.read_sql('''
            SELECT identifier as type
            FROM types
            JOIN weaknesses
            ON attacking_type=id
            WHERE defending_type IN (SELECT id FROM types WHERE identifier='water')
            AND damage_factor=200
            ''',conn)
df.head()
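The same lookup can also be written without a subquery by joining `types` twice under two different aliases, once for the attacking type and once for the defending type. The following is a minimal sketch of that alternative. It assumes toy in-memory tables that reuse only the column names seen in the queries above; the ids and rows are made up for illustration and do not come from `pokedex.db`.

```python
import sqlite3

# Hypothetical, simplified versions of types and weaknesses
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE types (id INTEGER PRIMARY KEY, identifier TEXT);
    CREATE TABLE weaknesses (
        attacking_type INTEGER,
        defending_type INTEGER,
        damage_factor INTEGER
    );
    INSERT INTO types VALUES
        (1, 'water'), (2, 'grass'), (3, 'electric'), (4, 'fire');
    INSERT INTO weaknesses VALUES
        (2, 1, 200), (3, 1, 200), (4, 1, 50), (1, 4, 200);
""")

# Join types twice: "attacker" names the attacking type,
# "defender" names the defending type
strong_vs_water = [row[0] for row in demo.execute("""
    SELECT attacker.identifier AS type
    FROM weaknesses
    JOIN types AS attacker
        ON weaknesses.attacking_type = attacker.id
    JOIN types AS defender
        ON weaknesses.defending_type = defender.id
    WHERE defender.identifier = 'water'
      AND weaknesses.damage_factor = 200
    ORDER BY attacker.identifier
""")]

print(strong_vs_water)
```

Aliasing is what makes the double join possible: without `attacker` and `defender`, the two references to `types` could not be told apart.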

In [None]:
df = pd.read_sql('''
            SELECT name, move
            FROM pokemon p
            JOIN learned_moves m
            ON p.id = m.pokemon_id
            JOIN types t
            ON m.type_id = t.id
            JOIN weaknesses
            ON attacking_type = t.id
            WHERE defending_type IN (SELECT id FROM types WHERE identifier='water')
            AND damage_factor = 200
            -- One row per Pokemon; SQLite picks one of its matching moves
            GROUP BY name
            ''',conn)

df.head()