<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Hands on with SQL

### Download the flights dataset

https://www.dropbox.com/s/a2wax843eniq12g/flights.db?dl=1

This is a dataset of airlines, airports, and routes.

The file is in SQLite format. SQLite is a lightweight SQL database engine that stores an entire database in a single file.

We can interact with SQLite using Python!

In [None]:
import sqlite3

We will also use `pandas` to run our SQL queries and display results. More on `pandas` later...

In [None]:
import pandas as pd

Open your database (adjust the path to wherever you saved `flights.db`):

In [None]:
db = sqlite3.connect('assets/db/flights.db')

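Once connected, it can help to check which tables the file contains. The sketch below is only an illustration: it uses a throwaway in-memory database (`demo_db`, with two empty stand-in tables that are assumptions, not the real dataset) so it runs without the download. The same query should work against `db`. `sqlite_master` is SQLite's built-in catalogue of tables, indexes and views.

```python
import sqlite3

# Throwaway in-memory database standing in for flights.db (assumption for illustration)
demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE airports (id INTEGER, name TEXT)")
demo_db.execute("CREATE TABLE routes (source_id INTEGER, dest_id INTEGER)")

# Ask SQLite's catalogue for the names of all tables
table_names = [
    row[0]
    for row in demo_db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)
```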
The general form of a SQL query is:

```sql

SELECT
    <column(s)>
FROM
    <table(s)>
WHERE
    <condition(s)>
    
```

Unlike in Python, indentation does **not** matter in SQL. It is only there for readability.

- to select all columns, use `SELECT *`
- to select all rows, omit the `WHERE` part

Let's select all rows and columns from the airports table:

In [None]:
pd.read_sql("""
    SELECT
        *
    FROM
        airports
    """, con=db).head(10)

Now only a few columns:

In [None]:
pd.read_sql("""
    SELECT
        name,
        city,
        code
    FROM
        airports
    """, con=db).head(10)

Or filter some rows.

`WHERE` clauses can combine several conditions with boolean operators such as `AND` and `OR`. The query below uses a single condition:

In [None]:
pd.read_sql("""
    SELECT
        name
    FROM
        airports
    WHERE
        timezone = 'Europe/Madrid'
    """, con=db).head(10)

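To see `AND` and `OR` combined, here is a minimal sketch on a toy in-memory table. The rows, the airport names and the `demo_db` connection are made up for illustration and are not taken from the flights dataset. Parentheses make the grouping explicit; without them, `AND` binds more tightly than `OR`.

```python
import sqlite3

# Toy data, invented for this example (not the real airports table)
demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE airports (name TEXT, city TEXT, timezone TEXT)")
demo_db.executemany(
    "INSERT INTO airports VALUES (?, ?, ?)",
    [
        ("Airport A", "Madrid", "Europe/Madrid"),
        ("Airport B", "Barcelona", "Europe/Madrid"),
        ("Airport C", "Paris", "Europe/Paris"),
        ("Airport D", "London", "Europe/London"),
    ],
)

# Keep airports that are in the city of Madrid, or anywhere in the London timezone
rows = demo_db.execute("""
    SELECT
        name
    FROM
        airports
    WHERE
        (timezone = 'Europe/Madrid' AND city = 'Madrid')
        OR timezone = 'Europe/London'
    ORDER BY
        name
    """).fetchall()
print(rows)
```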
You can change the ordering using `ORDER BY`.

The direction goes after the column name: `ORDER BY name ASC` sorts ascending (the default) and `ORDER BY name DESC` sorts descending.

In [None]:
pd.read_sql("""
    SELECT
        name
    FROM
        airports
    ORDER BY
        name
    """, con=db).head(10)

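As a small sketch of sorting in both directions, the example below orders a toy table by `city` ascending and then by `name` descending within each city. The table contents and the `demo_db` connection are invented for illustration.

```python
import sqlite3

# Toy data, invented for this example
demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE airports (name TEXT, city TEXT)")
demo_db.executemany(
    "INSERT INTO airports VALUES (?, ?)",
    [
        ("Zulu Field", "Alpha City"),
        ("Alpha Field", "Beta City"),
        ("Mike Field", "Alpha City"),
    ],
)

# Sort by city A-Z first, then by name Z-A inside each city
ordered = demo_db.execute("""
    SELECT
        city,
        name
    FROM
        airports
    ORDER BY
        city ASC,
        name DESC
    """).fetchall()
print(ordered)
```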
## Aggregation

You can **summarise** the dataset with functions like:

- count
- sum
- avg
- min/max

depending on the data type of the column (you can't "average" text!)

In [None]:
pd.read_sql("""
    SELECT
        AVG(latitude),
        AVG(longitude)
    FROM
        airports
    """, con=db)

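The other aggregate functions follow the same pattern as `AVG`. The sketch below runs `COUNT`, `MIN`, `MAX`, `AVG` and `SUM` in one query over a three-row toy table. The latitudes and the `demo_db` connection are made-up values for illustration, not real airport data.

```python
import sqlite3

# Toy data, invented for this example
demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE airports (name TEXT, latitude REAL)")
demo_db.executemany(
    "INSERT INTO airports VALUES (?, ?)",
    [("Airport A", 40.0), ("Airport B", 50.0), ("Airport C", 60.0)],
)

# With no GROUP BY, each aggregate collapses the whole table to a single value
summary = demo_db.execute("""
    SELECT
        COUNT(*),
        MIN(latitude),
        MAX(latitude),
        AVG(latitude),
        SUM(latitude)
    FROM
        airports
    """).fetchone()
print(summary)
```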
To calculate these metrics for smaller groups, you can **group** rows that share a value in one or more columns.

In [None]:
pd.read_sql("""
    SELECT
        timezone,
        COUNT(name)
    FROM
        airports
    GROUP BY
        timezone
    """, con=db).head(10)

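Two details are worth seeing on a small example: `COUNT(*)` counts every row in a group, while `COUNT(name)` skips rows where `name` is `NULL`, and an `AS` alias can be reused in `ORDER BY` to rank the groups. The sketch below uses a toy table with one deliberately missing name. The data and the `demo_db` connection are assumptions for illustration.

```python
import sqlite3

# Toy data, invented for this example; one airport has no name (NULL)
demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE airports (name TEXT, timezone TEXT)")
demo_db.executemany(
    "INSERT INTO airports VALUES (?, ?)",
    [
        ("Airport A", "Europe/Madrid"),
        ("Airport B", "Europe/Madrid"),
        ("Airport C", "Europe/Paris"),
        ("Airport D", "Europe/Madrid"),
        ("Airport E", "Europe/Paris"),
        ("Airport F", "Europe/London"),
        (None, "Europe/London"),
    ],
)

# One output row per timezone, largest groups first (ties broken alphabetically)
counts = demo_db.execute("""
    SELECT
        timezone,
        COUNT(*) AS total,
        COUNT(name) AS named
    FROM
        airports
    GROUP BY
        timezone
    ORDER BY
        total DESC,
        timezone
    """).fetchall()
print(counts)
```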
# Joining

First let's look at airline **routes**. We can select just a few rows with `LIMIT`:

In [None]:
pd.read_sql("""
    SELECT
        *
    FROM
        routes
    LIMIT
        10
    """, con=db)

`source_id` and `dest_id` both refer to rows in the `airports` table.

What kind of keys are they?

Now what if we wanted to find out which airports those IDs relate to?

To join tables together, you need to specify which columns match.

```sql

SELECT
    <column(s)>
FROM
    <table_1>
    JOIN <table_2> ON <condition(s)>

```

In [None]:
pd.read_sql("""
    SELECT
        routes.[index],
        routes.source_id,
        airports.name AS source_airport
    FROM
        routes
        JOIN airports ON routes.source_id = airports.id
""", con=db).head(10)

It's a good idea to put the table name before each column when you join, so you know what came from where.

Notice you can rename columns in the output with `AS`.

You can do multiple `JOIN`s and rename tables with `AS`:

In [None]:
pd.read_sql("""
    SELECT
        routes.[index],
        routes.source_id,
        origin.name AS source_airport,
        routes.dest_id,
        destination.name AS destination_airport
    FROM
        routes
        JOIN airports AS origin ON routes.source_id = origin.id
        JOIN airports AS destination ON routes.dest_id = destination.id
""", con=db).head(10)

# Different Types of SQL JOINs

SQL has several types of JOIN:

- `(INNER) JOIN`: returns records that have matching values in both tables
- `LEFT (OUTER) JOIN`: returns all records from the left table, and the matched records from the right table
- `RIGHT (OUTER) JOIN`: returns all records from the right table, and the matched records from the left table
- `FULL (OUTER) JOIN`: returns all records from both tables, whether or not there is a match in the other table


![](assets/db/joins.png)
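
To make the difference between an inner and a left join concrete, the sketch below runs both against two toy tables in which one airport has no outgoing routes. The tables, the rows and the `demo_db` connection are invented for illustration. It sticks to `LEFT JOIN` because SQLite only gained `RIGHT` and `FULL OUTER JOIN` in version 3.39.0, so older installations may reject them.

```python
import sqlite3

# Toy data, invented for this example: Airport C has no outgoing routes
demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE airports (id INTEGER, name TEXT)")
demo_db.execute("CREATE TABLE routes (source_id INTEGER, dest_id INTEGER)")
demo_db.executemany(
    "INSERT INTO airports VALUES (?, ?)",
    [(1, "Airport A"), (2, "Airport B"), (3, "Airport C")],
)
demo_db.executemany("INSERT INTO routes VALUES (?, ?)", [(1, 2), (2, 1)])

# INNER JOIN: only airports that appear as a source in routes
inner_rows = demo_db.execute("""
    SELECT
        airports.name,
        routes.dest_id
    FROM
        airports
        JOIN routes ON routes.source_id = airports.id
    ORDER BY
        airports.name
    """).fetchall()

# LEFT JOIN: every airport, with NULL (None in Python) where no route matches
left_rows = demo_db.execute("""
    SELECT
        airports.name,
        routes.dest_id
    FROM
        airports
        LEFT JOIN routes ON routes.source_id = airports.id
    ORDER BY
        airports.name
    """).fetchall()

print(inner_rows)
print(left_rows)
```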