# SQL Introduction
## Setting up the environment
In order to follow this exercise, you'll need SQLite3 installed.  Binaries for Windows, Linux, and Mac can all be obtained on their [webpage](https://www.sqlite.org/download.html).  

If you're on Windows, be sure to get both the precompiled binaries for your OS as well as the `sqlite-tools`.

Once you have it installed, type `sqlite3` into your OS console.  If you see a prompt saying `sqlite>`, all is in order and you're ready to proceed.

## Basics
SQL, often pronounced "see-quell", stands for Structured Query Language and is the language used to query relational databases.  Although you may have never actually worked with a relational database, you are probably already very familiar with their structure and concepts, likely through working with spreadsheet programs like Excel.  To review, here are some of the basic ideas.
- Data are organized into *tables*, like spreadsheets using the Excel analogy.
- *Tables* consist of named *columns* and zero or more *rows*.
- Each *column* holds values of a particular data type (integer, float, string, etc.).
- *Rows* may be indexed by a *primary key* which is a column, or a set of columns, used to uniquely identify each row.
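
To make these ideas concrete, here is a minimal sketch of a table definition.  The `garage` table and its columns are hypothetical and are not used elsewhere in this exercise; the sketch only illustrates named columns, data types, and a primary key.

```SQL
-- Hypothetical table for illustration only.
CREATE TABLE garage (
    car_id INTEGER PRIMARY KEY,  -- uniquely identifies each row
    name   TEXT,                 -- string column
    mpg    REAL                  -- floating point column
);

INSERT INTO garage (car_id, name, mpg) VALUES (1, 'Mazda RX4', 21.0);
INSERT INTO garage (car_id, name, mpg) VALUES (2, 'Datsun 710', 22.8);

-- Reusing car_id 1 would be rejected with a constraint error:
-- INSERT INTO garage (car_id, name, mpg) VALUES (1, 'Valiant', 18.1);

SELECT * FROM garage;

-- Clean up so the table does not linger in your session.
DROP TABLE garage;
```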

## Import some data
Before getting into writing queries, we first need to load some data.  We're going to use the `mtcars` dataset that is often used in R and Python examples.  If you need a copy of the .csv, it can be downloaded [here](https://vincentarelbundock.github.io/Rdatasets/datasets.html), and it may be a good idea to familiarize yourself with the data a bit before proceeding.  Also, be sure that all columns have a name; the queries below assume the column holding the car names is called `name`.  Once you have the data, put it in the same directory as this notebook.  Now open up a command prompt in your OS, navigate to the directory containing the data, and run the command `sqlite3`.  You should now see a `sqlite>` prompt indicating that SQLite is running and ready for commands.  Now enter the following into the `sqlite3` console.

```
.mode csv
.import mtcars.csv mtcars
```

Note that this is **not** SQL.  These are just SQLite commands used to import data.  The `.mode` command tells SQLite to prepare to read in a .csv, and `.import mtcars.csv mtcars` tells it to create a table called `mtcars` from the file named `mtcars.csv`.  For a full list of SQLite commands, consult the documentation [here](https://www.sqlite.org/cli.html).
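
A quick sanity check after the import can save confusion later.  The sketch below assumes the import above succeeded and created `mtcars`.  A table created by `.import` typically stores every column as text, which `typeof()` will reveal; the `CAST` in the last query is one way to force a numeric sort if an ordering ever looks alphabetical rather than numeric.

```SQL
-- List the tables in the current database and show how mtcars was defined.
SELECT name FROM sqlite_master WHERE type = 'table';
SELECT sql FROM sqlite_master WHERE name = 'mtcars';

-- Should report 32 for the standard mtcars data.
SELECT count(*) FROM mtcars;

-- Shows how SQLite is storing the values (for example 'text' or 'real').
SELECT typeof(mpg), typeof(cyl) FROM mtcars LIMIT 1;

-- Sort numerically regardless of how the column is stored.
SELECT name, mpg FROM mtcars ORDER BY CAST(mpg AS REAL) DESC LIMIT 5;
```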

## `SELECT`
The first SQL command we're going to work with is `SELECT`.  You'll find pretty quickly that SQL really reads almost like an English sentence.  So intuitively, `SELECT` specifies the columns you want to select.  Give it a try by entering the following into the `sqlite3` console.

```SQL
SELECT * FROM mtcars LIMIT 10;
```

```
"Mazda RX4",21,6,160,110,3.9,2.62,16.46,0,1,4,4
"Mazda RX4 Wag",21,6,160,110,3.9,2.875,17.02,0,1,4,4
"Datsun 710",22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
"Hornet 4 Drive",21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
"Hornet Sportabout",18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225,105,2.76,3.46,20.22,1,0,3,1
"Duster 360",14.3,8,360,245,3.21,3.57,15.84,0,0,3,4
"Merc 240D",24.4,4,146.7,62,3.69,3.19,20,1,0,4,2
"Merc 230",22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
"Merc 280",19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
```

The `*` is the wildcard character and in this context it can be read as "all".  The `SELECT` clause is followed by `FROM` which specifies which table the preceding columns will be selected from.  Anything that comes after the `FROM` works to operate or "filter" on the resultant rows, in our case, limiting the number of rows returned.  So with that in mind, the above SQL statement reads "select all (columns) from (table) mtcars limit(ed to) 10 (rows)".  Also note that the SQL statement ended with a `;` which means "end of command".  Some SQL software requires its use while other software does not.  When using the `sqlite3` console, we must always terminate our statements with a `;`.

However, it would be nice if we could see the column headings.  To do so with SQLite, we need to issue another command.

```
.mode columns
.headers on
```

And rerun the query.

```SQL
SELECT * FROM mtcars LIMIT 10;
```

```
name        mpg         cyl         disp        hp          drat        wt          qsec        vs          am          gear        carb      
----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------
Mazda RX4   21          6           160         110         3.9         2.62        16.46       0           1           4           4         
Mazda RX4   21          6           160         110         3.9         2.875       17.02       0           1           4           4         
Datsun 710  22.8        4           108         93          3.85        2.32        18.61       1           1           4           1         
Hornet 4 D  21.4        6           258         110         3.08        3.215       19.44       1           0           3           1         
Hornet Spo  18.7        8           360         175         3.15        3.44        17.02       0           0           3           2         
Valiant     18.1        6           225         105         2.76        3.46        20.22       1           0           3           1         
Duster 360  14.3        8           360         245         3.21        3.57        15.84       0           0           3           4         
Merc 240D   24.4        4           146.7       62          3.69        3.19        20          1           0           4           2         
Merc 230    22.8        4           140.8       95          3.92        3.15        22.9        1           0           4           2         
Merc 280    19.2        6           167.6       123         3.92        3.44        18.3        1           0           4           4  
```

Another useful command is `ORDER BY`.  You may intuitively already know what this means, but let's try it out.

```SQL
SELECT * FROM mtcars ORDER BY mpg DESC LIMIT 10;
```

```
name            mpg         cyl         disp        hp          drat        wt          qsec        vs          am          gear        carb      
--------------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------
Toyota Corolla  33.9        4           71.1        65          4.22        1.835       19.9        1           1           4           1         
Fiat 128        32.4        4           78.7        66          4.08        2.2         19.47       1           1           4           1         
Honda Civic     30.4        4           75.7        52          4.93        1.615       18.52       1           1           4           2         
Lotus Europa    30.4        4           95.1        113         3.77        1.513       16.9        1           1           5           2         
Fiat X1-9       27.3        4           79          66          4.08        1.935       18.9        1           1           4           1         
Porsche 914-2   26          4           120.3       91          4.43        2.14        16.7        0           1           5           2         
Merc 240D       24.4        4           146.7       62          3.69        3.19        20          1           0           4           2         
Datsun 710      22.8        4           108         93          3.85        2.32        18.61       1           1           4           1         
Merc 230        22.8        4           140.8       95          3.92        3.15        22.9        1           0           4           2         
Toyota Corona   21.5        4           120.1       97          3.7         2.465       20.01       1           0           3           1    
```

Notice that for `ORDER BY` we had to specify the column we wanted to sort on as well as `DESC`, which stands for descending, since the default in SQLite is ascending order.  Often we are not interested in all the columns in the data, so we just need to specify to `SELECT` which ones we are interested in.

```SQL
SELECT name, cyl, hp, wt, mpg FROM mtcars ORDER BY mpg DESC LIMIT 10;
```

```
name            cyl         hp          wt          mpg       
--------------  ----------  ----------  ----------  ----------
Toyota Corolla  4           65          1.835       33.9      
Fiat 128        4           66          2.2         32.4      
Honda Civic     4           52          1.615       30.4      
Lotus Europa    4           113         1.513       30.4      
Fiat X1-9       4           66          1.935       27.3      
Porsche 914-2   4           91          2.14        26        
Merc 240D       4           62          3.19        24.4      
Datsun 710      4           93          2.32        22.8      
Merc 230        4           95          3.15        22.8      
Toyota Corona   4           97          2.465       21.5
```
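
`ORDER BY` also accepts more than one column, with a direction for each.  As a small sketch using the same `mtcars` table, the query below sorts by cylinders ascending (the default, written out with `ASC` for clarity) and breaks ties by `mpg` descending, so the four-cylinder cars come first.

```SQL
SELECT name, cyl, mpg
FROM mtcars
ORDER BY cyl ASC, mpg DESC
LIMIT 5;
```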

## `WHERE`
When querying data you are almost always going to be interested in just a particular subset, which is exactly what `WHERE` is for.  You can think of `WHERE` as a filter applied to the rows of the table named in `FROM`.  Here's an example that only returns rows with `cyl = 6`.  Notice that here we will break our query across several lines.  This can still be copied and pasted into the `sqlite3` console as long as the statement is terminated with a `;`.

```SQL
SELECT name, cyl, hp, wt, mpg 
FROM mtcars 
WHERE cyl = 6 
ORDER BY mpg DESC 
LIMIT 10;
```

```
name            cyl         hp          wt          mpg       
--------------  ----------  ----------  ----------  ----------
Hornet 4 Drive  6           110         3.215       21.4      
Mazda RX4       6           110         2.62        21        
Mazda RX4 Wag   6           110         2.875       21        
Ferrari Dino    6           175         2.77        19.7      
Merc 280        6           123         3.44        19.2      
Valiant         6           105         3.46        18.1      
Merc 280C       6           123         3.44        17.8
```

The `WHERE` clause supports most of the comparison operators you are familiar with (`>`, `<`, `!=`, etc.).  For a full list, consult the [documentation](https://www.sqlite.org/lang_select.html#whereclause).  We can also apply multiple filters in the same `WHERE` clause.  For example

```SQL
SELECT name, cyl, hp, wt, mpg 
FROM mtcars 
WHERE cyl = 6 AND mpg >= 20
ORDER BY mpg DESC 
LIMIT 10;
```

```
name            cyl         hp          wt          mpg       
--------------  ----------  ----------  ----------  ----------
Hornet 4 Drive  6           110         3.215       21.4      
Mazda RX4       6           110         2.62        21        
Mazda RX4 Wag   6           110         2.875       21    
```

will return only rows with 6 cylinders and greater than or equal to 20 `mpg`.  Note that SQL combines conditions with the keywords `AND` and `OR`; the `&` and `|` operators you may be familiar with from a programming background are bitwise operators in SQLite, not logical ones.  Another useful keyword, especially for data science, is `CASE` which can be used to code certain classes of rows.

```SQL
SELECT name, cyl, hp, wt, mpg,
    CASE WHEN mpg < 15 THEN 'low'
         WHEN mpg < 20 THEN 'med'
         WHEN mpg > 20 THEN 'high'
    END
         AS class
FROM mtcars  
LIMIT 10;
```

```
name        cyl         hp          wt          mpg         class     
----------  ----------  ----------  ----------  ----------  ----------
Mazda RX4   6           110         2.62        21          high      
Mazda RX4   6           110         2.875       21          high      
Datsun 710  4           93          2.32        22.8        high      
Hornet 4 D  6           110         3.215       21.4        high      
Hornet Spo  8           175         3.44        18.7        med       
Valiant     6           105         3.46        18.1        med       
Duster 360  8           245         3.57        14.3        low       
Merc 240D   4           62          3.19        24.4        high      
Merc 230    4           95          3.15        22.8        high      
Merc 280    6           123         3.44        19.2        med     
```
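
One detail worth noticing: a row that matches none of the `WHEN` conditions gets `NULL`.  In the query above, a car with an `mpg` of exactly 20 would fall through all three branches.  The sketch below uses a small inline list of made-up `mpg` values (the name `sample_mpg` is hypothetical and the values are not read from `mtcars`) to compare the original `CASE` with a variant that ends in `ELSE`.

```SQL
WITH sample_mpg(mpg) AS (VALUES (14.3), (18.7), (20.0), (24.4))
SELECT mpg,
    CASE WHEN mpg < 15 THEN 'low'
         WHEN mpg < 20 THEN 'med'
         WHEN mpg > 20 THEN 'high'
    END AS no_else,
    CASE WHEN mpg < 15 THEN 'low'
         WHEN mpg < 20 THEN 'med'
         ELSE 'high'          -- catches everything not matched above
    END AS with_else
FROM sample_mpg;
-- For mpg = 20.0, no_else is NULL while with_else is 'high'.
```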

Note that `AS` is used to name objects in SQL.  For example

```SQL
SELECT name AS car, cyl, hp, wt, mpg,
    CASE WHEN mpg < 15 THEN 'low'
         WHEN mpg < 20 THEN 'med'
         WHEN mpg > 20 THEN 'high'
    END
         AS class
FROM mtcars  
LIMIT 10;
```

```
car         cyl         hp          wt          mpg         class     
----------  ----------  ----------  ----------  ----------  ----------
Mazda RX4   6           110         2.62        21          high      
Mazda RX4   6           110         2.875       21          high      
Datsun 710  4           93          2.32        22.8        high      
Hornet 4 D  6           110         3.215       21.4        high      
Hornet Spo  8           175         3.44        18.7        med       
Valiant     6           105         3.46        18.1        med       
Duster 360  8           245         3.57        14.3        low       
Merc 240D   4           62          3.19        24.4        high      
Merc 230    4           95          3.15        22.8        high      
Merc 280    6           123         3.44        19.2        med      
```

renames the `name` column as `car`.
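
Returning briefly to `WHERE`: conditions can also be combined with `OR`, and parentheses make the intended grouping explicit.  As a sketch on the same table, the query below keeps four- and six-cylinder cars that also reach at least 25 `mpg`.  Without the parentheses, `AND` would bind more tightly than `OR` and the filter would mean something different.

```SQL
SELECT name AS car, cyl, mpg
FROM mtcars
WHERE (cyl = 4 OR cyl = 6) AND mpg >= 25
ORDER BY mpg DESC;
```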

## `GROUP BY` and Aggregate Functions

Before moving forward, it may be helpful to take a look at the SQL [flow diagram](https://www.sqlite.org/lang_select.html) in the SQLite Documentation.  It is probably not too useful to spend a lot of time with it, but having it in mind as we move into `GROUP BY` and Aggregate Functions will be helpful.

`GROUP BY` and Aggregate Functions go hand in hand.  Aggregate Functions are essentially functions that collapse the values of a column across many rows into a single value.  The function `avg()` for example

```SQL
SELECT avg(mpg) FROM mtcars;
```

```
avg(mpg)  
----------
20.090625
```

returns the average mpg for all cars in the data.  What if we were to change the query a bit and instead said

```SQL
SELECT name, cyl, hp, wt, avg(mpg) FROM mtcars;
```

```
name        cyl         hp          wt          avg(mpg)  
----------  ----------  ----------  ----------  ----------
Volvo 142E  4           109         2.78        20.090625 
```

One row is returned, with the overall average in the `avg(mpg)` column.  But why that particular row?  When a query mixes an aggregate with plain columns and has no `GROUP BY`, SQLite fills the plain columns from an arbitrary row, and many other databases reject such a query outright.  Aggregate functions are almost always intended to be used with a `GROUP BY` clause; otherwise, you get nonsense like the above back.  Let's instead use a `GROUP BY` clause to return something more informative.

```SQL
SELECT cyl,
    avg(hp) as avg_hp,
    avg(wt) as avg_wt,
    avg(mpg) as avg_mpg
FROM mtcars
GROUP BY cyl;
```

```
cyl         avg_hp            avg_wt            avg_mpg         
----------  ----------------  ----------------  ----------------
4           82.6363636363636  2.28572727272727  26.6636363636364
6           122.285714285714  3.11714285714286  19.7428571428571
8           209.214285714286  3.99921428571429  15.1          
```

Now this is much more useful.  What was returned was the average value of the specified columns for each unique value in the `GROUP BY` clause, in this case, the cylinders.  It is **very** important to keep in mind, however, that `GROUP BY` will group by each *unique* value in the columns.  This isn't a problem here since there are only 3 unique values of `cyl`, but what if we had, for example, grouped by `wt` or some other column that had many more unique values?  Well, `GROUP BY` will return a table with a number of rows equal to the number of unique values.  On small datasets, this isn't too much of a problem, but when working with a very large database, `GROUP BY` and aggregate function calls can be a bit taxing and can really slow things down, or potentially lock up your client (and irritate your DBA) if you aren't careful.
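
Before grouping on an unfamiliar column, it can help to check how many groups you would get back.  A minimal sketch using `count(DISTINCT ...)` on the same table (the output column names are made up for illustration):

```SQL
-- Each number is how many rows a GROUP BY on that column would return.
SELECT count(DISTINCT cyl) AS cyl_groups,
       count(DISTINCT wt)  AS wt_groups
FROM mtcars;
```

Here `cyl_groups` should be 3, matching the three rows returned above, while `wt_groups` will be considerably larger.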

We can also `GROUP BY` multiple columns.

```SQL
SELECT cyl,
    avg(hp) as avg_hp,
    avg(wt) as avg_wt,
    avg(mpg) as avg_mpg,
    CASE WHEN mpg < 15 THEN 'low'
         WHEN mpg < 20 THEN 'med'
         WHEN mpg > 20 THEN 'high'
    END
         AS class
FROM mtcars
GROUP BY cyl, class;
```

```
cyl         avg_hp            avg_wt            avg_mpg           class     
----------  ----------------  ----------------  ----------------  ----------
4           82.6363636363636  2.28572727272727  26.6636363636364  high      
6           110.0             2.90333333333333  21.1333333333333  high      
6           131.5             3.2775            18.7              med       
8           228.0             4.6858            12.62             low       
8           198.777777777778  3.61777777777778  16.4777777777778  med  
```

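Another useful keyword is `LIKE`, which performs simple pattern matching, somewhat like a very limited regular expression.  In a `LIKE` pattern, `%` matches any sequence of zero or more characters and `_` matches exactly one character.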

```SQL
SELECT name,
    cyl,
    avg(hp) as avg_hp,
    avg(wt) as avg_wt,
    avg(mpg) as avg_mpg,
    CASE WHEN mpg < 15 THEN 'low'
         WHEN mpg < 20 THEN 'med'
         WHEN mpg > 20 THEN 'high'
    END
         AS class
FROM mtcars
WHERE name LIKE 'Merc%'
GROUP BY cyl, class;
```

```
name        cyl         avg_hp      avg_wt      avg_mpg     class     
----------  ----------  ----------  ----------  ----------  ----------
Merc 230    4           78.5        3.17        23.6        high      
Merc 280C   6           123.0       3.44        18.5        med       
Merc 450SL  8           180.0       3.86        16.3        med   
```

This returns only the averages of Mercedes cars grouped by class and cylinders.  Before experimenting a bit more, let's add the class column to the table permanently.  To do that, we must use `ALTER TABLE` and `ADD`.

```SQL
ALTER TABLE mtcars
ADD class char;
```

The `ADD` statement is of the form `ADD col_name data_type`.  In our case, we want the column to be of type `char`.  And now let's take a look at the table

```SQL
SELECT * FROM mtcars LIMIT 10;
```

```
name        mpg         cyl         disp        hp          drat        wt          qsec        vs          am          gear        carb        class     
----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------
Mazda RX4   21          6           160         110         3.9         2.62        16.46       0           1           4           4                     
Mazda RX4   21          6           160         110         3.9         2.875       17.02       0           1           4           4                     
Datsun 710  22.8        4           108         93          3.85        2.32        18.61       1           1           4           1                     
Hornet 4 D  21.4        6           258         110         3.08        3.215       19.44       1           0           3           1                     
Hornet Spo  18.7        8           360         175         3.15        3.44        17.02       0           0           3           2                     
Valiant     18.1        6           225         105         2.76        3.46        20.22       1           0           3           1                     
Duster 360  14.3        8           360         245         3.21        3.57        15.84       0           0           3           4                     
Merc 240D   24.4        4           146.7       62          3.69        3.19        20          1           0           4           2                     
Merc 230    22.8        4           140.8       95          3.92        3.15        22.9        1           0           4           2                     
Merc 280    19.2        6           167.6       123         3.92        3.44        18.3        1           0           4           4       
```

Notice how the new column is empty.  To put the data in it, we need to use `UPDATE` and `SET`.

```SQL
UPDATE mtcars
SET class = CASE WHEN mpg < 15 THEN 'low'
                 WHEN mpg < 20 THEN 'med'
                 WHEN mpg > 20 THEN 'high'
            END;
```

Again let's take a look.

```SQL
SELECT * FROM mtcars LIMIT 10;
```

```
name        mpg         cyl         disp        hp          drat        wt          qsec        vs          am          gear        carb        class     
----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------
Mazda RX4   21          6           160         110         3.9         2.62        16.46       0           1           4           4           high      
Mazda RX4   21          6           160         110         3.9         2.875       17.02       0           1           4           4           high      
Datsun 710  22.8        4           108         93          3.85        2.32        18.61       1           1           4           1           high      
Hornet 4 D  21.4        6           258         110         3.08        3.215       19.44       1           0           3           1           high      
Hornet Spo  18.7        8           360         175         3.15        3.44        17.02       0           0           3           2           med       
Valiant     18.1        6           225         105         2.76        3.46        20.22       1           0           3           1           med       
Duster 360  14.3        8           360         245         3.21        3.57        15.84       0           0           3           4           low       
Merc 240D   24.4        4           146.7       62          3.69        3.19        20          1           0           4           2           high      
Merc 230    22.8        4           140.8       95          3.92        3.15        22.9        1           0           4           2           high      
Merc 280    19.2        6           167.6       123         3.92        3.44        18.3        1           0           4           4           med 
```
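
Because the `CASE` expression has no `ELSE`, any car with an `mpg` of exactly 20 would have been left without a class.  A quick way to check, sketched here against the table as updated above, is to count the rows per class; a row with an empty `class` in the output would indicate unclassified cars.  With the standard data, every car should land in one of the three classes.

```SQL
-- Count the cars in each class, including any still set to NULL.
SELECT class, count(*) AS car_count
FROM mtcars
GROUP BY class;
```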

Now all appears in order.  Let's suppose that we want to return `class` and a count of the cars in each group, but only if the group has an average horsepower of at least 120.  We could try

```SQL
SELECT class, count(name)
FROM mtcars
WHERE avg(hp) >= 120
GROUP BY class;
```

```
Error: misuse of aggregate: avg()
```

but SQLite produces an error.  Instead, we must use a `HAVING` clause.  `HAVING` is essentially the same as `WHERE` but takes place after all filtering and grouping, and can thus filter using aggregate functions.  So if we write

```SQL
SELECT class, count(name) as car_count
FROM mtcars
GROUP BY class
HAVING avg(hp) >= 120;
```

```
class       car_count 
----------  ----------
low         5         
med         13        
```

SQLite returns the count of each class having an average `hp` of at least 120.  We could also combine the two kinds of filter: first use `WHERE` to drop the individual cars with an `hp` of less than 120, then count the remaining cars in each class, and finally use `HAVING` to remove any class whose remaining cars average less than 120 `hp`.

```SQL
SELECT class, count(name) as car_count
FROM mtcars
WHERE hp >= 120
GROUP BY class
HAVING avg(hp) >= 120;
```

```
class       car_count 
----------  ----------
low         5         
med         12
```
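
The order of operations is easier to see on a handful of rows.  The sketch below uses a small inline table of made-up values (the name `cars` and its contents are hypothetical, not read from `mtcars`): `WHERE` removes individual rows first, and `HAVING` then removes whole groups based on what is left.

```SQL
WITH cars(class, hp) AS (
    VALUES ('med', 105), ('med', 123), ('med', 180),
           ('high', 66), ('high', 113)
)
SELECT class, count(*) AS car_count, avg(hp) AS avg_hp
FROM cars
WHERE hp >= 110          -- drops ('med', 105) and ('high', 66)
GROUP BY class
HAVING avg(hp) >= 120;   -- drops 'high', whose only remaining row has hp 113
-- Result: one row, med with car_count 2 and avg_hp 151.5
```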

## Joins
Very often in practice, you'll find yourself in a situation where you need to merge together data from several different sources.  For example, you may have some ZIP-level Census data that you want to attach to some ZIP-level real estate sales, or maybe you have a list of flight arrivals at an airport, listed by tail number, that you want to match with radar data.  Join operations are how such tasks are performed using SQL.

### Data
Before we jump into the various kinds of joins, we first need some data to work with.  Here we're going to use some airline passenger and flight data which can be downloaded [here](https://vincentarelbundock.github.io/Rdatasets/csv/datasets/AirPassengers.csv) and [here](https://vincentarelbundock.github.io/Rdatasets/csv/datasets/airmiles.csv).

After downloading them, take a look and familiarize yourself with the contents.  Now let's bring them into our environment.

```
.mode csv
.import airmiles.csv airmiles
.import AirPassengers.csv AirPassengers

.mode columns
.headers on
```

```SQL
SELECT * FROM airmiles LIMIT 5;
```

```
            time        airmiles  
----------  ----------  ----------
1           1937        412       
2           1938        480       
3           1939        683       
4           1940        1052      
5           1941        1385 
```

```SQL
SELECT * FROM AirPassengers LIMIT 5;
```

```
            time        AirPassengers
----------  ----------  -------------
1           1949        112          
2           1949.08333  118          
3           1949.16666  132          
4           1949.25     129          
5           1949.33333  121          
```

It looks like the first column in both tables is just a row index left over from when the data was dumped.  Let's remove it.

```SQL
CREATE TABLE temp AS SELECT time, airmiles FROM airmiles;
DROP TABLE airmiles;
ALTER TABLE temp RENAME TO airmiles;
```

```SQL
CREATE TABLE temp AS SELECT time, AirPassengers FROM AirPassengers;
DROP TABLE AirPassengers;
ALTER TABLE temp RENAME TO AirPassengers;
```

```SQL
SELECT * FROM airmiles LIMIT 5;
```

```
time        airmiles  
----------  ----------
1937        412       
1938        480       
1939        683       
1940        1052      
1941        1385   
```

```SQL
SELECT * FROM AirPassengers LIMIT 5;
```

```
time        AirPassengers
----------  -------------
1949        112          
1949.08333  118          
1949.16666  132          
1949.25     129          
1949.33333  121  
```

The index column is now gone from both tables.

Note that most SQL systems support `ALTER TABLE ... DROP COLUMN`, but SQLite only added it in version 3.35.0.  On older versions, we have to use the method above of creating a temporary table with the needed columns and renaming it.
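
If your SQLite is version 3.35.0 or newer, a column can be dropped directly.  The sketch below checks the version and drops a column from a hypothetical `demo` table (the table and its `idx` column are made up for illustration); on an older SQLite the `ALTER TABLE` line would fail, and the temporary-table approach above is the fallback.

```SQL
SELECT sqlite_version();

-- Hypothetical table mimicking an import with a leftover index column.
CREATE TABLE demo (idx TEXT, time TEXT, airmiles TEXT);
INSERT INTO demo VALUES ('1', '1937', '412');

ALTER TABLE demo DROP COLUMN idx;  -- needs SQLite 3.35.0 or newer

SELECT * FROM demo;  -- only time and airmiles remain

-- Clean up the illustration table.
DROP TABLE demo;
```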

### `JOIN`
A plain `JOIN` in SQLite is an inner join.  If you are familiar with set theory, a `JOIN` with no join condition produces the Cartesian product of the two tables.  In other words, every row in the left table is paired with every row in the right table.  Consult the [documentation](https://www.sqlite.org/lang_select.html#fromclause) for a more precise definition.  Let's do a simple but pretty useless `JOIN` to illustrate exactly what is going on.

```SQL
SELECT m.*, p.* FROM airmiles m JOIN AirPassengers p;
```

```
time        airmiles    time        AirPassengers
----------  ----------  ----------  -------------
1937        412         1949        112          
1937        412         1949.08333  118          
1937        412         1949.16666  132          
1937        412         1949.25     129          
1937        412         1949.33333  121          
1937        412         1949.41666  135          
1937        412         1949.5      148          
1937        412         1949.58333  148          
1937        412         1949.66666  136          
1937        412         1949.75     119          
1937        412         1949.83333  104          
1937        412         1949.91666  118          
...         ...         ..........  ...
```

As you can see, every row in airmiles is matched to every row in AirPassengers.  Note that in the `FROM` clause, we rename the tables within the query for brevity.
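
One way to convince yourself of this is to compare row counts.  As a sketch, assuming both tables were imported as above, the number of rows produced by the unconditioned `JOIN` should equal the product of the two table sizes (the output column names are made up for illustration).

```SQL
SELECT
    (SELECT count(*) FROM airmiles)                        AS miles_rows,
    (SELECT count(*) FROM AirPassengers)                   AS passenger_rows,
    (SELECT count(*) FROM airmiles m JOIN AirPassengers p) AS joined_rows;
-- joined_rows should equal miles_rows * passenger_rows
```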

A much more useful `JOIN` for this data would be to join them on their common element, namely, the time.

```SQL
SELECT m.*, p.* FROM airmiles m JOIN AirPassengers p ON m.time = p.time;
```

```
time        airmiles    time        AirPassengers
----------  ----------  ----------  -------------
1949        6753        1949        112          
1950        8003        1950        115          
1951        10566       1951        145          
1952        12528       1952        171          
1953        14760       1953        196          
1954        16769       1954        204          
1955        19819       1955        242          
1956        22362       1956        284          
1957        25340       1957        315          
1958        25343       1958        340          
1959        29269       1959        360          
1960        30514       1960        417   
```

As you can see, every unique `time` in `airmiles` is matched with the corresponding `time` in `AirPassengers`.  This is specified in the `ON` clause which almost always will follow a `JOIN` statement.
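
Since both tables carry a `time` column, `m.*, p.*` shows it twice.  As a small variation, assuming the column names shown above, the query can list just the columns it needs and give them clearer names with `AS` (the aliases `year`, `miles`, and `passengers` are chosen here for illustration).

```SQL
SELECT m.time          AS year,
       m.airmiles      AS miles,
       p.AirPassengers AS passengers
FROM airmiles m
JOIN AirPassengers p ON m.time = p.time
ORDER BY m.time;
```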

### `LEFT JOIN`
Suppose that instead of returning only rows with matching values, like in the inner join above, you instead wanted to return *all* rows for one of the tables.  This operation is called an outer join, and specifically, a `LEFT JOIN` in SQLite since the table on the left will have *all* its rows returned.  For example

```SQL
SELECT p.*, m.* FROM AirPassengers p LEFT JOIN airmiles m  ON p.time = m.time;
```

```
time        AirPassengers  time        airmiles  
----------  -------------  ----------  ----------
1949        112            1949        6753      
1949.08333  118                                  
1949.16666  132                                  
1949.25     129                                  
1949.33333  121                                  
1949.41666  135                                  
1949.5      148                                  
1949.58333  148                                  
1949.66666  136                                  
1949.75     119                                  
1949.83333  104                                  
1949.91666  118                                  
1950        115            1950        8003      
1950.08333  126                                  
1950.16666  141   
..........  ...            ....        ....
```

This returns *all* rows in the left table even if there is no matching row in the right table; where there is no match, the columns from the right table are left empty (`NULL`).  In practice, you'll find that an inner join is typically what you want, but in situations where you want to keep track of what data is actually missing, you'll end up going for the outer join.
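
To list only the rows that are missing a match, a common pattern is to filter the `LEFT JOIN` on a `NULL` in the right table's join column.  A minimal sketch, assuming the two tables as prepared above:

```SQL
-- Rows from AirPassengers that found no partner in airmiles.
SELECT p.time, p.AirPassengers
FROM AirPassengers p
LEFT JOIN airmiles m ON p.time = m.time
WHERE m.time IS NULL
LIMIT 10;
```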

## Exercises

Using the [Titanic](https://vincentarelbundock.github.io/Rdatasets/csv/datasets/Titanic.csv) dataset, perform the following exercises.

1. Import the data into SQLite, removing the index column.
1. Run a query that returns the number of passengers in each class under the age of 16.
1. Run a query that returns the average age of those who died (`Survived = 0`) and those who survived for male and female separately as well as the counts for each group.
1. Run a query that returns the portion of survivors in each class.
1. Run a query that returns the portion of females and males in each class who survived with an age greater than the average.
1. Download the [USArrests](https://vincentarelbundock.github.io/Rdatasets/csv/datasets/USArrests.csv) and [road](https://vincentarelbundock.github.io/Rdatasets/csv/MASS/road.csv) accident data and perform the following.
    1. Import the data into your SQLite environment.
    1. What column should the data be joined on?  After identifying it, `JOIN` on that column.
    1. Perform a `LEFT JOIN` on the column identified in part 2.
    1. Are the results of your joins useful?  If not, what would be required to fix them? (Don't actually do it!).