<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Objectives" data-toc-modified-id="Objectives-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Objectives</a></span></li><li><span><a href="#Aggregating-Functions" data-toc-modified-id="Aggregating-Functions-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Aggregating Functions</a></span><ul class="toc-item"><li><span><a href="#Example-Simple-Aggregations" data-toc-modified-id="Example-Simple-Aggregations-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Example Simple Aggregations</a></span></li></ul></li><li><span><a href="#Grouping-in-SQL" data-toc-modified-id="Grouping-in-SQL-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Grouping in SQL</a></span><ul class="toc-item"><li><span><a href="#Example-GROUP-BY--Statements" data-toc-modified-id="Example-GROUP-BY--Statements-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Example <code>GROUP BY</code>  Statements</a></span><ul class="toc-item"><li><span><a href="#Without-GROUP-BY" data-toc-modified-id="Without-GROUP-BY-3.1.1"><span class="toc-item-num">3.1.1&nbsp;&nbsp;</span>Without <code>GROUP BY</code></a></span></li><li><span><a href="#With-GROUP-BY" data-toc-modified-id="With-GROUP-BY-3.1.2"><span class="toc-item-num">3.1.2&nbsp;&nbsp;</span>With <code>GROUP BY</code></a></span></li></ul></li><li><span><a href="#Group-Task" data-toc-modified-id="Group-Task-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Group Task</a></span><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#Possible-Solution" data-toc-modified-id="Possible-Solution-3.2.0.1"><span class="toc-item-num">3.2.0.1&nbsp;&nbsp;</span>Possible Solution</a></span></li></ul></li></ul></li><li><span><a href="#Exercises" data-toc-modified-id="Exercises-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Exercises</a></span><ul class="toc-item"><li><span><a href="#Exercise-1" data-toc-modified-id="Exercise-1-3.3.1"><span 
class="toc-item-num">3.3.1&nbsp;&nbsp;</span>Exercise 1</a></span></li><li><span><a href="#Exercise-2" data-toc-modified-id="Exercise-2-3.3.2"><span class="toc-item-num">3.3.2&nbsp;&nbsp;</span>Exercise 2</a></span></li></ul></li></ul></li><li><span><a href="#Joins" data-toc-modified-id="Joins-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Joins</a></span></li><li><span><a href="#Level-Up:-Execution-Order" data-toc-modified-id="Level-Up:-Execution-Order-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Level Up: Execution Order</a></span></li></ul></div>

![sql](sql-logo.jpg)

In [None]:
import pandas as pd
import sqlite3
import pandasql

conn = sqlite3.connect("flights.db")
cur = conn.cursor()

# Objectives

- Use SQL aggregation functions with GROUP BY
- Use HAVING for group filtering
- Use SQL JOIN to combine tables using keys

# Aggregating Functions

>  A SQL **aggregating function** takes in many values and returns one value.

We might've already seen some SQL aggregating functions like `COUNT()`. There are also others, like `SUM()`, `AVG()`, `MIN()`, and `MAX()`.
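As a minimal sketch of "many values in, one value out", the block below runs each of these functions over a tiny in-memory table. The `scores` table, its columns, and the `conn_demo` connection are hypothetical and exist only for this illustration; they are not part of `flights.db`.

```python
import sqlite3

# Hypothetical toy table, unrelated to flights.db
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn_demo.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 10), ("b", 20), ("c", 30), ("d", None)],
)

# Each aggregate collapses the whole column into a single value
row = conn_demo.execute("""
    SELECT
        COUNT(*),       -- counts every row, including the one with a NULL
        COUNT(points),  -- counts only rows where points is not NULL
        SUM(points),
        AVG(points),
        MIN(points),
        MAX(points)
    FROM
        scores
""").fetchone()

print(row)  # (4, 3, 60, 20.0, 10, 30)
```

Note that every aggregate except `COUNT(*)` skips `NULL` values, which is why the average is 20.0 rather than 15.0.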

## Example Simple Aggregations

In [None]:
# Max value for longitude
pd.read_sql('''
    SELECT 
        MAX(airports.longitude)
    FROM 
        airports
''', conn)

In [None]:
# Max value for id in table (assumes the airports table has an `id` column)
pd.read_sql('''
    SELECT 
        MAX(airports.id)
    FROM 
        airports
''', conn)

In [None]:
# Effectively counts all the not active airlines 
pd.read_sql('''
    SELECT 
        COUNT()
    FROM 
        airlines
    WHERE 
        active='N'
''', conn)

We can also give aliases to our aggregations:

In [None]:
# Effectively counts all the active airlines 
pd.read_sql('''
    SELECT 
        COUNT() as number_of_active_airlines
    FROM 
        airlines
    WHERE 
        active='Y'
''', conn)

# Grouping in SQL

We can go deeper and use aggregation functions on _groups_ using the `GROUP BY` clause.

The `GROUP BY` clause collects rows that share the same value in one or more columns into a single group. The aggregation function is then computed once per group instead of once for the whole table.

## Example `GROUP BY`  Statements

Let's say we want to know how many active and non-active airlines there are.

### Without `GROUP BY`

Let's first start with just seeing how many airlines there are:

In [None]:
df_results = pd.read_sql('''
    SELECT 
        -- Reminder: COUNT() counts the rows produced by FROM, before SELECT picks columns
        COUNT() AS number_of_airlines
    FROM 
        airlines
''', conn)

df_results

One way to get the count for each kind of airline (active vs. non-active) is to write two queries, each of which filters for one kind and counts the matching rows:

In [None]:
df_active = pd.read_sql('''
    SELECT 
        COUNT() AS number_of_active_airlines
    FROM 
        airlines
    WHERE 
        active='Y'
''', conn)

df_not_active = pd.read_sql('''
    SELECT 
        COUNT() AS number_of_not_active_airlines
    FROM 
        airlines
    WHERE 
        active='N'
''', conn)

display(df_active)
display(df_not_active)

This works, but it needs a separate query for every category, so it is repetitive and harder to read.

### With `GROUP BY`

Instead, we can have the database do the work by grouping on the values we care about!

In [None]:
df_results = pd.read_sql('''
    SELECT 
        COUNT() AS number_of_airlines
    FROM 
        airlines
    GROUP BY
        airlines.active
''', conn)

df_results

This is great! And if you look closely, you can observe we have _three_ different groups instead of our expected two!

Let's also print out the `airlines.active` value for each group/aggregation so we know what we're looking at:

In [None]:
df_results = pd.read_sql('''
    SELECT 
        airlines.active,
        COUNT() AS number_of_airlines
    FROM 
        airlines
    GROUP BY
        airlines.active
''', conn)

df_results

## Group Task

- Which countries have the highest numbers of active airlines? Return the top 10.

In [None]:
pd.read_sql('''
''', conn)

#### Possible Solution

In [None]:
pd.read_sql('''
    SELECT 
        COUNT() AS num,
        country
    FROM 
        airlines
    WHERE 
        active='Y'
    GROUP BY 
        country
    ORDER BY 
        num DESC
    LIMIT 10
''', conn)

> Note that the `GROUP BY` clause is evaluated _before_ the `ORDER BY` and `LIMIT` clauses.
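The ordering in the note above can be seen on a small scale. This is only a sketch on a hypothetical in-memory table (`toy_airlines` and the `demo` connection are made up for illustration): the groups are built and counted first, the grouped rows are sorted next, and only then is the result cut down by `LIMIT`.

```python
import sqlite3

# Hypothetical toy data, unrelated to flights.db
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE toy_airlines (name TEXT, country TEXT)")
demo.executemany(
    "INSERT INTO toy_airlines VALUES (?, ?)",
    [("A1", "US"), ("A2", "US"), ("A3", "US"),
     ("B1", "FR"), ("B2", "FR"),
     ("C1", "JP")],
)

top_two = demo.execute("""
    SELECT
        country,
        COUNT(*) AS num
    FROM
        toy_airlines
    GROUP BY
        country       -- 1) build one group per country and count its rows
    ORDER BY
        num DESC      -- 2) sort the grouped rows by that count
    LIMIT 2           -- 3) keep only the first two sorted rows
""").fetchall()

print(top_two)  # [('US', 3), ('FR', 2)]
```

If `LIMIT` were applied before grouping, the counts would be computed from only two raw rows, which is not what happens here.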

## Exercises

### Exercise 1

- Which countries have the highest numbers of inactive airlines? Return all the countries that have more than 10.

In [None]:
# Your code here


### Exercise 2

- Run a query that will return the number of airports by time zone. Each row should have a number of airports and a time zone.

In [None]:
# Your code here


# Joins
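As a rough sketch of the objective "combine tables using keys", the block below joins two hypothetical in-memory tables on a shared key. The table names, columns, and the `join_demo` connection are assumptions made up for this example and do not describe the schema of `flights.db`.

```python
import sqlite3

# Hypothetical toy tables, unrelated to flights.db
join_demo = sqlite3.connect(":memory:")
join_demo.executescript("""
    CREATE TABLE toy_countries (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE toy_airports (airport TEXT, country_code TEXT);
    INSERT INTO toy_countries VALUES ('US', 'United States'), ('FR', 'France');
    INSERT INTO toy_airports VALUES
        ('JFK', 'US'), ('LAX', 'US'), ('CDG', 'FR'), ('NRT', 'JP');
""")

# A plain JOIN is an inner join: a row appears only when the key matches in both tables
joined = join_demo.execute("""
    SELECT
        toy_airports.airport,
        toy_countries.name
    FROM
        toy_airports
        JOIN toy_countries
            ON toy_airports.country_code = toy_countries.code
    ORDER BY
        toy_airports.airport
""").fetchall()

print(joined)  # [('CDG', 'France'), ('JFK', 'United States'), ('LAX', 'United States')]
```

`NRT` is missing from the result because its key `'JP'` has no match in `toy_countries`.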

# Level Up: Execution Order

```SQL
SELECT 
    COUNT(table2.col2) AS my_new_count
    ,table1.col2
FROM
    table1
    JOIN table2
        ON table1.col1 = table2.col2
WHERE
    table1.col1 > 0
GROUP BY
    table1.col2
```

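The clauses of a query like this one are logically evaluated in the following order, regardless of the order in which they are written:

1. `FROM` (including any `JOIN`)
2. `WHERE`
3. `GROUP BY`
4. `HAVING`
5. `SELECT`
6. `ORDER BY`
7. `LIMIT`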
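One practical consequence of this order is the difference between `WHERE` and `HAVING`: `WHERE` filters individual rows before they are grouped, while `HAVING` filters whole groups after aggregation. The sketch below shows this on a hypothetical in-memory table (`toy_flights` and the `order_demo` connection are made up for illustration).

```python
import sqlite3

# Hypothetical toy data, unrelated to flights.db
order_demo = sqlite3.connect(":memory:")
order_demo.execute("CREATE TABLE toy_flights (carrier TEXT, delay INTEGER)")
order_demo.executemany(
    "INSERT INTO toy_flights VALUES (?, ?)",
    [("AA", 5), ("AA", 50), ("AA", 60),
     ("BB", 40), ("BB", 2),
     ("CC", 70)],
)

# WHERE runs before GROUP BY: drop short delays, then count what is left per carrier
where_rows = order_demo.execute("""
    SELECT carrier, COUNT(*) AS n
    FROM toy_flights
    WHERE delay > 30
    GROUP BY carrier
    ORDER BY carrier
""").fetchall()

# HAVING runs after GROUP BY: count every flight, then keep carriers with 2+ flights
having_rows = order_demo.execute("""
    SELECT carrier, COUNT(*) AS n
    FROM toy_flights
    GROUP BY carrier
    HAVING COUNT(*) >= 2
    ORDER BY carrier
""").fetchall()

# All clauses together, written in the order SQL requires
combined = order_demo.execute("""
    SELECT carrier, COUNT(*) AS n
    FROM toy_flights
    WHERE delay > 30
    GROUP BY carrier
    HAVING COUNT(*) >= 2
    ORDER BY n DESC
    LIMIT 1
""").fetchall()

print(where_rows)   # [('AA', 2), ('BB', 1), ('CC', 1)]
print(having_rows)  # [('AA', 3), ('BB', 2)]
print(combined)     # [('AA', 2)]
```

In the combined query, `BB` has two flights in the table, but only one survives the `WHERE` filter, so its group no longer satisfies the `HAVING` condition.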