<h1 style='text-align: center'>SQL Queries</h1>

## Getting Data From A SQL Database

### The Structure of a SQL Query

<img src='images/sql_statement.jpg'/>

#### GROUP BY

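- Groups rows that share the same value in one or more columns, so that aggregate functions such as `COUNT` return one result per group
- Example: `SELECT COUNT(id), city FROM students GROUP BY city`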
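
To make the grouping concrete, here is a minimal sketch that runs the query above against a throwaway in-memory database. The table layout (`id`, `name`, `city`) and every row in it are hypothetical stand-ins, not the tutorial data.

```python
import sqlite3

# Throwaway in-memory database; schema and rows are made up for illustration.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
demo.executemany(
    "INSERT INTO students (name, city) VALUES (?, ?)",
    [("Ana", "Boston"), ("Ben", "Boston"), ("Cy", "Chicago")],
)

# GROUP BY collapses the three rows into one row per distinct city.
group_rows = demo.execute(
    "SELECT COUNT(id), city FROM students GROUP BY city ORDER BY city"
).fetchall()
print(group_rows)  # [(2, 'Boston'), (1, 'Chicago')]
```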

#### HAVING

- Filters groups AFTER a `GROUP BY`, based on aggregate criteria
- `WHERE` filters individual rows before the `GROUP BY`; `HAVING` filters the resulting groups afterwards

For example, given a table of student names and the courses they are taking, we could ask: which classes have 3 or more students named Matt?

Such a query would look something like this:

```SQL
SELECT
  class,
  COUNT(student_name) AS number_of_matts
FROM student_courses
WHERE student_name = 'Matt'
GROUP BY 1
HAVING COUNT(student_name) >= 3;
```
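
The difference between the two filters is easiest to see by running the query with and without its `WHERE` clause. The sketch below builds a hypothetical `student_courses` table in memory; the class names and enrollments are invented for the example.

```python
import sqlite3

# Hypothetical data: Algebra has 3 Matts, Biology has only 2.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE student_courses (student_name TEXT, class TEXT)")
demo.executemany(
    "INSERT INTO student_courses VALUES (?, ?)",
    [
        ("Matt", "Algebra"), ("Matt", "Algebra"), ("Matt", "Algebra"),
        ("Jane", "Algebra"),
        ("Matt", "Biology"), ("Matt", "Biology"),
        ("Jane", "Biology"), ("Jane", "Biology"),
    ],
)

# WHERE drops non-Matt rows first; HAVING then drops classes with fewer than 3.
matt_rows = demo.execute("""
    SELECT class, COUNT(student_name) AS number_of_matts
    FROM student_courses
    WHERE student_name = 'Matt'
    GROUP BY class
    HAVING COUNT(student_name) >= 3
""").fetchall()
print(matt_rows)  # [('Algebra', 3)]

# Without WHERE, HAVING counts every student, so both classes pass.
no_where = demo.execute("""
    SELECT class, COUNT(student_name)
    FROM student_courses
    GROUP BY class
    HAVING COUNT(student_name) >= 3
    ORDER BY class
""").fetchall()
print(no_where)  # [('Algebra', 4), ('Biology', 4)]
```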

In [1]:
import sqlite3

conn = sqlite3.connect('tutorial.db')
c = conn.cursor()

In [2]:
columns = [x[0] for x in c.execute('select * from students').description]

In [3]:
columns

['name',
 'birthdate',
 'siblings',
 'birth_place',
 'years_in_nyc',
 'favorite_food']
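
As a side note, SQLite can also report column names without selecting any rows. The sketch below compares `cursor.description` with `PRAGMA table_info` on a hypothetical two-column table; it is an alternative to the cell above, not a replacement for it.

```python
import sqlite3

# Hypothetical two-column table standing in for the real `students` table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE students (name TEXT, siblings INTEGER)")

# description is populated for any SELECT, even one that returns no rows.
cur = demo.execute("SELECT * FROM students LIMIT 0")
from_description = [col[0] for col in cur.description]

# PRAGMA table_info yields one row per column; index 1 holds the column name.
from_pragma = [row[1] for row in demo.execute("PRAGMA table_info(students)")]

print(from_description, from_pragma)
```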

### Questions
1. What are the names of all of the students?
2. Which student has the most siblings?
3. How many students are only children?
4. Which 3 students have lived in NYC the shortest amount of time?
5. How many students are native New Yorkers?
6. Do any two students have the same favorite food?


1. What are the names of all of the students?

In [6]:
import pandas as pd

In [8]:
pd.read_sql_query("""SELECT name FROM students""", conn)

Unnamed: 0,name
0,Sean Abu Wilson
1,David Miller
2,Abhijeet Kamble
3,Samantha Jackson
4,Anmol Srivats
5,Ran Tokman
6,Amy Li
7,Florencia Leoni
8,Austin Krause
9,Natalie Overchuk


2. Which student has the most siblings?

In [22]:
pd.read_sql_query("""
select name, siblings
from students
order by siblings desc
limit 4
""", conn)

Unnamed: 0,name,siblings
0,Florencia Leoni,4
1,Mohamad Eldebek,4
2,Menachi Korn,4
3,Miguel Peña,4


In [24]:
pd.read_sql_query("""
SELECT
  name,
  siblings
FROM students
WHERE siblings = (SELECT MAX(siblings) FROM students);
""", conn)

Unnamed: 0,name,siblings
0,Florencia Leoni,4
1,Mohamad Eldebek,4
2,Menachi Korn,4
3,Miguel Peña,4
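
The two approaches to this question behave differently when several students tie for the maximum. `ORDER BY ... LIMIT n` returns exactly `n` rows, so the right `n` has to be known in advance, while the `MAX` subquery returns every tied row. A small sketch with made-up names and sibling counts:

```python
import sqlite3

# Hypothetical rows: A and C tie for the most siblings.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE students (name TEXT, siblings INTEGER)")
demo.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("A", 4), ("B", 2), ("C", 4), ("D", 0)],
)

# LIMIT 1 keeps a single row and silently drops the other tied student.
top_one = demo.execute(
    "SELECT name, siblings FROM students ORDER BY siblings DESC LIMIT 1"
).fetchall()

# The subquery compares against the maximum, so all tied rows come back.
all_max = demo.execute("""
    SELECT name, siblings
    FROM students
    WHERE siblings = (SELECT MAX(siblings) FROM students)
    ORDER BY name
""").fetchall()

print(len(top_one), all_max)  # 1 [('A', 4), ('C', 4)]
```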


3. How many students are only children?

In [25]:
pd.read_sql_query("""
SELECT
  COUNT(name) AS count,
  siblings
FROM students
WHERE siblings = 0
GROUP BY siblings;
""", conn)

Unnamed: 0,count,siblings
0,3,0


4. Which 3 students have lived in NYC the shortest amount of time? (How long has each lived in NYC?)
    

In [26]:
pd.read_sql_query("""
SELECT
  name,
  years_in_nyc
FROM students
ORDER BY 2
LIMIT 3;
""", conn)

Unnamed: 0,name,years_in_nyc
0,Anmol Srivats,0.05
1,Natalie Overchuk,0.1
2,Austin Krause,0.17


5. How many students are native New Yorkers?

In [31]:
pd.read_sql_query("""
SELECT
  name,
  birth_place
FROM students
WHERE LOWER(TRIM(birth_place)) LIKE '%ny';
""", conn)

Unnamed: 0,name,birth_place
0,David Miller,"New York, NY"
1,Amy Li,"New York, NY"
2,Akshay Sharma,"New York, NY"
3,Adam Dick,"New York, NY"
4,Alex Mitrani,"New York, NY"
5,Nicole Roach,"Brooklyn, NY"
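
Since the question asks "how many", the same filter can be wrapped in `COUNT(*)` to get a single number. The sketch below also shows a caveat of the `'%ny'` pattern: it matches any birthplace that ends in "ny", which includes every city in New York State and also strings such as "Germany". The stricter `'%, ny'` pattern and all rows here are assumptions for illustration.

```python
import sqlite3

# Hypothetical birthplaces, including trailing whitespace and a false positive.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE students (name TEXT, birth_place TEXT)")
demo.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [
        ("A", "New York, NY"),
        ("B", "Brooklyn, NY "),
        ("C", "Boston, MA"),
        ("D", "Berlin, Germany"),
    ],
)

# Same pattern as the query above: anything ending in "ny" after trimming.
loose = demo.execute(
    "SELECT COUNT(*) FROM students WHERE LOWER(TRIM(birth_place)) LIKE '%ny'"
).fetchone()[0]

# Requiring ", ny" at the end excludes "Berlin, Germany".
strict = demo.execute(
    "SELECT COUNT(*) FROM students WHERE LOWER(TRIM(birth_place)) LIKE '%, ny'"
).fetchone()[0]

print(loose, strict)  # 3 2
```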


6. Do any two students have the same favorite food?

In [32]:
pd.read_sql_query("""
SELECT
  favorite_food,
  count(favorite_food) AS count
FROM students
GROUP BY favorite_food
HAVING count(favorite_food) > 1;""", conn)

Unnamed: 0,favorite_food,count
0,pizza,2
1,steak,2


In [37]:
# which students have the same favorite foods?
pd.read_sql_query("""
SELECT DISTINCT
  s1.name,
  favorite_food

FROM students s1
JOIN students s2 USING (favorite_food)
WHERE s1.name != s2.name
ORDER BY favorite_food
""", conn)

Unnamed: 0,name,favorite_food
0,David Miller,pizza
1,Akshay Sharma,pizza
2,Florencia Leoni,steak
3,Nicole Roach,steak


In [38]:
# which students have the same favorite foods?
pd.read_sql_query("""
SELECT DISTINCT
  name,
  favorite_food

FROM students
WHERE favorite_food IN (SELECT
  favorite_food
FROM students
GROUP BY favorite_food
HAVING count(favorite_food) > 1)
ORDER BY favorite_food
""", conn)

Unnamed: 0,name,favorite_food
0,David Miller,pizza
1,Akshay Sharma,pizza
2,Florencia Leoni,steak
3,Nicole Roach,steak


## More Questions

What are the favorite foods of this classroom?

In [18]:
pd.read_sql_query("""SELECT
  favorite_food,
  COUNT(*) AS count
FROM students 
GROUP BY 1
ORDER BY 2 DESC""", conn)

Unnamed: 0,favorite_food,count
0,pizza,2
1,steak,2
2,Avocado,1
3,Biriyani,1
4,Everything Bagels,1
5,Fusilli Sorrentina,1
6,Kare Kare,1
7,Reeses Puffs,1
8,Sushi,1
9,Tabouleh,1


7. Which student was born closest to the cohort's graduation date?

In [41]:
pd.read_sql_query("""
SELECT name,
abs(julianday('2019-' || substr(birthdate,1,2) || '-' || substr(birthdate,4,2)) - julianday('2019-08-26')) AS days
FROM students
ORDER BY 2
LIMIT 1
""", conn)

# MySQL version (DATEDIFF returns the number of days between two dates)

# SELECT name,
# abs(datediff(concat('2019-', substr(birthdate,1,2), '-', substr(birthdate,4,2)), '2019-08-26')) AS days
# FROM students
# ORDER BY 2
# LIMIT 1

Unnamed: 0,name,days
0,Fhel Dimaano,6.0
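
The date arithmetic in this query can be checked in isolation. The sketch below assumes `birthdate` is stored as text with the month in characters 1-2 and the day in characters 4-5 (for example `MM/DD/YYYY`), which is what the `substr` calls imply; the sample birthdate is made up.

```python
import sqlite3

demo = sqlite3.connect(":memory:")

# Hypothetical birthdate in the assumed MM/DD/YYYY text format.
birthdate = "09/01/1990"

# Rebuild the birthday as a 2019 date, then take the absolute day difference
# from the graduation date used in the query above.
days = demo.execute(
    """
    SELECT abs(
        julianday('2019-' || substr(?, 1, 2) || '-' || substr(?, 4, 2))
        - julianday('2019-08-26')
    )
    """,
    (birthdate, birthdate),
).fetchone()[0]

print(days)  # 6.0 (September 1 is six days after August 26)
```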
