<h1 style='text-align: center'>SQL Queries</h1>

## Getting Data From A SQL Database

### The Structure of a SQL Query

<img src='images/sql_statement.jpg'/>

#### GROUP BY

- Groups rows that share the same value in one or more columns, so that aggregates such as `COUNT` are computed once per group
- `SELECT COUNT(id), city FROM students GROUP BY city`
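
As a minimal sketch of what `GROUP BY` does, the snippet below runs the same kind of query against an in-memory SQLite database instead of the MySQL server used later in this lesson. The `students` rows and the `demo` connection name are made up for illustration.

```python
import sqlite3

# In-memory SQLite database with a hypothetical `students` table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
demo.executemany(
    "INSERT INTO students (name, city) VALUES (?, ?)",
    [("Ana", "Austin"), ("Ben", "Boston"), ("Cy", "Austin")],
)

# One output row per distinct city, with the number of students in that city.
city_counts = demo.execute(
    "SELECT city, COUNT(id) AS num_students "
    "FROM students GROUP BY city ORDER BY city"
).fetchall()
print(city_counts)
demo.close()
```

Three input rows collapse into two output rows because two of the students share a city.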

#### HAVING

- Filters groups AFTER a `GROUP BY`, based on aggregate criteria such as `COUNT(*) > 1`
- `WHERE` filters individual rows before the `GROUP BY`; `HAVING` filters the resulting groups afterwards

For example, given a table of student names and the courses they are taking, we could ask: which classes have 2 or more students named Alex?

Such a query would look something like this:

```SQL
SELECT
  class,
  COUNT(student_name) AS number_of_alexes
FROM student_courses
WHERE student_name = 'Alex'         -- row filter, applied before grouping
GROUP BY class
HAVING COUNT(student_name) >= 2;    -- group filter, applied after grouping
```
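
To see the difference between `WHERE` and `HAVING` in action, here is a small sketch that runs the query against an in-memory SQLite database. The `student_courses` rows are invented for the example, and SQLite stands in for MySQL so the snippet runs without a server.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE student_courses (student_name TEXT, class TEXT)")
demo.executemany(
    "INSERT INTO student_courses VALUES (?, ?)",
    [
        ("Alex", "SQL"), ("Alex", "SQL"), ("Alex", "Stats"),
        ("Matt", "SQL"), ("Matt", "Stats"), ("Matt", "Stats"),
    ],
)

# WHERE drops the non-Alex rows first; HAVING then keeps groups with >= 2 rows.
alex_classes = demo.execute(
    """
    SELECT class, COUNT(student_name) AS number_of_alexes
    FROM student_courses
    WHERE student_name = 'Alex'
    GROUP BY class
    HAVING COUNT(student_name) >= 2
    ORDER BY class
    """
).fetchall()

# Without the WHERE clause, every student is counted before HAVING runs.
all_classes = demo.execute(
    """
    SELECT class, COUNT(student_name) AS number_of_students
    FROM student_courses
    GROUP BY class
    HAVING COUNT(student_name) >= 2
    ORDER BY class
    """
).fetchall()

print(alex_classes)
print(all_classes)
demo.close()
```

With these rows, only the `SQL` class has two students named Alex, while both classes pass the `HAVING` filter once the `WHERE` clause is removed.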

In [5]:
!pip3 install PyMySQL




In [37]:
import pymysql

conn = pymysql.connect(host='database-1118.c1doesqrid0e.us-east-1.rds.amazonaws.com',
                       user='fis_student',
                       password='SuperSecurePassword',
                       database='survey')  # `db=` is a deprecated alias in recent PyMySQL releases
c = conn.cursor()

In [8]:
c.execute('select * from responses')
columns = [x[0] for x in c.description]

In [9]:
columns

['submitted_ts',
 'name',
 'birthday',
 'hometown',
 'fav_food_1',
 'fav_food_2',
 'time_in_dc',
 'siblings_count']
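
The column names come from `cursor.description`, a DB-API attribute that holds one tuple per result column with the name in position 0. As a sketch of how those names can be paired with the fetched rows, the snippet below uses an in-memory SQLite table (an assumption made so it runs without the MySQL server) containing one row copied from the output in this notebook.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE responses (name TEXT, hometown TEXT, siblings_count TEXT)")
demo.execute(
    "INSERT INTO responses VALUES (?, ?, ?)",
    ("Alex", "Alexandria, VA", "1\r"),
)

cur = demo.cursor()
cur.execute("SELECT * FROM responses")

# The first element of each description entry is the column name.
demo_columns = [col[0] for col in cur.description]

# Pair every row with the column names to get one dict per record.
records = [dict(zip(demo_columns, row)) for row in cur.fetchall()]
print(records)
demo.close()
```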

In [10]:
print(*c.fetchall(), sep='\n')

('12/11/2019 12:57:49', 'Stephen', '12/8', '1. born: Seoul, Republic of Korea 2. raised: Arlington, Texas', 'pizza', 'beef stroganoff', '0', '2\r')
('12/11/2019 13:04:35', 'Donna C', '9/29', 'Phoenix AZ USA', 'shrimp', 'sushi', '0.2', '1\r')
('12/11/2019 13:05:08', 'Muoyo', '6/22', 'Brooklyn, NY/United States', 'Nigerian', 'Thai', '0', '1\r')
('12/11/2019 13:06:13', 'Vyjayanthi', '7/24', 'Fairfax, Virginia', 'Pasta', 'Noodles', '11', '3\r')
('12/11/2019 13:21:20', 'Anesu Masube', '3/28', 'Kwekwe, Zimbabwe', 'Pork Ribs', 'Sadza', '3', '5\r')
('12/11/2019 13:38:17', 'Michael Pallante', '8/5', 'Voorhees, NJ, United States', 'Pizza', 'Pasta', '1', '2\r')
('12/11/2019 13:43:36', 'Stuart Murphy', '1/5', 'London, Greater London, England', 'Bread', 'Indian', '10', '1\r')
('12/11/2019 13:48:58', 'Darian Madere', '8/7', 'Baton Ruuge, LA', 'tater tots', 'sushi', '1', '3\r')
('12/11/2019 13:49:07', 'Alex', '2/21', 'Alexandria, VA', 'Ice cream', 'Carrot cake', '3', '1\r')
('12/11/2019 13:52:47', 'J

### Questions
1. What are the names of all of the students?
2. Which student has the most siblings?
3. How many students are only children?
4. Which 3 students have lived in DC the shortest amount of time?
5. How many students are native New Yorkers?
6. Do any two students have the same favorite food?

1. What are the names of all of the students?

In [18]:
import pandas as pd
pd.read_sql('''
SELECT name
FROM responses''', conn)

Unnamed: 0,name
0,Stephen
1,Donna C
2,Muoyo
3,Vyjayanthi
4,Anesu Masube
5,Michael Pallante
6,Stuart Murphy
7,Darian Madere
8,Alex
9,Justin Fleury


2. Which student has the most siblings?

In [25]:
pd.read_sql('''
SELECT name, siblings_count
FROM responses
WHERE CAST(siblings_count AS UNSIGNED) = (
    SELECT MAX(CAST(siblings_count AS UNSIGNED))
    FROM responses
)
''', conn)

Unnamed: 0,name,siblings_count
0,Nick,12
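
The `CAST` matters because `siblings_count` is stored as text (note the trailing `\r` in the raw rows). A rough sketch of the difference, using an in-memory SQLite table as a stand-in for the MySQL server and three values taken from the outputs in this notebook; SQLite spells the cast `AS INTEGER` where MySQL uses `AS UNSIGNED`:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE responses (name TEXT, siblings_count TEXT)")
demo.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [("Stephen", "2\r"), ("Anesu Masube", "5\r"), ("Nick", "12\r")],
)

# Compared as text, '5...' sorts after '12...' because '5' > '1'.
text_max = demo.execute("SELECT MAX(siblings_count) FROM responses").fetchone()[0]

# Cast to a number first and the maximum is the expected 12.
numeric_max = demo.execute(
    "SELECT MAX(CAST(siblings_count AS INTEGER)) FROM responses"
).fetchone()[0]

# Same shape as the query above: compare each row against the numeric maximum.
most_siblings = demo.execute(
    """
    SELECT name, CAST(siblings_count AS INTEGER)
    FROM responses
    WHERE CAST(siblings_count AS INTEGER) = (
        SELECT MAX(CAST(siblings_count AS INTEGER)) FROM responses
    )
    """
).fetchall()

print(repr(text_max), numeric_max, most_siblings)
demo.close()
```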


3. How many students are only children?

In [27]:
pd.read_sql('''
SELECT name
FROM responses
WHERE CAST(siblings_count AS UNSIGNED) = 0
''', conn)

Unnamed: 0,name

The query returns no rows, so no student in this table is an only child.
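
Because the stored values end in a carriage return, a plain string comparison can silently miss matches. The sketch below illustrates this on an in-memory SQLite table (a stand-in for the MySQL server); the row for `Sam` is hypothetical and was added only so that there is an only child to find.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE responses (name TEXT, siblings_count TEXT)")
demo.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [
        ("Stephen", "2\r"),
        ("Donna C", "1\r"),
        ("Sam", "0\r"),  # hypothetical row, not part of the survey data
    ],
)

# '0\r' is not equal to '0', so the naive comparison finds nobody.
naive_count = demo.execute(
    "SELECT COUNT(*) FROM responses WHERE siblings_count = '0'"
).fetchone()[0]

# Casting keeps only the leading numeric part, so the trailing \r is ignored.
only_children = demo.execute(
    "SELECT COUNT(*) FROM responses WHERE CAST(siblings_count AS INTEGER) = 0"
).fetchone()[0]

print(naive_count, only_children)
demo.close()
```

One caveat with this approach: text that does not start with a number also casts to 0, so blank or free-text answers would be counted as only children.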


4. Which 3 students have lived in DC the shortest amount of time? (How long has each lived in DC?)
    

In [31]:
pd.read_sql('''
SELECT name
FROM responses
ORDER BY time_in_dc DESC
LIMIT 3
''', conn)

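Unnamed: 0,name
0,Vyjayanthi
1,Jill Carrie
2,Stuart Murphy

Note that `ORDER BY time_in_dc DESC` sorts from the largest value down, so the three rows above are the students who have lived in DC the longest. Answering the question as asked requires ascending order.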
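
A minimal sketch of an ascending sort for the "shortest time" question, assuming `time_in_dc` is stored as text like the other survey columns. It uses an in-memory SQLite table with five values copied from the raw rows earlier in this notebook; on MySQL the cast would be written differently (for example `CAST(time_in_dc AS DECIMAL(5,1))`, which is an assumption about suitable precision).

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE responses (name TEXT, time_in_dc TEXT)")
demo.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [
        ("Stephen", "0"),
        ("Donna C", "0.2"),
        ("Vyjayanthi", "11"),
        ("Anesu Masube", "3"),
        ("Stuart Murphy", "10"),
    ],
)

# Sorting the raw text puts '10' before '3' because '1' < '3'.
text_order = [
    row[0]
    for row in demo.execute(
        "SELECT name FROM responses ORDER BY time_in_dc ASC LIMIT 3"
    )
]

# Casting to a number gives the intended shortest-first order.
shortest = [
    row[0]
    for row in demo.execute(
        "SELECT name FROM responses ORDER BY CAST(time_in_dc AS REAL) ASC LIMIT 3"
    )
]

print(text_order)
print(shortest)
demo.close()
```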


5. How many students are native New Yorkers?

In [32]:
pd.read_sql("""SELECT count(*) AS num_students
FROM responses
WHERE hometown LIKE '%NY%'""", conn)

Unnamed: 0,num_students
0,2
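
A pattern like `'%NY%'` matches those two letters anywhere in the string, and `LIKE` is often case-insensitive, so it can count hometowns that have nothing to do with New York. The sketch below shows the effect on an in-memory SQLite table (a stand-in for the MySQL server). Three hometowns are copied from the raw rows above; `Nairobi, Kenya` is a hypothetical value added to trigger a false match, and the stricter `'%, NY%'` pattern is only one possible tightening.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE responses (hometown TEXT)")
demo.executemany(
    "INSERT INTO responses VALUES (?)",
    [
        ("Brooklyn, NY/United States",),
        ("Phoenix AZ USA",),
        ("Fairfax, Virginia",),
        ("Nairobi, Kenya",),  # hypothetical row: 'Kenya' contains 'ny'
    ],
)

# Broad pattern: also matches the 'ny' inside 'Kenya'.
broad = demo.execute(
    "SELECT COUNT(*) FROM responses WHERE hometown LIKE '%NY%'"
).fetchone()[0]

# Stricter pattern: requires the state abbreviation to follow a comma.
strict = demo.execute(
    "SELECT COUNT(*) FROM responses WHERE hometown LIKE '%, NY%'"
).fetchone()[0]

print(broad, strict)
demo.close()
```

Free-text answers rarely follow one format, so it is worth inspecting the matched rows before trusting the count.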


6. Do any two students have the same favorite food?


In [28]:
pd.read_sql("""
SELECT lower(fav_food_1) AS fav_food_1, count(*) AS student_count
FROM responses
GROUP BY lower(fav_food_1)
HAVING count(*) > 1""", conn)

Unnamed: 0,fav_food_1,student_count
0,pizza,2


## More Questions

What are the favorite foods of this classroom?

In [34]:
pd.read_sql('''
SELECT fav_food_1
FROM responses
UNION
SELECT fav_food_2
FROM responses
''', conn)

Unnamed: 0,fav_food_1
0,pizza
1,shrimp
2,Nigerian
3,Pasta
4,Pork Ribs
5,Bread
6,tater tots
7,Ice cream
8,lasagna
9,Sushi
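
`UNION` removes duplicate rows from the combined result, while `UNION ALL` keeps them. The sketch below compares the two on an in-memory SQLite table holding three rows from the raw output earlier in this notebook. SQLite compares text case-sensitively by default, so `pizza` and `Pizza` stay separate unless they are normalized with `lower()`; MySQL's default collations are typically case-insensitive, which is an assumption about how this particular server is configured.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE responses (fav_food_1 TEXT, fav_food_2 TEXT)")
demo.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [("pizza", "beef stroganoff"), ("Pizza", "Pasta"), ("Pasta", "Noodles")],
)

# UNION ALL: every value from both columns, duplicates included.
union_all = demo.execute(
    "SELECT fav_food_1 FROM responses UNION ALL SELECT fav_food_2 FROM responses"
).fetchall()

# UNION: exact duplicates removed ('Pasta' appears once, 'pizza'/'Pizza' both stay).
union_raw = demo.execute(
    "SELECT fav_food_1 FROM responses UNION SELECT fav_food_2 FROM responses"
).fetchall()

# Lowercasing first makes the de-duplication case-insensitive.
union_lower = [
    row[0]
    for row in demo.execute(
        "SELECT lower(fav_food_1) FROM responses "
        "UNION SELECT lower(fav_food_2) FROM responses ORDER BY 1"
    )
]

print(len(union_all), len(union_raw), union_lower)
demo.close()
```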


Which student's birthday falls closest to the cohort's graduation date (3/20/20)?

In [38]:
pd.read_sql('''
SELECT name, 
least(abs(datediff(date(concat('2020-',
                               substr(birthday, 1, locate('/', birthday) - 1),
                               '-',
                               substr(birthday, locate('/', birthday) + 1, 2))
                              ), date('2020-03-20'))),
      abs(datediff(date(concat('2019-',
                               substr(birthday, 1, locate('/', birthday) - 1),
                               '-',
                               substr(birthday, locate('/', birthday) + 1, 2))
                              ), date('2020-03-20')))
) AS days
FROM responses
ORDER BY 2
LIMIT 1;
''', conn)

Unnamed: 0,name,days
0,Anesu Masube,8
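
The same calculation can be cross-checked in plain Python. This is a sketch rather than a translation of the SQL: the helper name `days_to_nearest_birthday` is made up, it checks the year before, the year of, and the year after the target date (the SQL checks 2019 and 2020 only), and it uses three birthdays copied from the raw rows earlier in this notebook.

```python
from datetime import date


def days_to_nearest_birthday(birthday, target):
    """Smallest gap in days between `target` and an occurrence of an 'M/D' birthday."""
    month, day = (int(part) for part in birthday.split("/"))
    gaps = []
    for year in (target.year - 1, target.year, target.year + 1):
        try:
            gaps.append(abs((date(year, month, day) - target).days))
        except ValueError:
            # For example 2/29 in a non-leap year: skip that occurrence.
            continue
    return min(gaps)


graduation = date(2020, 3, 20)  # the date used in the SQL above
birthdays = {"Stephen": "12/8", "Alex": "2/21", "Anesu Masube": "3/28"}

gaps = {
    name: days_to_nearest_birthday(birthday, graduation)
    for name, birthday in birthdays.items()
}
closest = min(gaps.items(), key=lambda item: item[1])
print(gaps)
print(closest)
```

For these three students the result agrees with the SQL output: Anesu Masube, 8 days.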
