# Belongs To High School Data

### Introduction
The database we are looking at in this lab has two tables, `sat_records` and `high_schools`. Each row of the `sat_records` table holds SAT results for a school that also appears in the `high_schools` table, so the records in `sat_records` "belong to" the schools in `high_schools`. Both tables have a `dbn` (District Borough Number) column that uniquely identifies each school. We connect the data in the two tables by matching on that column in a JOIN clause.
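To make the "belongs to" relationship concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The two tables, their `dbn` values, and the school names are hypothetical stand-ins that keep only the columns needed for the join. They are not rows from the lab's data.

```python
import sqlite3

# In-memory database with two tiny, hypothetical tables. Each sat_records
# row "belongs to" the high_schools row that has the same dbn.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE high_schools (dbn TEXT, school_name TEXT, boro TEXT);
    CREATE TABLE sat_records (dbn TEXT, math_avg REAL);
    INSERT INTO high_schools VALUES ('T01', 'School A', 'M'), ('T02', 'School B', 'K');
    INSERT INTO sat_records VALUES ('T01', 450.0), ('T02', 510.0);
""")

# The JOIN pairs each SAT record with its school by matching dbn values.
rows = demo_conn.execute("""
    SELECT h.school_name, h.boro, s.math_avg
    FROM high_schools h
    JOIN sat_records s ON h.dbn = s.dbn
    ORDER BY h.school_name
""").fetchall()
print(rows)  # [('School A', 'M', 450.0), ('School B', 'K', 510.0)]
```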

Let's begin by connecting to the database, loading the two CSV files into it, and reviewing the structure of the tables:

In [1]:
import sqlite3
conn = sqlite3.connect('schools.db')
cursor = conn.cursor()

In [2]:
import pandas as pd
highschools_url = "https://raw.githubusercontent.com/sql-fundamentals-jigsaw/mod-1-sql-curriculum/master/2-sql-relations/2-belongs-to-hs/data/highschools.csv"
sat_records_url = "https://raw.githubusercontent.com/sql-fundamentals-jigsaw/mod-1-sql-curriculum/master/2-sql-relations/2-belongs-to-hs/data/sat_records.csv"
df_hs = pd.read_csv(highschools_url)
df_sat = pd.read_csv(sat_records_url)
df_hs.to_sql('high_schools', conn, index=False, if_exists='replace')
df_sat.to_sql('sat_records', conn, index=False, if_exists='replace')

478

In [3]:
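pd.read_sql('SELECT * FROM high_schools LIMIT 1;', conn)

|   | id | dbn | school_name | boro | total_students | graduation_rate | attendance_rate | college_career_rate |
|---|----|-----|-------------|------|----------------|-----------------|-----------------|---------------------|
| 0 | 0 | 16K498 | Brooklyn High School for Law and Technology | K | 594 | 0.74 | 0.85 | 0.49 |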


In [4]:
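pd.read_sql('SELECT * FROM sat_records LIMIT 1;', conn)

|   | id | dbn | name | num_test_takers | reading_avg | math_avg | writing_score |
|---|----|-----|------|-----------------|-------------|----------|---------------|
| 0 | 0 | 01M292 | HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES | 29.0 | 355.0 | 404.0 | 363.0 |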


### Exploring the Data

In [5]:
cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
cursor.fetchall()

[('high_schools',), ('sat_records',)]

In [6]:
cursor.execute('PRAGMA table_info(high_schools)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'dbn', 'TEXT', 0, None, 0),
 (2, 'school_name', 'TEXT', 0, None, 0),
 (3, 'boro', 'TEXT', 0, None, 0),
 (4, 'total_students', 'INTEGER', 0, None, 0),
 (5, 'graduation_rate', 'REAL', 0, None, 0),
 (6, 'attendance_rate', 'REAL', 0, None, 0),
 (7, 'college_career_rate', 'REAL', 0, None, 0)]

In [7]:
cursor.execute('PRAGMA table_info(sat_records)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'dbn', 'TEXT', 0, None, 0),
 (2, 'name', 'TEXT', 0, None, 0),
 (3, 'num_test_takers', 'REAL', 0, None, 0),
 (4, 'reading_avg', 'REAL', 0, None, 0),
 (5, 'math_avg', 'REAL', 0, None, 0),
 (6, 'writing_score', 'REAL', 0, None, 0)]
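Each tuple returned by `PRAGMA table_info` has the form `(cid, name, type, notnull, dflt_value, pk)`. The sketch below, built on a hypothetical in-memory copy of a few of the columns above, lists the column names the two tables share; `column_names` is a hypothetical helper. Note that `id` is shared too, but in this lab it is each table's own row counter (row 0 is a different school in each table), so `dbn` is the column that actually identifies a school.

```python
import sqlite3

def column_names(conn, table):
    """Return a table's column names, in order, via PRAGMA table_info."""
    # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
    # The table name is formatted into the SQL, so pass trusted names only.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# Hypothetical in-memory tables with a subset of the lab's columns.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE high_schools (id INTEGER, dbn TEXT, school_name TEXT, boro TEXT)")
demo_conn.execute("CREATE TABLE sat_records (id INTEGER, dbn TEXT, name TEXT, math_avg REAL)")

hs_cols = column_names(demo_conn, "high_schools")
sat_cols = column_names(demo_conn, "sat_records")
shared = [c for c in hs_cols if c in sat_cols]
print(shared)  # ['id', 'dbn']
```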

Each of the following questions requires information from both tables in the database. Use a JOIN clause to answer each one.

* In which boro is the school that has the highest writing score?

In [21]:
cursor.execute('''SELECT h.boro, MAX(s.writing_score)
FROM high_schools h
JOIN sat_records s ON h.dbn = s.dbn
''')
cursor.fetchall()

[('M', 682.0)]
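This query relies on a SQLite-specific rule: when a query has a single `MAX()` (or `MIN()`) aggregate, a bare column such as `h.boro` takes its value from the row that holds the maximum. Other databases, such as PostgreSQL, reject this query. A minimal sketch on hypothetical rows compares it with a portable version that uses `ORDER BY` with `LIMIT 1`:

```python
import sqlite3

# Hypothetical rows, not the lab's data.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE high_schools (dbn TEXT, boro TEXT);
    CREATE TABLE sat_records (dbn TEXT, writing_score REAL);
    INSERT INTO high_schools VALUES ('T01', 'K'), ('T02', 'M'), ('T03', 'X');
    INSERT INTO sat_records VALUES ('T01', 400.0), ('T02', 650.0), ('T03', 380.0);
""")

# SQLite-specific: with a single MAX() aggregate, the bare column h.boro
# is taken from the row holding the maximum writing_score.
sqlite_style = demo_conn.execute("""
    SELECT h.boro, MAX(s.writing_score)
    FROM high_schools h JOIN sat_records s ON h.dbn = s.dbn
""").fetchall()

# Portable alternative: sort by the score and keep only the top row.
portable = demo_conn.execute("""
    SELECT h.boro, s.writing_score
    FROM high_schools h JOIN sat_records s ON h.dbn = s.dbn
    ORDER BY s.writing_score DESC
    LIMIT 1
""").fetchall()

print(sqlite_style, portable)  # [('M', 650.0)] [('M', 650.0)]
```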

* In which boro is the school with the lowest math average?

In [22]:
cursor.execute('''SELECT h.boro, MIN(s.math_avg)
FROM high_schools h
JOIN sat_records s ON h.dbn = s.dbn
''')
cursor.fetchall()

[('X', 312.0)]
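Both the bare-column `MIN()` query and an `ORDER BY` with `LIMIT 1` report a single school, even when several schools tie for the lowest score. If ties matter, one option is to compare against the minimum computed in a scalar subquery, which keeps every tied row. A sketch on hypothetical rows:

```python
import sqlite3

# Hypothetical rows: two schools tie for the lowest math_avg.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE high_schools (dbn TEXT, boro TEXT);
    CREATE TABLE sat_records (dbn TEXT, math_avg REAL);
    INSERT INTO high_schools VALUES ('T01', 'X'), ('T02', 'K'), ('T03', 'Q');
    INSERT INTO sat_records VALUES ('T01', 300.0), ('T02', 300.0), ('T03', 480.0);
""")

# The subquery computes the minimum over the same join, and the outer
# query keeps every joined row that matches it, so all ties are returned.
rows = demo_conn.execute("""
    SELECT h.boro, s.math_avg
    FROM high_schools h
    JOIN sat_records s ON h.dbn = s.dbn
    WHERE s.math_avg = (
        SELECT MIN(s2.math_avg)
        FROM high_schools h2
        JOIN sat_records s2 ON h2.dbn = s2.dbn
    )
    ORDER BY h.boro
""").fetchall()
print(rows)  # [('K', 300.0), ('X', 300.0)]
```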

* What is the highest math_avg for schools with more than 1000 students?

In [23]:
cursor.execute('''SELECT MAX(s.math_avg)
FROM high_schools h
JOIN sat_records s ON h.dbn = s.dbn
WHERE h.total_students > 1000
''')
cursor.fetchall()

[(735.0,)]
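To rerun this question with a different enrollment cutoff, you can bind the threshold as a `?` parameter instead of editing the SQL text. A minimal sketch; `max_math_avg_above` is a hypothetical helper and the rows are made up:

```python
import sqlite3

def max_math_avg_above(conn, min_students):
    """Highest math_avg among schools with more than min_students students."""
    # The ? placeholder lets sqlite3 bind the threshold safely instead of
    # formatting it into the SQL string.
    row = conn.execute("""
        SELECT MAX(s.math_avg)
        FROM high_schools h
        JOIN sat_records s ON h.dbn = s.dbn
        WHERE h.total_students > ?
    """, (min_students,)).fetchone()
    return row[0]

# Hypothetical rows, not the lab's data.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE high_schools (dbn TEXT, total_students INTEGER);
    CREATE TABLE sat_records (dbn TEXT, math_avg REAL);
    INSERT INTO high_schools VALUES ('T01', 1500), ('T02', 800), ('T03', 2200);
    INSERT INTO sat_records VALUES ('T01', 520.0), ('T02', 700.0), ('T03', 610.0);
""")

print(max_math_avg_above(demo_conn, 1000))  # 610.0
print(max_math_avg_above(demo_conn, 5000))  # None (MAX over no rows is NULL)
```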

* What is the average number of test takers in each boro?

In [31]:
cursor.execute('''SELECT h.boro, AVG(s.num_test_takers)
FROM high_schools h
JOIN sat_records s ON h.dbn = s.dbn
GROUP BY 1
''')
cursor.fetchall()

[('K', 126.33673469387755),
 ('M', 110.34177215189874),
 ('Q', 199.51666666666668),
 ('R', 300.5),
 ('X', 80.3875)]
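`AVG()` ignores NULL values, so if some schools have no recorded `num_test_takers`, a boro's average covers only the schools that do. `GROUP BY 1` refers to the first selected column, so it is equivalent to `GROUP BY h.boro`. A small sketch on hypothetical rows compares `COUNT(*)` with `COUNT(s.num_test_takers)`:

```python
import sqlite3

# Hypothetical rows: one school in the boro has a NULL num_test_takers.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE high_schools (dbn TEXT, boro TEXT);
    CREATE TABLE sat_records (dbn TEXT, num_test_takers REAL);
    INSERT INTO high_schools VALUES ('T01', 'R'), ('T02', 'R'), ('T03', 'R');
    INSERT INTO sat_records VALUES ('T01', 100.0), ('T02', 200.0), ('T03', NULL);
""")

# AVG divides by the non-NULL count (2), not by the joined row count (3).
row = demo_conn.execute("""
    SELECT h.boro,
           AVG(s.num_test_takers),
           COUNT(*),
           COUNT(s.num_test_takers)
    FROM high_schools h
    JOIN sat_records s ON h.dbn = s.dbn
    GROUP BY h.boro
""").fetchone()
print(row)  # ('R', 150.0, 3, 2)
```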

* What is the attendance rate of schools with a math_avg greater than 500? Order your results by attendance rate (descending) and limit them to the first five results.

In [36]:
cursor.execute('''SELECT h.attendance_rate
FROM high_schools h
JOIN sat_records s ON h.dbn = s.dbn
WHERE s.math_avg > 500
ORDER BY 1 DESC
LIMIT 5
''')
cursor.fetchall()

[(0.98,), (0.97,), (0.97,), (0.97,), (0.97,)]

* What is the graduation rate of schools with a math_avg less than 500? Order your results by graduation rate (ascending) and limit them to ten results.

In [40]:
cursor.execute('''SELECT hs.graduation_rate
FROM high_schools hs
JOIN sat_records sr ON hs.dbn = sr.dbn
WHERE sr.math_avg < 500
ORDER BY 1 ASC
LIMIT 10
''')
cursor.fetchall()

[(None,),
 (None,),
 (None,),
 (None,),
 (0.39,),
 (0.46,),
 (0.47,),
 (0.49,),
 (0.5,),
 (0.5,)]
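The four `None` values at the top are schools whose graduation rate is NULL. SQLite treats NULL as smaller than any other value, so NULLs come first in an ascending sort. A minimal sketch on hypothetical rows shows how an `IS NOT NULL` filter drops them:

```python
import sqlite3

# Hypothetical rows: one school has a NULL graduation_rate.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE high_schools (dbn TEXT, graduation_rate REAL);
    CREATE TABLE sat_records (dbn TEXT, math_avg REAL);
    INSERT INTO high_schools VALUES ('T01', NULL), ('T02', 0.62), ('T03', 0.48);
    INSERT INTO sat_records VALUES ('T01', 410.0), ('T02', 430.0), ('T03', 450.0);
""")

# NULL sorts before every other value in ascending order.
with_nulls = demo_conn.execute("""
    SELECT hs.graduation_rate
    FROM high_schools hs
    JOIN sat_records sr ON hs.dbn = sr.dbn
    WHERE sr.math_avg < 500
    ORDER BY 1 ASC
""").fetchall()

# Filtering on IS NOT NULL keeps only schools with a recorded rate.
without_nulls = demo_conn.execute("""
    SELECT hs.graduation_rate
    FROM high_schools hs
    JOIN sat_records sr ON hs.dbn = sr.dbn
    WHERE sr.math_avg < 500 AND hs.graduation_rate IS NOT NULL
    ORDER BY 1 ASC
""").fetchall()

print(with_nulls)     # [(None,), (0.48,), (0.62,)]
print(without_nulls)  # [(0.48,), (0.62,)]
```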

* For schools with a math_avg greater than 500, what is the average graduation rate?

In [43]:
cursor.execute('''SELECT AVG(hs.graduation_rate)
FROM high_schools hs
JOIN sat_records sr ON hs.dbn = sr.dbn
WHERE sr.math_avg > 500
''')
cursor.fetchall()

[(0.9769999999999999,)]
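The long tail on `0.9769999999999999` comes from ordinary binary floating-point arithmetic, not from extra precision in the data. If you only need a few decimal places, SQLite's `ROUND(x, n)` can trim the result inside the query. A sketch on hypothetical rows:

```python
import sqlite3

# Hypothetical rows, not the lab's data.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE high_schools (dbn TEXT, graduation_rate REAL);
    CREATE TABLE sat_records (dbn TEXT, math_avg REAL);
    INSERT INTO high_schools VALUES ('T01', 0.95), ('T02', 0.99), ('T03', 0.98);
    INSERT INTO sat_records VALUES ('T01', 560.0), ('T02', 610.0), ('T03', 530.0);
""")

# ROUND(x, 3) keeps three digits after the decimal point.
avg_rate = demo_conn.execute("""
    SELECT ROUND(AVG(hs.graduation_rate), 3)
    FROM high_schools hs
    JOIN sat_records sr ON hs.dbn = sr.dbn
    WHERE sr.math_avg > 500
""").fetchone()[0]
print(avg_rate)  # 0.973
```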

* What is the total number of test takers in each boro?

In [45]:
cursor.execute('''SELECT hs.boro, SUM(sr.num_test_takers)
FROM high_schools hs
JOIN sat_records sr ON hs.dbn = sr.dbn
GROUP BY 1
''')
cursor.fetchall()

[('K', 12381.0), ('M', 8717.0), ('Q', 11971.0), ('R', 3005.0), ('X', 6431.0)]
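In SQLite, `SUM()` returns NULL for a group whose values are all NULL, while the SQLite-specific `TOTAL()` returns `0.0` in that case. A sketch on hypothetical rows, where one boro has no recorded test takers:

```python
import sqlite3

# Hypothetical rows: boro 'R' has only a NULL num_test_takers.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE high_schools (dbn TEXT, boro TEXT);
    CREATE TABLE sat_records (dbn TEXT, num_test_takers REAL);
    INSERT INTO high_schools VALUES ('T01', 'K'), ('T02', 'K'), ('T03', 'R');
    INSERT INTO sat_records VALUES ('T01', 50.0), ('T02', 70.0), ('T03', NULL);
""")

# SUM gives NULL (None in Python) for an all-NULL group; TOTAL gives 0.0.
rows = demo_conn.execute("""
    SELECT hs.boro, SUM(sr.num_test_takers), TOTAL(sr.num_test_takers)
    FROM high_schools hs
    JOIN sat_records sr ON hs.dbn = sr.dbn
    GROUP BY hs.boro
    ORDER BY hs.boro
""").fetchall()
print(rows)  # [('K', 120.0, 120.0), ('R', None, 0.0)]
```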

* What are the average combined reading and math scores for each boro?

In [46]:
cursor.execute('''SELECT hs.boro, AVG(sr.math_avg + sr.reading_avg)
FROM high_schools hs
JOIN sat_records sr ON hs.dbn = sr.dbn
GROUP BY 1
''')
cursor.fetchall()

[('K', 795.2857142857143),
 ('M', 869.5822784810126),
 ('Q', 874.5666666666667),
 ('R', 930.0),
 ('X', 778.2375)]
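In `sr.math_avg + sr.reading_avg`, the sum is NULL if either score is NULL, and `AVG()` then skips that school entirely rather than counting a partial score. A sketch on hypothetical rows, where the `COUNT()` column shows how many schools actually contributed:

```python
import sqlite3

# Hypothetical rows: T02 has no reading_avg.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE high_schools (dbn TEXT, boro TEXT);
    CREATE TABLE sat_records (dbn TEXT, math_avg REAL, reading_avg REAL);
    INSERT INTO high_schools VALUES ('T01', 'Q'), ('T02', 'Q');
    INSERT INTO sat_records VALUES ('T01', 500.0, 450.0), ('T02', 400.0, NULL);
""")

# NULL + anything is NULL, so T02 drops out of both AVG and COUNT.
row = demo_conn.execute("""
    SELECT hs.boro,
           AVG(sr.math_avg + sr.reading_avg),
           COUNT(sr.math_avg + sr.reading_avg)
    FROM high_schools hs
    JOIN sat_records sr ON hs.dbn = sr.dbn
    GROUP BY hs.boro
""").fetchone()
print(row)  # ('Q', 950.0, 1)
```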

* Find the top five schools with the largest difference between total_students and num_test_takers.

In [51]:
cursor.execute('''SELECT hs.school_name, (hs.total_students - sr.num_test_takers) as difference
FROM high_schools hs
JOIN sat_records sr ON hs.dbn = sr.dbn
ORDER BY 2 DESC
LIMIT 5
''')
cursor.fetchall()

[('Brooklyn Technical High School', 4561.0),
 ('Fort Hamilton High School', 3888.0),
 ('Francis Lewis High School', 3623.0),
 ('Midwood High School', 3234.0),
 ('James Madison High School', 3139.0)]
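Two details of this query are worth knowing. `ORDER BY` can use the alias `difference` instead of the position `2`, and because SQLite sorts NULL as the smallest value, a school whose difference is NULL (for example, one with a missing `num_test_takers`) lands at the end of a descending sort rather than in the top five. A sketch with hypothetical rows:

```python
import sqlite3

# Hypothetical rows: School C has no recorded num_test_takers.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE high_schools (dbn TEXT, school_name TEXT, total_students INTEGER);
    CREATE TABLE sat_records (dbn TEXT, num_test_takers REAL);
    INSERT INTO high_schools VALUES
        ('T01', 'School A', 1000), ('T02', 'School B', 3000), ('T03', 'School C', 5000);
    INSERT INTO sat_records VALUES ('T01', 200.0), ('T02', 500.0), ('T03', NULL);
""")

# Sorting by the alias; the NULL difference sorts last when descending.
rows = demo_conn.execute("""
    SELECT hs.school_name,
           hs.total_students - sr.num_test_takers AS difference
    FROM high_schools hs
    JOIN sat_records sr ON hs.dbn = sr.dbn
    ORDER BY difference DESC
""").fetchall()
print(rows)  # [('School B', 2500.0), ('School A', 800.0), ('School C', None)]
```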

* What is the total difference between total_students and num_test_takers in each boro? Order your results in ascending order.

In [53]:
cursor.execute('''SELECT hs.boro, SUM(hs.total_students - sr.num_test_takers) as difference
FROM high_schools hs
JOIN sat_records sr ON hs.dbn = sr.dbn
GROUP BY 1
ORDER BY 2 ASC
LIMIT 5
''')
cursor.fetchall()

[('R', 15627.0),
 ('X', 35972.0),
 ('M', 41118.0),
 ('Q', 58712.0),
 ('K', 61192.0)]
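Every query in this lab uses an inner `JOIN`, which keeps only schools that appear in both tables. To see which schools an inner join leaves out, one common approach is a `LEFT JOIN` that keeps every row of `high_schools` and then filters for rows with no matching SAT record. A sketch on hypothetical rows:

```python
import sqlite3

# Hypothetical rows: School B has no SAT record.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE high_schools (dbn TEXT, school_name TEXT);
    CREATE TABLE sat_records (dbn TEXT, math_avg REAL);
    INSERT INTO high_schools VALUES ('T01', 'School A'), ('T02', 'School B');
    INSERT INTO sat_records VALUES ('T01', 480.0);
""")

# LEFT JOIN keeps every school; schools without an SAT record get NULLs
# in the sat_records columns, which the WHERE clause then selects.
missing = demo_conn.execute("""
    SELECT h.dbn, h.school_name
    FROM high_schools h
    LEFT JOIN sat_records s ON h.dbn = s.dbn
    WHERE s.dbn IS NULL
""").fetchall()
print(missing)  # [('T02', 'School B')]
```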

### Conclusion
This lab presented a database in which records from one table "belong to" records in another. Specifically, the data in the `sat_records` table belonged to the schools in the `high_schools` table. We used JOIN clauses to combine the two tables and answer questions that required information from both.

### Resources

[School District Breakdown](https://data.cityofnewyork.us/Education/School-District-Breakdowns/g3vh-kbnw)

[SAT Results](https://data.cityofnewyork.us/Education/2012-SAT-Results/f9bf-2cp4)