# NYC High Schools Aggregates

### Introduction
In this lab we will practice using SQL aggregate functions. These functions, such as AVG, MIN, and MAX, perform a mathematical operation on a set of values and return a single value. We will also use the GROUP BY clause. GROUP BY groups rows that have identical values in a column (or columns), usually so that an aggregate function can be applied to each group.

In the database we are using in this lab, each row represents a school, and each column holds a metric or piece of information about that school. We could use an aggregate function to find the MAX total students across all the schools listed. But what if we wanted to know the MAX number of students by borough? Previously we might have used a WHERE clause, but that would require a separate statement for each borough. That's where the GROUP BY clause comes in. In this example we could use GROUP BY boro, and the query would return the result of our aggregate function for each borough.
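To make the WHERE versus GROUP BY contrast concrete, here is a minimal sketch. It runs against a small in-memory table, not the lab database: the table name `schools`, the connection name `demo`, and every row value are made up for illustration.

```python
import sqlite3

# Hypothetical toy data in an in-memory database (not the lab's real data).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE schools (name TEXT, boro TEXT, total_students INTEGER)")
demo.executemany(
    "INSERT INTO schools VALUES (?, ?, ?)",
    [("A", "M", 400), ("B", "M", 900), ("C", "Q", 1200), ("D", "Q", 700), ("E", "K", 300)],
)

# With WHERE, each borough needs its own query.
max_m = demo.execute(
    "SELECT MAX(total_students) FROM schools WHERE boro = 'M'"
).fetchone()[0]

# With GROUP BY, one query returns the aggregate for every borough.
max_by_boro = demo.execute(
    "SELECT boro, MAX(total_students) FROM schools GROUP BY boro ORDER BY boro"
).fetchall()

print(max_m)        # 900
print(max_by_boro)  # [('K', 300), ('M', 900), ('Q', 1200)]
```

The GROUP BY version returns one row per distinct `boro` value, so it keeps working unchanged if more boroughs appear in the data.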

Let's begin by using the `sqlite3` library to connect to the database.

```python
import sqlite3
conn = sqlite3.connect('nyc_schools_single_tb.db')
cursor = conn.cursor()
```

```python
# List the tables in the database (string literals in SQL use single quotes)
cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
cursor.fetchall()
```

```
[('high_schools',)]
```

```python
# Show the columns of the high_schools table
cursor.execute('PRAGMA table_info(high_schools)')
cursor.fetchall()
```

```
[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'dbn', 'TEXT', 0, None, 0),
 (2, 'name', 'TEXT', 0, None, 0),
 (3, 'num_test_takers', 'REAL', 0, None, 0),
 (4, 'reading_avg', 'REAL', 0, None, 0),
 (5, 'math_avg', 'REAL', 0, None, 0),
 (6, 'writing_score', 'REAL', 0, None, 0),
 (7, 'boro', 'TEXT', 0, None, 0),
 (8, 'total_students', 'INTEGER', 0, None, 0),
 (9, 'graduation_rate', 'REAL', 0, None, 0),
 (10, 'attendance_rate', 'REAL', 0, None, 0),
 (11, 'college_career_rate', 'REAL', 0, None, 0)]
```

### Aggregates

For each of the questions below, use a SQL aggregate function to find the answer. Note that in the database, the `boro` column uses the values "M" for Manhattan, "X" for the Bronx, "K" for Brooklyn, and "Q" for Queens.

* What's the average number of students in Manhattan?

* What's the average attendance rate in Manhattan?

* What's the largest difference between graduation_rate and college_career_rate?

* What is the highest math_avg in Queens?

* What is the highest math_avg in Manhattan?

* What is the highest combined score (the sum of reading_avg, math_avg, and writing_score) in Manhattan?
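Several of the questions above combine an aggregate with a WHERE filter or with arithmetic on two columns. The sketch below shows both patterns on made-up data; the table `toy_schools`, the connection name `demo`, and all values are hypothetical, so it illustrates the technique without answering the questions for the real database.

```python
import sqlite3

# Hypothetical toy data in an in-memory database (not the lab's real data).
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE toy_schools ("
    "name TEXT, boro TEXT, math_avg REAL, "
    "graduation_rate REAL, college_career_rate REAL)"
)
demo.executemany(
    "INSERT INTO toy_schools VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "M", 500.0, 0.75, 0.5),
        ("B", "M", 620.0, 0.875, 0.375),
        ("C", "Q", 580.0, 0.5, 0.25),
        ("D", "Q", None, 0.75, 0.625),  # missing math_avg
    ],
)

# An aggregate restricted to one borough with WHERE.
avg_math_m = demo.execute(
    "SELECT AVG(math_avg) FROM toy_schools WHERE boro = 'M'"
).fetchone()[0]

# Aggregates skip NULL values, so school D does not affect this MAX.
max_math_q = demo.execute(
    "SELECT MAX(math_avg) FROM toy_schools WHERE boro = 'Q'"
).fetchone()[0]

# An aggregate can wrap an expression computed per row.
max_gap = demo.execute(
    "SELECT MAX(graduation_rate - college_career_rate) FROM toy_schools"
).fetchone()[0]

print(avg_math_m, max_math_q, max_gap)  # 560.0 580.0 0.5
```

Because aggregate functions ignore NULL inputs, rows with missing scores are left out of the calculation; they are not counted as zero.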

### Group By

* What's the average number of students in each borough?

* What's the average difference between graduation_rate and college_career_rate by borough?

* What's the average college_career_rate grouped by ranges of math_avg scores? (Hint: https://stackoverflow.com/questions/30929526/sqlite-group-by-range-of-1000s)
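Grouping by a range means grouping by a computed expression instead of a raw column. One common approach, sketched here on made-up data, divides the value by a bucket width, truncates it to an integer, and multiplies back. The 100-point width, the table `toy_scores`, the connection name `demo`, and all rows are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical toy data in an in-memory database (not the lab's real data).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE toy_scores (math_avg REAL, college_career_rate REAL)")
demo.executemany(
    "INSERT INTO toy_scores VALUES (?, ?)",
    [(410.0, 0.25), (450.0, 0.5), (520.0, 0.5), (580.0, 0.75), (530.0, 1.0)],
)

# CAST(... AS INTEGER) truncates, so 410 and 450 both land in bucket 400.
rows = demo.execute(
    """
    SELECT CAST(math_avg / 100 AS INTEGER) * 100 AS bucket,
           AVG(college_career_rate) AS avg_rate,
           COUNT(*) AS n_schools
    FROM toy_scores
    GROUP BY bucket
    ORDER BY bucket
    """
).fetchall()

print(rows)  # [(400, 0.375, 2), (500, 0.75, 3)]
```

Including `COUNT(*)` alongside the average shows how many rows fall in each bucket, which helps when choosing a sensible bucket width.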

### Conclusion
In this lab, we used aggregate functions to perform mathematical operations on sets of values in our database. We also used the GROUP BY clause, which let us apply those aggregate functions to different subsets of the data in a single query.