### BigQuery SQL Data Analysis
This notebook runs SQL analyses on two Google BigQuery datasets: 1) school enrollments, including school districts in Austin, and 2) courses taken at The University of Texas at Austin.

#### 1. Create the BQ dataset:

In [None]:
%env dataset_id=XXXXX

In [None]:
!bq --location=US mk --dataset $dataset_id

#### 2. Create and populate the BQ tables:
##### Note: the load commands will run for a few minutes. Please wait for them to finish before moving on to step 3.

In [6]:
import os

In [None]:
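# List the CSV files in the bucket; each output line is one gs:// path.
gsutil_cmd = "gsutil ls gs://XXXXXXX"
file_listings = os.popen(gsutil_cmd)
dataset = "school_enrollments"

# Tables that are skipped rather than loaded.
skip_tables = ("co", "district", "fl", "ga", "id", "ma", "mo", "ne", "ny", "or", "school", "sd", "wi")

for file in file_listings:
    file = file.strip()  # drop the trailing newline from the listing

    # The table name is the text between the last "_" and the last ".".
    start_index = file.rindex("_") + 1
    end_index = file.rindex(".")
    table = file[start_index:end_index]

    if table in skip_tables:
        print("skipping " + table)
        continue

    bq_cmd = ("bq --location=US load --autodetect --skip_leading_rows=1 "
              "--source_format=CSV " + dataset + "." + table + " " + file)
    print(bq_cmd)

    os.system(bq_cmd)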
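
The table name is derived from the text between the last underscore and the last dot of each listed path. A minimal sketch of that parsing step in isolation, assuming a hypothetical path of the form `gs://example-bucket/enrollments_tx.csv` (the real bucket and file names are not shown in this notebook, and `table_name_from_path` is a helper introduced only for illustration):

```python
def table_name_from_path(path: str) -> str:
    """Return the text between the last "_" and the last "." of a path."""
    path = path.strip()  # listing lines end with a newline
    start_index = path.rindex("_") + 1
    end_index = path.rindex(".")
    return path[start_index:end_index]


# Hypothetical listing line, for illustration only.
example = "gs://example-bucket/enrollments_tx.csv\n"
print(table_name_from_path(example))  # tx
```

A path without an underscore or a dot raises `ValueError` from `rindex`, so this approach relies on every listed file following the same naming pattern.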

#### 3. Open the BQ UI and explore the schemas for the tables in the `school_enrollments` dataset. Also, preview the tables in the UI to see some sample data. 

#### 4. Find school enrollments by school name, grade level, and year for Texas:

In [None]:
%load_ext google.cloud.bigquery

In [None]:
%%bigquery
select school, grade, year, sum(cast(replace(replace(total, ",", ""), "<", "") as int64)) as total
from school_enrollments.tx
where total != "" and school != ""
group by school, grade, year
order by school, grade, year

##### Note: The `total` column is of type STRING, so the query removes thousands separators (`,`) and `<` markers with `replace()` and converts the result to INT64 with `cast()` before applying `sum`. `cast()` is covered in BQ's conversion functions reference. Documentation on BQ's string functions, including `replace()`: https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions
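
To make the cleaning step concrete, here is a small Python sketch that mirrors the nested `replace()` calls and the `cast(... as int64)` on a few made-up values. The sample strings and the `clean_total` helper are assumptions for illustration, not rows or functions from the notebook.

```python
def clean_total(total: str) -> int:
    """Strip thousands separators and "<" markers, then convert to int."""
    return int(total.replace(",", "").replace("<", ""))


# Made-up sample values; empty strings are skipped, as in the WHERE clause.
samples = ["1,234", "<10", "", "56"]
cleaned = [clean_total(s) for s in samples if s != ""]
print(cleaned)       # [1234, 10, 56]
print(sum(cleaned))  # 1300
```

With this cleaning, a value such as `<10` is counted as 10, so sums that include such rows are upper bounds rather than exact totals.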

#### 5. Find school enrollments by school name, grade level, and year for Austin:

In [None]:
%%bigquery
select school, grade, year, sum(cast(replace(replace(total, ",", ""), "<", "") as int64)) as total
from school_enrollments.tx
where total != "" and school != "" and district like '%AUSTIN%'
group by school, grade, year
order by school, grade, year

#### 1) Find the most popular courses in the `Takes` table: return the courses taken by at least 3 distinct students.

In [None]:
%%bigquery
SELECT cno
FROM college.Takes
GROUP BY cno
HAVING COUNT(DISTINCT sid) >= 3
ORDER BY cno
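
The `DISTINCT` in the `HAVING` clause matters when a table contains repeated rows. The sketch below reproduces the query shape on toy data using Python's built-in `sqlite3` module rather than BigQuery; the rows are invented, and SQLite's dialect is only a stand-in for this simple case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Takes (sid TEXT, cno TEXT)")
rows = [
    ("s1", "CS101"), ("s2", "CS101"), ("s3", "CS101"),
    ("s1", "M200"), ("s1", "M200"), ("s2", "M200"),  # s1 appears twice
]
conn.executemany("INSERT INTO Takes VALUES (?, ?)", rows)

distinct_query = """
    SELECT cno FROM Takes
    GROUP BY cno
    HAVING COUNT(DISTINCT sid) >= 3
    ORDER BY cno
"""
# Same query, but counting rows instead of distinct students.
row_count_query = distinct_query.replace("COUNT(DISTINCT sid)", "COUNT(*)")

popular_distinct = [r[0] for r in conn.execute(distinct_query)]
popular_rows = [r[0] for r in conn.execute(row_count_query)]
print(popular_distinct)  # ['CS101']
print(popular_rows)      # ['CS101', 'M200']
```

Counting rows lets the duplicated `s1` enrollment push `M200` over the threshold, while counting distinct `sid` values does not.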

#### 2) Find the teachers who have taught at least 2 courses. Should return cannata & mitra.

In [None]:
%%bigquery
SELECT tid
FROM college.Teaches
GROUP BY tid
HAVING COUNT(tid) >= 2 
ORDER BY tid

#### 3) Find students who have taken courses from more than one instructor, used here as a proxy for taking courses from multiple departments.

In [None]:
%%bigquery
SELECT s.sid, s.fname, s.lname
FROM college.Student s
JOIN college.Takes t ON s.sid = t.sid
JOIN college.Teaches teach ON t.cno = teach.cno
GROUP BY s.sid, s.fname, s.lname
HAVING COUNT(DISTINCT teach.tid) > 1
ORDER BY s.sid
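
The `HAVING` clause above counts distinct `tid` values, that is, instructors. If a department is instead taken to be the first two characters of `cno` (the convention used by the age query in this notebook), the two counts can disagree. A toy comparison using Python's built-in `sqlite3` module, with invented rows and SQLite's `substr` standing in for BigQuery's `LEFT`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Takes (sid TEXT, cno TEXT);
    CREATE TABLE Teaches (tid TEXT, cno TEXT);
    INSERT INTO Takes VALUES ('s1','CS101'), ('s1','CS102'),
                             ('s2','CS101'), ('s2','MA201');
    INSERT INTO Teaches VALUES ('t1','CS101'), ('t2','CS102'), ('t3','MA201');
""")

# Students with more than one distinct instructor (the notebook's approach).
by_instructor = [r[0] for r in conn.execute("""
    SELECT t.sid FROM Takes t JOIN Teaches teach ON t.cno = teach.cno
    GROUP BY t.sid HAVING COUNT(DISTINCT teach.tid) > 1 ORDER BY t.sid
""")]

# Students with more than one distinct two-character course prefix.
by_prefix = [r[0] for r in conn.execute("""
    SELECT sid FROM Takes
    GROUP BY sid HAVING COUNT(DISTINCT substr(cno, 1, 2)) > 1 ORDER BY sid
""")]

print(by_instructor)  # ['s1', 's2']
print(by_prefix)      # ['s2']
```

Here `s1` takes two CS courses from two different instructors, so the instructor-based count includes a student who never left one department.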

#### 4) Find the departments whose students have an average age over 20, along with the number of course enrollments in each department. Age is approximated as the difference in calendar years between today and the date of birth.

In [None]:
%%bigquery
SELECT LEFT(t.cno, 2) AS department,
       COUNT(*) AS enrollment_count,
       AVG(EXTRACT(YEAR FROM CURRENT_DATE()) - EXTRACT(YEAR FROM s.dob)) AS average_age
FROM college.Takes t
JOIN college.Student s ON t.sid = s.sid
GROUP BY department
HAVING average_age > 20
ORDER BY average_age DESC
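
The age expression in this query subtracts calendar years only, so it can overstate a student's age by one year when the birthday has not yet occurred in the current year. A short Python sketch of the difference, using made-up dates and two helper functions introduced only for illustration:

```python
from datetime import date


def year_difference(dob: date, today: date) -> int:
    """Age as computed in the query: calendar-year difference only."""
    return today.year - dob.year


def exact_age(dob: date, today: date) -> int:
    """Age in completed years, accounting for whether the birthday has passed."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


# Made-up dates for illustration.
dob = date(2000, 8, 22)
today = date(2021, 3, 1)
print(year_difference(dob, today))  # 21
print(exact_age(dob, today))        # 20
```

Because the filter threshold is 20, this one-year difference can decide whether a department near the boundary appears in the result.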

#### 5) Count the instructors in the Computer Science department. The result shows that 6 instructors teach for the CS department.

In [None]:
%%bigquery
SELECT dept, COUNT(dept) AS instructor_count
FROM college.Instructor 
WHERE dept = 'Computer Science'
GROUP BY dept
ORDER BY dept

#### 6) List the students who were born in April. Because the query groups by `sid`, `student_count` is 1 for every row.

In [None]:
%%bigquery
SELECT sid, COUNT(DISTINCT sid) AS student_count
FROM college.Student
WHERE EXTRACT(MONTH FROM dob) = 4
GROUP BY sid
ORDER BY sid, student_count

#### 7) Count the `Student` records (possibly including duplicates) with a DOB of 2000-08-22.

In [None]:
%%bigquery
SELECT COUNT(*) AS student_count
FROM college.Student
WHERE dob = '2000-08-22'
GROUP BY dob
ORDER BY student_count DESC

#### 8) Count the records in the `Takes` table with a grade of A-.

In [None]:
%%bigquery
SELECT grade, COUNT(*) AS count_of_grade
FROM college.Takes
WHERE grade = 'A-'
GROUP BY grade
ORDER BY grade