# Courses Demo
This Jupyter notebook explores the data set courses20-21.json,
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21)
that had at least 1 student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [None]:
import json
import pandas as pd
import numpy as np
from collections import defaultdict

In [None]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [None]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

## Cleaning the data
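If we want to sort or group courses by instructor or by code (for example, by collecting instructors in a set or using them as dictionary keys), we need to replace the lists with tuples. Tuples are immutable sequences and therefore hashable; lists are not.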

In [None]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])

In [None]:
print('notice that the instructor and code are tuples now')
courses[1246]
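
The reason for the conversion can be checked directly: a list cannot be hashed, so it cannot go into a set or serve as a dictionary key, while a tuple of strings can. The following is a minimal sketch; the record and its values are made up for illustration and are not taken from the data set.

```python
# Hypothetical course record; the field values are invented for illustration.
sample = {'subject': 'COSI', 'instructor': ['Ada', 'Lovelace', 'ada@example.edu']}

def is_hashable(value):
    """Return True if value can be stored in a set or used as a dict key."""
    try:
        hash(value)
        return True
    except TypeError:
        return False

list_ok = is_hashable(sample['instructor'])          # lists are mutable, so not hashable
tuple_ok = is_hashable(tuple(sample['instructor']))  # a tuple of strings is hashable
print(list_ok, tuple_ok)
```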

# Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas and NumPy, which are packages that make it easier to explore large data sets efficiently.

Here are some questions we can try to answer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)?
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest subjects measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.
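
Several of these questions follow the same pattern: sort the records by some number and keep the first five. As a rough sketch of the "largest course sections" question, assuming each record is one section with an `enrolled` count, something like the following could work. The sample records and the helper name `largest_sections` are hypothetical.

```python
# Made-up records that reuse the 'subject', 'name', and 'enrolled' fields.
sample_courses = [
    {'subject': 'COSI', 'name': 'Intro to Programming', 'enrolled': 120},
    {'subject': 'MATH', 'name': 'Calculus I', 'enrolled': 95},
    {'subject': 'COSI', 'name': 'Intro to Programming', 'enrolled': 80},
    {'subject': 'PHIL', 'name': 'Ethics', 'enrolled': 40},
]

def largest_sections(courses, k=5):
    """Return the k records with the highest enrollment, largest first."""
    return sorted(courses, key=lambda c: c['enrolled'], reverse=True)[:k]

top2 = largest_sections(sample_courses, k=2)
print([(c['name'], c['enrolled']) for c in top2])
```

On the real data the call would be `largest_sections(courses)`, which keeps the default of five.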

### * What are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)?

In [None]:
subjects = {c['subject'] for c in courses }
print(subjects)

### * How many instructors taught at Brandeis last year?

In [None]:
# count distinct instructor tuples across all subjects (not only COSI)
num_instructors = len({c['instructor'] for c in courses})
num_instructors

### * Which terms are represented?

In [None]:
terms = {c['term'] for c in courses }
terms

### * What was the total number of students enrolled in COSI courses last year?

In [None]:
total = 0
for c in courses:
    if c['subject'] == 'COSI':
        total += c['enrolled']
total

### * What was the median size of a COSI course with at least 1 student?

In [None]:
# collect the enrollment of every COSI course, then take the median
sizes = []
for c in courses:
    if c['subject'] == 'COSI' and c['enrolled'] >= 1:
        sizes.append(c['enrolled'])
np.median(sizes)
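
For reference, the median is the middle value of the sorted list, or the average of the two middle values when the list has an even length. The sketch below computes it by hand and compares the result with `statistics.median` from the standard library; the enrollment numbers are made up.

```python
import statistics

sizes = [12, 3, 45, 7]  # made-up enrollment numbers

def median_by_hand(values):
    """Median of a non-empty list: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

result = median_by_hand(sizes)
print(result, statistics.median(sizes))
```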

### * Which are the five largest subjects measured by number of students taught?

In [None]:
student_list = {}
for c in courses:
    if c['subject'] in student_list:
        student_list[c['subject']] += c['enrolled']
    else:
        student_list[c['subject']] = c['enrolled']
students_courses = []
for subject, num in student_list.items():
    students_courses.append((subject, num))
students_courses.sort(key=lambda x: x[1], reverse=True)
students_courses[0:5]
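
The same totals can be accumulated more compactly with `collections.Counter`, whose `most_common` method returns the entries with the largest counts. This is an alternative sketch on made-up records, not a replacement for the cell above.

```python
from collections import Counter

# Made-up records that reuse the 'subject' and 'enrolled' fields.
sample_courses = [
    {'subject': 'COSI', 'enrolled': 120},
    {'subject': 'MATH', 'enrolled': 95},
    {'subject': 'COSI', 'enrolled': 80},
    {'subject': 'PHIL', 'enrolled': 40},
]

# Counter treats missing keys as 0, so no if/else is needed.
enrollment_by_subject = Counter()
for c in sample_courses:
    enrollment_by_subject[c['subject']] += c['enrolled']

top = enrollment_by_subject.most_common(2)
print(top)
```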

### * Which are the five largest subjects measured by number of courses offered?

In [None]:
subjects = defaultdict(set)
for c in courses:
    subjects[c['subject']].add(c['name'])

subjects_course_count = []
for subject, course in subjects.items():
    subjects_course_count.append((subject, len(course)))
subjects_course_count.sort(key=lambda x: x[1], reverse=True)
subjects_course_count[0:5]

### * Which are the five largest subjects measured by number of faculty teaching courses in that subject?

In [None]:
faculty = defaultdict(set)
for c in courses:
    faculty[c['subject']].add(c['instructor'][2])

faculty_count = []
for subject, instructor in faculty.items():
    faculty_count.append((subject, len(instructor)))
faculty_count.sort(key=lambda x: x[1], reverse=True)
faculty_count[0:5]
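
The question about the top five faculty by number of students taught can be approached the same way, using the instructor tuple as the dictionary key. The sketch below runs on made-up records; the layout of the instructor tuple (first name, last name, email) is an assumption for illustration and should be checked against the real data.

```python
from collections import defaultdict

# Made-up records; the (first, last, email) tuple layout is an assumption.
sample_courses = [
    {'instructor': ('Ada', 'Lovelace', 'ada@example.edu'), 'enrolled': 120},
    {'instructor': ('Alan', 'Turing', 'alan@example.edu'), 'enrolled': 95},
    {'instructor': ('Ada', 'Lovelace', 'ada@example.edu'), 'enrolled': 80},
]

# Sum enrollment per instructor; tuples are hashable, so they work as keys.
students_by_instructor = defaultdict(int)
for c in sample_courses:
    students_by_instructor[c['instructor']] += c['enrolled']

ranked = sorted(students_by_instructor.items(), key=lambda x: x[1], reverse=True)
top_faculty = ranked[:5]
print(top_faculty)
```

Note that this counts only the primary instructor of each record; whether coinstructors should share the credit is a separate design choice.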
