# Courses Demo
This Jupyter notebook explores the data set courses20-21.json,
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21)
that had at least one student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [None]:
import statistics

In [None]:
import json

In [None]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [None]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

## Cleaning the data
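If we want to sort or group courses by instructor or by code, we need to replace the lists with tuples (which are immutable lists). Unlike lists, tuples of strings are hashable, so they can be used as set elements and dictionary keys.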

In [None]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])
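
The reason for the conversion can be seen in a small sketch. The records below are made up for illustration and only imitate the shape of the real `courses` entries; the field values are assumptions, not data from the file.

```python
# Toy records standing in for entries of `courses` (values are made up).
sample = [
    {"instructor": ["Ada", "Lovelace"], "enrolled": 12},
    {"instructor": ["Ada", "Lovelace"], "enrolled": 30},
    {"instructor": ["Alan", "Turing"], "enrolled": 25},
]

# Building a set of list-valued instructors fails, because lists are unhashable.
try:
    {c["instructor"] for c in sample}
    list_error = None
except TypeError as err:
    list_error = str(err)

# After converting the lists to tuples, the same set comprehension works.
for c in sample:
    c["instructor"] = tuple(c["instructor"])
unique_instructors = {c["instructor"] for c in sample}

print("error with lists:", list_error)
print("distinct instructors:", len(unique_instructors))
```

The same applies to dictionary keys, which is why the later cells can use `c['instructor']` and `c['code']` as keys once they are tuples.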

# Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas and NumPy, packages that make it easier to explore large data sets efficiently.

Here are some questions we can try to answer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)?
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.
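
As a rough sketch of how two of these questions might be approached, the cell below uses a set comprehension for the terms and `collections.Counter` for the subject counts. It runs on a made-up `toy_courses` list (an assumption for illustration) rather than on the real data; with the real data, `courses` would take its place.

```python
from collections import Counter

# Made-up records that imitate the 'subject' and 'term' fields of the data set.
toy_courses = [
    {"subject": "COSI", "term": "1211"},
    {"subject": "COSI", "term": "1212"},
    {"subject": "COSI", "term": "1212"},
    {"subject": "MATH", "term": "1211"},
    {"subject": "MATH", "term": "1212"},
    {"subject": "PHIL", "term": "1211"},
]

# Which terms are represented? A set removes the duplicates.
terms = sorted({c["term"] for c in toy_courses})

# Which are the largest subjects measured by number of courses offered?
subject_counts = Counter(c["subject"] for c in toy_courses)
top_subjects = subject_counts.most_common(2)

print(terms)
print(top_subjects)
```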

#### a. How many faculty taught COSI courses last year?

In [None]:
len({course['instructor'] for course in courses if course['subject']=='COSI' and (course['term']=='1211' or course['term'] == '1212')})

#### b. what is the total number of students taking COSI courses last year?

In [None]:
# Use a generator, not a set: a set would drop sections that share the same enrollment.
sum(course['enrolled'] for course in courses if course['subject'] == 'COSI' and (course['term']=='1211' or course['term'] == '1212'))

#### c. what was the median size of a COSI course last year (counting only those courses with at least 10 students)

In [None]:
# Use a list, not a set, so that courses with equal enrollments are all counted.
statistics.median([course['enrolled'] for course in courses if course['subject'] == 'COSI' and 
                   (course['term']=='1211' or course['term'] == '1212') and (course['enrolled']>=10)])
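
When aggregating enrollments, duplicates matter: two sections with the same enrollment should both count. The sketch below uses made-up numbers to show how collecting values in a set changes both the sum and the median compared with a generator or a list.

```python
import statistics

# Made-up enrollments; three sections happen to have the same size.
enrollments = [10, 10, 10, 40, 50]

# A set keeps each distinct value once, so the repeated 10s collapse into one.
sum_set = sum({n for n in enrollments})
median_set = statistics.median({n for n in enrollments})

# A generator or list keeps every section.
sum_all = sum(n for n in enrollments)
median_all = statistics.median([n for n in enrollments])

print(sum_set, sum_all)
print(median_set, median_all)
```

Sets are the right tool for counting distinct things (such as instructors in question a), while sums and medians over enrollments need every value.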

#### g. list the top 20 faculty in terms of number of students they taught

In [None]:
instructor = dict()
for c in courses:
    if c['instructor'] in instructor:
        instructor[c['instructor']] += c['enrolled']
    else:
        instructor[c['instructor']] = c['enrolled']
# Sort by total enrollment, largest first, and keep the top 20.
sorted(instructor.items(), key=lambda item: item[1], reverse=True)[:20]

#### h. list the top 20 courses in terms of number of students taking that course (where you combine different sections and semesters, i.e. just use the subject and course number)

In [None]:
courses_students = dict()
for c in courses:
    if c['code'] in courses_students.keys():
        courses_students[c['code']] += c['enrolled']
    else:
        courses_students[c['code']] = c['enrolled']  # count the first section too
courses_students = sorted(courses_students.items(), key = lambda item: item[1], reverse = True)
print(courses_students[:20])
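
The accumulate-then-sort pattern used in g and h can also be written with `collections.Counter`, which starts missing keys at zero and provides `most_common`. This is only an alternative sketch on made-up records; the `(subject, number)` shape of `code` is an assumption based on the question text.

```python
from collections import Counter

# Made-up sections; 'code' is assumed to be a (subject, number) tuple.
toy_courses = [
    {"code": ("COSI", "10A"), "enrolled": 100},
    {"code": ("COSI", "10A"), "enrolled": 80},
    {"code": ("MATH", "15A"), "enrolled": 60},
]

totals = Counter()
for c in toy_courses:
    # Missing keys default to 0, so no if/else branch is needed.
    totals[c["code"]] += c["enrolled"]

top = totals.most_common(1)
print(top)
```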

#### i. Create my own interesting question (Lu):
    Which courses mention "code" in their description?

In [None]:
{c['name'] for c in courses if 'code' in c['description']}

#### i. Create my own interesting question (Jing):
    Who are the top 10 instructors by number of independent study courses taught?

In [None]:
instructor_with_inde = dict()
for c in courses:
    if c['independent_study'] == True:  # use the loop variable c, not the stale `course`
        if c['instructor'] in instructor_with_inde:
            instructor_with_inde[c['instructor']] += 1
        else:
            instructor_with_inde[c['instructor']] = 1
sorted(instructor_with_inde.items(), key=lambda item: item[1], reverse=True)[:10]