# Courses Demo
This Jupyter notebook explores the data set courses20-21.json,
which contains every Brandeis course in the 20-21 academic year (Fall20, Spr21, Sum21)
that had at least one student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [None]:
import json

In [None]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [None]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

## Cleaning the data
To sort and group courses by instructor or by code, we replace the lists with tuples. Tuples are immutable sequences, and unlike lists they are hashable, so they can also be used as dictionary keys or set members.

In [None]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple([tuple(f) for f in course['coinstructors']])
    course['code'] = tuple(course['code'])

In [None]:
print('notice that the instructor and code are tuples now')
courses[1246]
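To see why the conversion helps, here is a minimal sketch using a made-up record (the name and email are hypothetical, not taken from the data set). A list cannot be added to a set or used as a dictionary key, while the equivalent tuple can.

```python
# Hypothetical instructor record in (first, last, email) form.
instructor_list = ['Ada', 'Lovelace', 'ada@example.edu']
instructor_tuple = tuple(instructor_list)

seen = set()
try:
    seen.add(instructor_list)      # lists are mutable, hence unhashable
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

seen.add(instructor_tuple)         # tuples of strings are hashable
seen.add(tuple(instructor_list))   # an equal tuple is not added a second time

print(list_is_hashable, len(seen))
```

This is the property the later questions rely on when they collect instructors or course codes into sets.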

# Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas and NumPy, packages that make it easier to explore large data sets efficiently.

Here are some questions we can try to answer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)?
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.

### (a) how many faculty taught COSI courses last year?

In [None]:
facultyCOSI = set()
for course in courses:
    if course["subject"] == "COSI":
        facultyCOSI.add(course["instructor"][0] + course["instructor"][1])
len(facultyCOSI)

    

### (b) what was the total number of students enrolled in COSI courses last year?

In [None]:
count = 0
for course in courses:
    if course["subject"] == "COSI":
        count += course["enrolled"]
count

### (c) what was the median size of a COSI course last year (counting only those courses with at least 10 students)?


In [None]:
COSI_course = []
res = 0
for course in courses:
    if course["subject"] == "COSI" and course["enrolled"] >= 10:
        COSI_course.append(course["enrolled"])
COSI_course.sort()
middle = len(COSI_course) // 2
if len(COSI_course) % 2 == 0:
    res = (COSI_course[middle] + COSI_course[middle - 1]) / 2
else:
    res = COSI_course[middle]
res
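As a cross-check, the standard library's `statistics.median` follows the same rule as the cell above: the middle value for an odd count, and the mean of the two middle values for an even count. A minimal sketch on made-up enrollment numbers (the lists are hypothetical, not taken from the data set):

```python
from statistics import median

# Hypothetical enrollments; the real values come from course["enrolled"].
sizes_odd = [12, 45, 10, 30, 18]
sizes_even = [12, 45, 10, 30]

median_odd = median(sizes_odd)     # sorted: 10, 12, 18, 30, 45
median_even = median(sizes_even)   # sorted: 10, 12, 30, 45
print(median_odd, median_even)
```

Passing the filtered `COSI_course` list to `median` should give the same value as `res`.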

### (d) List top 10 subjects in terms of number of students enrolled

In [None]:
subjects = {}
for course in courses:
    if course['subject'] in subjects:
        subjects[course['subject']] += course['enrolled']
    else:
        subjects[course['subject']] = course['enrolled']
subjects = sorted(subjects.items(), key=lambda x: x[1], reverse=True)
for i in range(10):
    print(subjects[i])
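The same tally can be written with `collections.Counter`, which starts missing keys at zero and provides `most_common` for the ranking. The sketch below runs on a few made-up records (`sample_courses` is hypothetical and holds only the two fields used here), so it is an illustration of the technique, not a result from the data set.

```python
from collections import Counter

# Hypothetical records with only the fields this example needs.
sample_courses = [
    {'subject': 'COSI', 'enrolled': 40},
    {'subject': 'MATH', 'enrolled': 25},
    {'subject': 'COSI', 'enrolled': 15},
    {'subject': 'PHIL', 'enrolled': 30},
]

totals = Counter()
for course in sample_courses:
    # Missing subjects start at 0, so no if/else is needed.
    totals[course['subject']] += course['enrolled']

top_two = totals.most_common(2)    # largest totals first
print(top_two)
```

With the real data, replacing `sample_courses` by `courses` and calling `most_common(10)` would give the top ten subjects.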

### (g) List the top 20 faculty in terms of number of students they taught.

In [None]:
# Sanity check for a single instructor:
# sum([course['enrolled'] for course in courses if course['instructor'] == ('Leah', 'Berkenwald', 'leahb@brandeis.edu')])

# Sort so that all courses by the same instructor are adjacent, then total each run.
courses_by_instructor = sorted(courses, key = lambda course: course['instructor'])
students_per_instructor = []
curInstructor = courses_by_instructor[0]['instructor']
students = courses_by_instructor[0]['enrolled']

for course in courses_by_instructor[1:]:
    if curInstructor == course['instructor']:
        students += course['enrolled']
    else:
        students_per_instructor.append((students, curInstructor))
        curInstructor = course['instructor']
        students = course['enrolled']
# Record the final instructor once the loop has finished.
students_per_instructor.append((students, curInstructor))

instructors_by_num_of_students = [instructor for (students, instructor) in sorted(students_per_instructor, key = lambda element: -element[0])]
instructors_by_num_of_students[:20]


### (h) List the top 20 courses in terms of number of students taking that course (where you combine different sections and semesters, i.e. just use the subject and course number)

In [None]:
# Sanity check for a single course:
# sum([course['enrolled'] for course in courses if course['code'] == ('HWL', '1')])

# Sort so that all sections of the same course are adjacent, then total each run.
courses_by_code = sorted(courses, key = lambda course: course['code'])
students_per_course = []
curCourse = courses_by_code[0]['code']
students = courses_by_code[0]['enrolled']

for course in courses_by_code[1:]:
    if curCourse == course['code']:
        students += course['enrolled']
    else:
        students_per_course.append((students, curCourse))
        curCourse = course['code']
        students = course['enrolled']
# Record the final course once the loop has finished.
students_per_course.append((students, curCourse))

courses_by_num_of_students = [course for (students, course) in sorted(students_per_course, key = lambda element: -element[0])]
courses_by_num_of_students[:20]
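The sort-then-accumulate pattern used here is what `itertools.groupby` packages up: after sorting, it yields each run of equal keys as one group. A minimal sketch on made-up records (`sample_courses` and `get_code` are hypothetical names introduced for this example):

```python
from itertools import groupby

# Hypothetical records with only the fields this example needs.
sample_courses = [
    {'code': ('COSI', '10A'), 'enrolled': 40},
    {'code': ('MATH', '8A'), 'enrolled': 25},
    {'code': ('COSI', '10A'), 'enrolled': 35},
    {'code': ('COSI', '12B'), 'enrolled': 20},
]

def get_code(course):
    return course['code']

# groupby only merges adjacent items, so the list must be sorted by the same key first.
by_code = sorted(sample_courses, key=get_code)
totals = [
    (sum(c['enrolled'] for c in group), code)
    for code, group in groupby(by_code, key=get_code)
]
totals.sort(reverse=True)          # largest enrollment first
print(totals)
```

Because every group is emitted by the iterator, the last course needs no special handling after the loop.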

### Top ten professors by number of independent studies offered with at least one student.

In [None]:
courses_by_instructor = sorted(courses, key = lambda course: course['instructor'])
indies_by_instructor = []

curInstructor = courses_by_instructor[0]['instructor']
indiesWithStudentsCounter = 0

for course in courses_by_instructor:
    if curInstructor != course['instructor']:
        # A new instructor starts: record the previous one and reset the counter.
        indies_by_instructor.append((indiesWithStudentsCounter, curInstructor))
        curInstructor = course['instructor']
        indiesWithStudentsCounter = 0
    if course['independent_study'] and course['enrolled'] > 0:
        indiesWithStudentsCounter += 1
# Record the final instructor once the loop has finished.
indies_by_instructor.append((indiesWithStudentsCounter, curInstructor))

instructors_by_indiesWithStudents = [instructor for (counter, instructor) in sorted(indies_by_instructor, key = lambda element: -element[0])]
instructors_by_indiesWithStudents[:10]

### The top ten courses in terms of length of description.

In [None]:
# Sort courses from longest to shortest description.
courses_by_lenDescr = sorted(courses, key = lambda course: -len(course['description']))
top_ten = []

i = 0

# Walk down the ranking, skipping codes already seen (e.g. other sections of the same course).
while len(top_ten) < 10 and i < len(courses_by_lenDescr):
    name = courses_by_lenDescr[i]['code'][0] + courses_by_lenDescr[i]['code'][1]
    if name not in top_ten:
        top_ten.append(name)
    i += 1

top_ten

### (i3: James) Number of courses with coinstructors

In [None]:
len([course for course in courses if len(course['coinstructors']) > 0])