# Courses Demo
This Jupyter notebook is for exploring the data set courses20-21.json
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21) 
which had at least 1 student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [1]:
import json

In [2]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [None]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

## Cleaning the data
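If we want to group or count courses by instructor or by code (for example, by collecting the distinct values in a set), we need to replace the lists with tuples. Tuples are immutable sequences and, unlike lists, are hashable.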

In [None]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])

In [None]:
print('notice that the instructor and code are tuples now')
courses[1246]
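
The reason for the conversion can be seen in a small sketch. The two records below are made-up stand-ins for real course dictionaries (the instructor name is hypothetical, and the real `instructor` field may hold different values). A list cannot be placed in a set because it is unhashable, while the equivalent tuple can.

```python
# Toy records invented for illustration; not taken from courses20-21.json.
sample = [
    {'instructor': ['Ada', 'Lovelace']},
    {'instructor': ['Ada', 'Lovelace']},
]

# Lists are mutable and therefore unhashable, so building a set of them fails.
try:
    {c['instructor'] for c in sample}
    list_error = None
except TypeError as err:
    list_error = err

# Tuples are immutable and hashable, so the same comprehension works.
for c in sample:
    c['instructor'] = tuple(c['instructor'])
distinct = {c['instructor'] for c in sample}

print(type(list_error).__name__)
print(distinct)
```

Because both toy records name the same instructor, the set ends up with a single element.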

# Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas and NumPy, packages that make it easier to explore large datasets efficiently.

Here are some questions we can try to answer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.
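
As one possible sketch of the "five largest course sections" question, `heapq.nlargest` picks the top entries without sorting the whole list. The records below are invented for illustration and assume that each dictionary describes one section with `subject`, `coursenum`, and `enrolled` fields; with the real data, `sample` would be replaced by `courses` and `2` by `5`.

```python
import heapq

# Toy records invented for illustration; real entries have more fields.
sample = [
    {'subject': 'COSI', 'coursenum': '10A', 'enrolled': 120},
    {'subject': 'MATH', 'coursenum': '15A', 'enrolled': 80},
    {'subject': 'BIOL', 'coursenum': '14A', 'enrolled': 95},
    {'subject': 'PHIL', 'coursenum': '1A', 'enrolled': 40},
]

# Pick the 2 sections with the highest enrollment, largest first.
largest = heapq.nlargest(2, sample, key=lambda c: c['enrolled'])

for c in largest:
    print(c['subject'], c['coursenum'], c['enrolled'])
```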

##### a. how many faculty taught COSI courses last year?

In [None]:
#author: Yiwen Li

# count the distinct instructors who taught at least one COSI course
num_cosi_faculty = len({c['instructor'] for c in courses if c['subject'] == 'COSI'})

print("# of faculty who taught COSI courses last year:")
print(num_cosi_faculty)

##### b. what is the total number of students taking COSI courses last year?

In [None]:
#author: Qing Liu

print("total number of students taking COSI courses last year:")
print(sum(c['enrolled'] for c in courses if c['subject'] == 'COSI'))

##### c. what was the median size of a COSI course last year (counting only those courses with at least 10 students)


In [None]:
#author: Jiefang Li
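import statistics

# enrollments of COSI courses with at least 10 students
course_list = [c['enrolled'] for c in courses if c['subject'] == 'COSI' and c['enrolled'] >= 10]
course_list.sort()  # statistics.median sorts internally; this only matters for the manual version below

print("median size of a COSI course last year:")
print(statistics.median(course_list))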

# size = len(course_list)

# res = course_list[size//2] if size%2==1 else (course_list[size//2]+course_list[size//2-1])/2

# print ("median size of a COSI course last year: ")
# print (res)
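
A small sketch of how `statistics.median` treats odd and even list lengths, which is the same rule the commented-out manual computation follows. The enrollment numbers are made up for illustration.

```python
import statistics

# Made-up enrollment numbers, for illustration only.
odd_sizes = [12, 30, 18]
even_sizes = [12, 30, 18, 40]

# Odd length: the middle value of the sorted data.
median_odd = statistics.median(odd_sizes)

# Even length: the mean of the two middle values of the sorted data.
median_even = statistics.median(even_sizes)

print(median_odd, median_even)
```

The input does not need to be sorted beforehand.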

        

##### d. create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10. This shows the top 10 subjects in terms of number of students taught.

In [3]:
# author: Huijie Liu

key = 'subject'
subj_num = {}
res = []

for subject in {c['subject'] for c in courses}:
    # total enrollment across all courses in this subject
    total = sum(c['enrolled'] for c in courses if c['subject'] == subject)
    res.append((total, subject))
    
res.sort(reverse=True)
print("the top 10 subjects in terms of number of students taught:")
print(res[:10])

# {for course in courses}
# for course in courses:
#     subj_num[course[key]] = subj_num.get(course[key], 0) + course['enrolled']
# tuple_list = [(subj, num) for subj, num in subj_num.items()]
# tuple_list.sort(key = lambda x: x[1], reverse = True)
# print(tuple_list[:10])

the top 10 subjects in terms of number of students taught:
[(5318, 'HS'), (3085, 'BIOL'), (2766, 'BUS'), (2734, 'HWL'), (2322, 'CHEM'), (2315, 'ECON'), (2223, 'COSI'), (1785, 'MATH'), (1704, 'PSYC'), (1144, 'ANTH')]
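
The same totals can also be accumulated in a single pass over the data, rather than rescanning the whole list once per subject. The sketch below uses invented records with only the `subject` and `enrolled` fields; with the real data, `sample` would be replaced by `courses`.

```python
from collections import defaultdict

# Toy records invented for illustration; not taken from courses20-21.json.
sample = [
    {'subject': 'COSI', 'enrolled': 120},
    {'subject': 'MATH', 'enrolled': 80},
    {'subject': 'COSI', 'enrolled': 30},
    {'subject': 'BIOL', 'enrolled': 95},
]

# Accumulate enrollment per subject in one pass.
totals = defaultdict(int)
for c in sample:
    totals[c['subject']] += c['enrolled']

# Build (E, S) tuples and sort them from largest to smallest enrollment.
ranked = sorted(((n, s) for s, n in totals.items()), reverse=True)
print(ranked[:10])
```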


##### e. do the same as in (d) but print the top 10 subjects in terms of number of courses offered

In [None]:
# author: Yiwen Luo
res = []
for i in {c['subject'] for c in courses}:
    # count the distinct course codes offered in subject i
    res.append((len({c['code'] for c in courses if c['subject'] == i}), i))

# sort once, after all subjects have been collected
res.sort(reverse=True)

print("top 10 subjects by number of courses offered:")
print(res[0:10])

##### f. do the same as (d) but print the top 10 subjects in terms of number of faculty teaching courses in that subject

In [None]:
#author: Jiefang Li

from collections import defaultdict

# map each subject to the set of distinct instructors teaching in it
subj_teachers = defaultdict(set)
for c in courses:
    subj_teachers[c['subject']].add(c['instructor'])

res = [(sub, len(subj_teachers[sub])) for sub in subj_teachers]
res.sort(reverse=True, key=lambda x: x[1])
print(res[:10])


In [None]:
res = []
for i in {c['subject'] for c in courses}:
    res.append((len({c['instructor'] for c in courses if c['subject'] == i}), i))

# sort once, after the loop
res.sort(reverse=True)
print(res[0:10])


##### g. list the top 20 faculty in terms of number of students they taught

In [None]:
#author: Qing Liu
res = []
for i in {c['instructor'] for c in courses}:
    # total enrollment across every course taught by instructor i
    res.append((sum(c['enrolled'] for c in courses if c['instructor'] == i), i))

res.sort(reverse=True)
print("the top 20 faculty in terms of number of students they taught:")
print([sub[1] for sub in res[:20]])

##### h. list the top 20 courses in terms of number of students taking that course (where you combine different sections and semesters, i.e. just use the subject and course number)

In [4]:
# author: Huijie Liu

# key1 = 'subject'
# key2 = 'coursenum'
# course_num = {}


res = []
for subject, coursenum in {(c['subject'], c['coursenum']) for c in courses}:
    # total enrollment across all sections and semesters of this course
    total = sum(c['enrolled'] for c in courses
                if c['subject'] == subject and c['coursenum'] == coursenum)
    res.append((total, (subject, coursenum)))
    
res.sort(reverse=True)
print(res[:20])


# for course in courses:
#     course_num[course[key1]+' '+course[key2]] = course_num.get(course[key1]+' '+course[key2], 0) + course['enrolled']
# tuple_list = [(course, num) for course, num in course_num.items()]
# tuple_list.sort(key = lambda x: x[1], reverse = True)
# for tupleItem in tuple_list[:20]:
#     print(tupleItem[0])

[(940, ('HWL', '1')), (879, ('HWL', '1-PRE')), (358, ('BIOL', '14A')), (343, ('COSI', '10A')), (336, ('PSYC', '10A')), (287, ('BIOL', '15B')), (280, ('MATH', '10A')), (274, ('BIOL', '18B')), (262, ('BIOL', '18A')), (245, ('CHEM', '29A')), (239, ('CHEM', '29B')), (236, ('CHEM', '25A')), (231, ('PSYC', '51A')), (226, ('CHEM', '25B')), (225, ('COSI', '12B')), (215, ('BUS', '6A')), (208, ('CHEM', '18A')), (207, ('ECON', '10A')), (204, ('MATH', '15A')), (201, ('COSI', '21A'))]
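
A hedged alternative for combining sections is a `collections.Counter` keyed by the `(subject, coursenum)` tuple, whose `most_common` method returns the entries already ranked. The records below are invented for illustration; with the real data, `sample` would be replaced by `courses`.

```python
from collections import Counter

# Toy records invented for illustration; not taken from courses20-21.json.
sample = [
    {'subject': 'COSI', 'coursenum': '10A', 'enrolled': 100},
    {'subject': 'COSI', 'coursenum': '10A', 'enrolled': 60},
    {'subject': 'MATH', 'coursenum': '15A', 'enrolled': 90},
]

# Add up enrollment for each (subject, coursenum) pair in one pass.
by_course = Counter()
for c in sample:
    by_course[(c['subject'], c['coursenum'])] += c['enrolled']

# most_common(20) returns up to 20 (key, total) pairs, largest total first.
top = by_course.most_common(20)
for (subject, coursenum), enrolled in top:
    print(f"{subject} {coursenum}: {enrolled}")
```

Note that `most_common` yields `(key, total)` pairs, the reverse of the `(total, key)` order used in the cell above.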


##### i. Create your own interesting question (each team member creates their own) and use Python to answer that question.

In [None]:
#author: Jiefang Li
# Return the top 10 courses with the most students on the waiting list,
# together with the number of students waiting

course_list = [(c['waiting'], c['name']) for c in courses]
course_list.sort(reverse=True)

print(course_list[:10])



In [None]:
# author: Yiwen Luo
# Return the number of independent study course in each subject in descending order

res = []
for i in {c['subject'] for c in courses}:
    res.append((len({c['code'] for c in courses if c['subject'] == i and c['independent_study'] == True}), i))

# sort once, after all subjects have been collected
res.sort(reverse=True)

print("# of independent study courses in each subject")
print(res)

In [None]:
# author: Qing Liu
# Return the number of courses with at least one coinstructor by subject

res = []
for i in {c['subject'] for c in courses}:
    # count the courses in subject i that list at least one coinstructor
    num_courses = sum(1 for c in courses
                      if len(c['coinstructors']) >= 1 and c['subject'] == i)
    if num_courses > 0:
        res.append((i, num_courses))

print("# of courses with at least 1 coinstructor, by subject")
print(res)

In [5]:
# author: Huijie Liu
# Print the longest course name

course_list = [c['name'] for c in courses]
course_list.sort(key=len, reverse=True)

print(course_list[0])

From Les Confessions to Instagram: Self-Writing in Contemporary French and Francophone Literature
