# Courses Demo
This Jupyter notebook explores the data set courses20-21.json,
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21)
that had at least 1 student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [1]:
import json

In [2]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [3]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

there are 7813 courses in the dataset
here is the data for course 1246


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ['An', 'Huang', 'anhuang@brandeis.edu'],
 'coinstructors': [],
 'code': ['MATH', '223A'],
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}
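The `start` and `end` values in `times` look like minutes after midnight (1080 would be 18:00), though the data set itself does not say so. Under that assumption, a small hypothetical helper (`minutes_to_clock` is not part of the notebook) makes the meeting times readable:

```python
def minutes_to_clock(minutes):
    """Format minutes after midnight as HH:MM (assumed encoding of start/end)."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}"

# The 'times' entry of course 1246, copied from the output above.
meeting = {'start': 1080, 'end': 1170, 'days': ['w', 'm']}

start_clock = minutes_to_clock(meeting['start'])
end_clock = minutes_to_clock(meeting['end'])
print(start_clock, end_clock)  # 18:00 19:30
```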

## Cleaning the data
If we want to group courses by instructor or by code (that is, use those fields as dictionary keys or set elements), we need to replace the lists with tuples, which are immutable and therefore hashable.

In [4]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple([tuple(f) for f in course['coinstructors']])
    course['code'] = tuple(course['code'])
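One reason the conversion matters is hashability: a list cannot be a set member or a dictionary key, while a tuple of strings can. A minimal sketch, using a made-up instructor record rather than the real data:

```python
# Hypothetical instructor record, shaped like the 'instructor' field above.
instructor_list = ['Ada', 'Lovelace', 'ada@example.edu']

# A list is unhashable, so putting it in a set raises TypeError.
try:
    {instructor_list}
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# The tuple version is hashable and works as a dictionary key.
instructor_tuple = tuple(instructor_list)
students_taught = {instructor_tuple: 0}
students_taught[instructor_tuple] += 25

print(list_is_hashable)                   # False
print(students_taught[instructor_tuple])  # 25
```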

In [5]:
print('notice that the instructor and code are tuples now')
courses[1247]

notice that the instructor and code are tuples now


{'limit': None,
 'times': [{'start': 960, 'days': ['w', 'm'], 'end': 1050}],
 'enrolled': 7,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('Michael', 'Strand', 'mstrand@brandeis.edu'),
 'coinstructors': (),
 'code': ('SOC', '204A'),
 'subject': 'SOC',
 'coursenum': '204A',
 'name': 'Foundations of Sociological Theory',
 'independent_study': False,
 'term': '1203',
 'description': 'Studies classic theoretical texts that have been foundational for sociology. Particular attention is paid to works of Marx, Durkheim, and Weber. Identifies questions and perspectives from these theorists that continue to be relevant for sociological thinking and research. Usually offered every second year.\nLaura Miller or Michael Strand'}

# Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas and NumPy, which are packages that make it easier to explore large data sets efficiently.

Here are some questions we can try to answer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.
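The first two questions come down to collecting distinct values of one field. A minimal sketch on a tiny made-up sample (the records and the term `'1211'` are hypothetical; on the real data, replace `sample_courses` with `courses`):

```python
# Hypothetical sample with the same field names as the real course records.
sample_courses = [
    {'subject': 'COSI', 'term': '1203', 'enrolled': 30},
    {'subject': 'MATH', 'term': '1203', 'enrolled': 12},
    {'subject': 'COSI', 'term': '1211', 'enrolled': 45},
]

# A set comprehension keeps each distinct value once; sorted() gives a stable order.
subjects = sorted({c['subject'] for c in sample_courses})
terms = sorted({c['term'] for c in sample_courses})

print(subjects)  # ['COSI', 'MATH']
print(terms)     # ['1203', '1211']
```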

## PA 01: Problem 5

### 5 a)
how many faculty taught COSI courses last year?
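
No solution cell accompanies this part. One possible approach, sketched on made-up records, is to collect the distinct `instructor` tuples of COSI sections in a set. This sketch counts only the primary instructor and ignores `coinstructors`, which is an assumption about what "faculty" should mean here:

```python
# Hypothetical sample; on the real data, use `courses` after the tuple conversion.
sample_courses = [
    {'subject': 'COSI', 'instructor': ('A', 'One', 'aone@example.edu')},
    {'subject': 'COSI', 'instructor': ('A', 'One', 'aone@example.edu')},
    {'subject': 'COSI', 'instructor': ('B', 'Two', 'btwo@example.edu')},
    {'subject': 'MATH', 'instructor': ('C', 'Three', 'cthree@example.edu')},
]

# A set keeps each instructor once, even if they teach several sections.
cosi_faculty = {c['instructor'] for c in sample_courses if c['subject'] == 'COSI'}
print(len(cosi_faculty))  # 2
```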

### 5 b)
what was the total number of students taking COSI courses last year?

In [6]:
# Implemented by Siyu Yang
x = 0
for c in courses:
    if(c['subject']=='COSI'):
        x+=c[ 'enrolled']
print(x)

2223


### 5 c)
what was the median size of a COSI course last year (counting only those courses with at least 10 students)

In [7]:
# Implemented by Siyu Yang

import statistics
x = []
for c in courses:
    if(c['subject']=='COSI' and c['enrolled']>=10):
        x.append(c['enrolled'])
print(statistics.median(x))

37


### 5 d)
create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10. This shows the top 10 subjects in terms of number of students taught.

In [19]:
# Implemented by Yi-Zhe Hong
x=[]
for c in courses:
    tup = (c['enrolled'],c['subject'])
    x.append(tup)

x.sort(reverse = True)
print(x[0:10])
        

[(784, 'HWL'), (186, 'CHEM'), (186, 'BIOL'), (181, 'BIOL'), (180, 'CHEM'), (175, 'HS'), (170, 'PSYC'), (166, 'PSYC'), (166, 'COSI'), (150, 'COSI')]
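
Note that the cell above ranks individual sections, which is why BIOL, CHEM, PSYC, and COSI each appear twice in its output. To rank subjects by total enrollment, the per-section numbers would first need to be summed per subject. A minimal sketch on made-up records:

```python
from collections import defaultdict

# Hypothetical sample; on the real data, replace sample_courses with courses.
sample_courses = [
    {'subject': 'COSI', 'enrolled': 30},
    {'subject': 'MATH', 'enrolled': 12},
    {'subject': 'COSI', 'enrolled': 45},
    {'subject': 'BIOL', 'enrolled': 60},
]

# Sum enrollment over all sections of each subject.
enrolled_by_subject = defaultdict(int)
for c in sample_courses:
    enrolled_by_subject[c['subject']] += c['enrolled']

# Build (E, S) tuples and sort so the largest totals come first.
es_list = sorted(((e, s) for s, e in enrolled_by_subject.items()), reverse=True)
print(es_list[:10])  # [(75, 'COSI'), (60, 'BIOL'), (12, 'MATH')]
```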


### 5 e)
create a list of tuples (S,N) where S is a subject and N is the number of courses offered in that subject, and print the top 10 subjects in terms of number of courses offered

In [28]:
# Implemented by Emma Xu
UniqueSubjects = {course['subject'] for course in courses}  # a set of unique subjects
SN_list = []
for subject in UniqueSubjects:
    Allcourses = [course for course in courses if course['subject'] == subject]
    SN_tuple = (subject, len(Allcourses))
    SN_list.append(SN_tuple)

# print(SN_list)
# sort
SN_list.sort(key = lambda tup : tup[1], reverse = True) # sorts by each tuple's second item in descending order
print (SN_list[0:10])

[('BIOL', 613), ('HIST', 498), ('PSYC', 417), ('NEUR', 403), ('BCHM', 296), ('PHYS', 288), ('HS', 274), ('COSI', 272), ('MUS', 266), ('ENG', 265)]
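
The cell above rescans the whole course list once per subject. As an alternative sketch (not the approach used in the notebook), `collections.Counter` tallies every subject in a single pass, shown here on made-up records:

```python
from collections import Counter

# Hypothetical sample; on the real data, replace sample_courses with courses.
sample_courses = [
    {'subject': 'COSI'}, {'subject': 'MATH'}, {'subject': 'COSI'},
    {'subject': 'BIOL'}, {'subject': 'COSI'}, {'subject': 'MATH'},
]

# Counter counts each subject in one pass over the data.
subject_counts = Counter(c['subject'] for c in sample_courses)

# most_common(n) returns up to n (subject, count) pairs, largest count first.
top_subjects = subject_counts.most_common(10)
print(top_subjects)  # [('COSI', 3), ('MATH', 2), ('BIOL', 1)]
```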


### 5 f)
create a list of tuples (S,F) where S is a subject and F is the number of faculty teaching courses in that subject,
and print the top 10 subjects in terms of number of faculty teaching in the subject

In [29]:
# Implemented by Emma Xu
UniqueSubjects = {course['subject'] for course in courses}  # a set of unique subjects
SF_list = []
for subject in UniqueSubjects:
    faculties = {course['instructor'][0] for course in courses if course['subject'] == subject} # a set of instructor first names in the subject; instructors who share a first name are counted once
    SF_tuple = (subject, len(faculties)) 
    SF_list.append(SF_tuple)

# print(SF_list) 
# sort
SF_list.sort(key = lambda tup : tup[1], reverse = True)  # sorts by each tuple's second item in descending order
print (SF_list[0:10])

[('HS', 86), ('BIOL', 66), ('ECON', 51), ('BCHM', 48), ('HIST', 45), ('BCBP', 45), ('BUS', 44), ('HWL', 41), ('MATH', 37), ('NEJS', 35)]
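
The cell above keys on `course['instructor'][0]`, the first name, so two instructors with the same first name in one subject count as one. A sketch that keys on the whole instructor tuple instead, on made-up records chosen to show the difference:

```python
from collections import defaultdict

# Hypothetical sample: two HS instructors share the first name 'Sam'.
sample_courses = [
    {'subject': 'HS', 'instructor': ('Sam', 'Lee', 'slee@example.edu')},
    {'subject': 'HS', 'instructor': ('Sam', 'Ortiz', 'sortiz@example.edu')},
    {'subject': 'BIOL', 'instructor': ('Kim', 'Park', 'kpark@example.edu')},
]

# Collect the distinct full instructor tuples per subject.
faculty_by_subject = defaultdict(set)
for c in sample_courses:
    faculty_by_subject[c['subject']].add(c['instructor'])

# Build (S, F) tuples and sort by faculty count, largest first.
sf_list = sorted(((s, len(f)) for s, f in faculty_by_subject.items()),
                 key=lambda tup: tup[1], reverse=True)
print(sf_list)  # [('HS', 2), ('BIOL', 1)]
```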


### 5 g)
list the top 20 faculty in terms of number of students they taught

In [11]:
# Implemented by Tianjun Cai
from collections import defaultdict
total_f = defaultdict(int)
for course in courses:
    course_f = tuple(course['instructor'])
    total_f[course_f] += course['enrolled']
top_f = sorted(total_f.items(), key=lambda f:f[1], reverse=True)
print(top_f[:20])

[(('Leah', 'Berkenwald', 'leahb@brandeis.edu'), 926), (('Kene Nathan', 'Piasta', 'kpiasta@brandeis.edu'), 583), (('Stephanie', 'Murray', 'murray@brandeis.edu'), 515), (('Milos', 'Dolnik', 'dolnik@brandeis.edu'), 489), (('Maria', 'de Boef Miara', 'mmiara@brandeis.edu'), 450), (('Bryan', 'Ingoglia', 'ingoglia@brandeis.edu'), 439), (('Rachel V.E.', 'Woodruff', 'woodruff@brandeis.edu'), 422), (('Timothy J', 'Hickey', 'tjhickey@brandeis.edu'), 411), (('Daniel', 'Breen', 'dbreen91@brandeis.edu'), 375), (('Melissa', 'Kosinski-Collins', 'kosinski@brandeis.edu'), 365), (('Claudia', 'Novack', 'novack@brandeis.edu'), 355), (('Antonella', 'DiLillo', 'dilant@brandeis.edu'), 342), (('Jon', 'Chilingerian', 'chilinge@brandeis.edu'), 330), (('Ahmad', 'Namini', 'anamini@brandeis.edu'), 327), (('Iraklis', 'Tsekourakis', 'tsekourakis@brandeis.edu'), 316), (('Geoffrey', 'Clarke', 'geoffclarke@brandeis.edu'), 315), (('Peter', 'Mistark', 'pmistark@brandeis.edu'), 277), (('Brenda', 'Anderson', 'banders@brande

### 5 h)
list the top 20 courses in terms of number of students taking that course (where you combine different sections and semesters, i.e. just use the subject and course number)

In [12]:
# Implemented by Tianjun Cai
from collections import defaultdict
total_c = defaultdict(int)
for course in courses:
    course_key = (course['subject'], course['coursenum'])
    total_c[course_key] += course['enrolled']
top_c = sorted(total_c.items(), key=lambda c:c[1], reverse=True)
print(top_c[:20])


[(('HWL', '1'), 940), (('HWL', '1-PRE'), 879), (('BIOL', '14A'), 358), (('COSI', '10A'), 343), (('PSYC', '10A'), 336), (('BIOL', '15B'), 287), (('MATH', '10A'), 280), (('BIOL', '18B'), 274), (('BIOL', '18A'), 262), (('CHEM', '29A'), 245), (('CHEM', '29B'), 239), (('CHEM', '25A'), 236), (('PSYC', '51A'), 231), (('CHEM', '25B'), 226), (('COSI', '12B'), 225), (('BUS', '6A'), 215), (('CHEM', '18A'), 208), (('ECON', '10A'), 207), (('MATH', '15A'), 204), (('ANTH', '1A'), 201)]


### 5 i)

### Tianjun Cai

Question: Which classes in the COSI department have fewer than 10 students enrolled?

In [13]:
cs_less_than_10 = set()
for course in courses:
    if course['subject'] == 'COSI' and course['enrolled'] < 10:
        cs_less_than_10.add(course['code'])
print(cs_less_than_10)

{('COSI', '200A'), ('COSI', '241A'), ('COSI', '210A'), ('COSI', '300A'), ('COSI', '138A'), ('COSI', '200B'), ('COSI', '393G'), ('COSI', '98A'), ('COSI', '295A'), ('COSI', '299A'), ('COSI', '98B'), ('COSI', '93A'), ('COSI', '10A'), ('COSI', '97A'), ('COSI', '300B'), ('COSI', '45A'), ('COSI', '400D'), ('COSI', '293G'), ('COSI', '293B'), ('COSI', '99D'), ('COSI', '119A')}


### Emma Xu
Question: How many independent study classes are there in each subject? Print the top 10 tuples.

In [27]:
UniqueSubjects = {course['subject'] for course in courses} # a set of unique subjects
SIn_list = []
for subject in UniqueSubjects:
    independentStudy = [course for course in courses if (course['subject'] == subject and course['independent_study'])]
    SIn_tuple = (subject, len(independentStudy)) 
    SIn_list.append(SIn_tuple)

# print(SIn_list) 
SIn_list.sort(key = lambda tup : tup[1], reverse = True) 
print (SIn_list[0:10])

[('BIOL', 473), ('HIST', 456), ('NEUR', 377), ('PSYC', 357), ('BCHM', 285), ('BCBP', 256), ('PHYS', 231), ('COSI', 226), ('ENG', 200), ('WGS', 198)]


### Siyu Yang
Question: How many people in total have been waitlisted for ECON department courses (excluding independent study courses)?

I'm asking this question because I'm also an Econ major, and there are very few ECON electives, each with a small capacity. As a result, course selection and enrollment are painful, and I'm curious how many people like me are always on the waitlist.

In [15]:
x = 0
for c in courses:
    if(c['subject']=='ECON' and c['independent_study']==False):
        x+=c[ 'waiting']

print(x) 


102


### Yi-Zhe Hong
Question: What is the average enrollment limit of COSI classes?

In [16]:
x=0
y=0
for c in courses:
    if c['subject'] == "COSI" and c['limit'] is not None:
        x+=1
        y+=c['limit']
print(int(y/x))

68
