# Courses Demo
This Jupyter notebook is for exploring the data set courses20-21.json
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21) 
which had at least 1 student enrolled.

First we need to read the json file into a list of Python dictionaries

In [1]:
import json

In [5]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values

In [5]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

there are 7813 courses in the dataset
here is the data for course 1246


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ['An', 'Huang', 'anhuang@brandeis.edu'],
 'coinstructors': [],
 'code': ['MATH', '223A'],
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

## Cleaning the data
If we want to sort courses by instructor or by code, we need to replace the lists with tuples (which are immutable lists)

In [6]:
for course in courses:
        course['instructor'] = tuple(course['instructor'])
        course['coinstructors'] = tuple([tuple(f) for f in course['coinstructors']])
        course['code']= tuple(course['code'])

In [7]:
print('notice that the instructor and code are tuples now')
courses[1246]

notice that the instructor and code are tuples now


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('An', 'Huang', 'anhuang@brandeis.edu'),
 'coinstructors': (),
 'code': ('MATH', '223A'),
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

# Exploring the data set
Now we will show how to use straight python to explore the data set and answer some interesting questions. Next week we will start learning Pandas/Numpy which are packages that make it easier to explore large dataset efficiently.

Here are some questions we can try to asnwer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.

# a) How many faculty taught COSI courses last year?

In [65]:
faculty = len({c['instructor'] for c in courses if c['subject'] == 'COSI'})
num = print(f"There were {faculty} faculties teaching COSI courses last year.")

There are 27 faculty taught COSI courses last year


# b) what is the total number of students taking COSI courses last year?

In [71]:
# {c['enrolled'] for c in courses}
enroll = sum(c['enrolled'] for c in courses if c['subject'] == 'COSI')
num = print(f"There were {enroll} students taking COSI courses last year.")

There are 2223 students taking COSI courses last year


# c) what was the median size of a COSI course last year (counting only those courses with at least 10 students)

In [97]:
# size = {c for c in courses if c['subject'] == 'COSI' and c['enrolled'] >=10}
import statistics
student_size = {c['enrolled'] for c in courses if c['subject'] == 'COSI' and c['enrolled'] >=10}
median = statistics.median(student_size)
num = print(f"The median size of a COSI course last year was {median} students.")

The median size of a COSI course last year is 45.5 students.


# d) create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10. This shows the top 10 subjects in terms of number of students taught.


In [146]:
# temp = set()
# for c in courses:
#     temp.add((c['subject'], c['enrolled']))
    
temp = []
all_course_size = {(c['subject'], int(c['enrolled'])) for c in courses}
for i in range(len(all_course_size)):
    temp.append(all_course_size.pop())
    
# print(temp)
temp.sort(key=lambda x:x[1])
print(temp[-10:])

[('COSI', 150), ('PSYC', 166), ('COSI', 166), ('PSYC', 170), ('HS', 175), ('CHEM', 180), ('BIOL', 181), ('CHEM', 186), ('BIOL', 186), ('HWL', 784)]


# e) do the same as in (d) but print the top 10 subjects in terms of number of courses offered


# f) do the same as (d) but print the top 10 subjects in terms of number of faculty teaching courses in that subject


# g) list the top 20 faculty in terms of number of students they taught


In [32]:
instructors = {}
for course in courses:
    instructor = course['instructor'][0] + ' ' + course['instructor'][1]
    if instructor not in instructors:
        instructors[instructor] = []
    else:
        instructors[instructor].append(course['enrolled'])

for instructor in instructors:
    instructors[instructor] = sum(instructors[instructor])
    
sort_instructors = sorted(instructors.items(), key=lambda x: x[1], reverse=True)
[i[0] for i in sort_instructors[:20]]

['Leah Berkenwald',
 'Kene Nathan Piasta',
 'Milos Dolnik',
 'Stephanie Murray',
 'Timothy J Hickey',
 'Rachel V.E. Woodruff',
 'Daniel Breen',
 'Bryan Ingoglia',
 'Melissa Kosinski-Collins',
 'Claudia Novack',
 'Antonella DiLillo',
 'Jon Chilingerian',
 'Ahmad Namini',
 'Brenda Anderson',
 'Maria de Boef Miara',
 'Colleen Hitchcock',
 'Scott A. Redenius',
 'Seth Fraden',
 'Iraklis Tsekourakis',
 'Teresa Vann Mitchell']

# h) list the top 20 courses in terms of number of students taking that course (where you combine different sections and semesters, i.e. just use the subject and course number)


In [44]:
course_enrollments = {}
for course in courses:
    course_name = course['subject'] + ' ' + course['coursenum'] + ' ' + course['name']
    if course_name not in course_enrollments:
        course_enrollments[course_name] = []
    else:
        course_enrollments[course_name].append(course['enrolled'])

for course in course_enrollments:
    course_enrollments[course] = sum(course_enrollments[course])
    
sort_course_enrollments = sorted(course_enrollments.items(), key=lambda x: x[1], reverse=True)
[i[0] for i in sort_course_enrollments[:20]]

['HWL 1 Navigating Health and Safety',
 'HWL 1-PRE Introduction to Navigating Health and Safety',
 'BIOL 14A Genetics and Genomics',
 'MATH 10A Techniques of Calculus (a)',
 'BIOL 18B General Biology Laboratory',
 'BIOL 18A General Biology Laboratory',
 'COSI 10A Introduction to Problem Solving in Python',
 'CHEM 29A Organic Chemistry Laboratory I',
 'COSI 12B Advanced Programming Techniques in Java',
 'BUS 6A Financial Accounting',
 'CHEM 29B Organic Chemistry Laboratory II',
 'PSYC 10A Introduction to Psychology',
 'BUS 10A Business Fundamentals',
 'MATH 15A Applied Linear Algebra',
 'EL 35A Navigating STEM',
 'ECON 80A Microeconomic Theory',
 'ECON 10A Introduction to Microeconomics',
 'CHEM 11B General Chemistry II',
 'CHEM 18B General Chemistry Laboratory II',
 'CHEM 18A General Chemistry Laboratory I']

# i) Create your own interesting question (each team member creates their own) and use Python to answer that question.

Alicia Sheng

Question: What are the top 20 courses with the highest enrollment rate (enrolled / limit)?

In [67]:
enrollment_rate = [(course['subject'] + ' ' + course['coursenum'] + ' ' + course['name'], course['enrolled'] / course['limit']) for course in courses if course['limit'] != None and course['limit'] != 0]
enrollment_rate.sort(key=lambda x: x[1], reverse=True)
[i[0] for i in enrollment_rate[:20]]

['FIN 217F Corporate Financial Modeling',
 'BUS 6A Financial Accounting',
 'NBIO 140B Principles of Neuroscience',
 'CAST 125A Confronting Gender-Based Violence',
 'AAAS/WGS 152B Beyoncé and Beyond: The Politics of Black Popular Music',
 'BIOL 159A Project Laboratory in Microbiology',
 'BIOL 101A Molecular Biotechnology',
 'HWL 42 The Art of Resilience: Strategies to Thrive in a Stressful World',
 'BUS 256A Marketing Analytics',
 'CHEM 69A Advanced Laboratory: Materials Chemistry',
 'HWL 48 Cardio Workout',
 'FIN 232A Mergers and Acquisitions Analysis',
 'BUS 276A Business Dynamics: Managing in a Complex World',
 'HISP 155A Wall Power: Muralism and Resistance in (Latin) American Art',
 'HS 251B Managerial Accounting',
 'CHEM 69A Advanced Laboratory: Materials Chemistry',
 'HISP 111B Introduction to Latin American Literature and Culture',
 'FA 197A Studies in Asian Art',
 'SOC 46B Geographies of Inequality: Exploring Power and Space in the United States',
 'CHEM 29A Organic Chemistry La

Francisco Liu

Question: 

Gordon Dou

Question: 

Michael Li

Question: 