# MIT EECS Portal Scraper Test

This notebook tests the `eecs_portal` module that scrapes the MIT EECS "who is teaching what" page.

## Test 1: Fetch Current Semester

First, call `get_who_is_teaching_what()` with no semester argument and check which semester it detects:

In [1]:
from eecs import get_who_is_teaching_what

# Fetch current semester (without specifying)
df_current, semester_current = get_who_is_teaching_what()

print(f"Detected Semester: {semester_current}")
print(f"Number of courses: {len(df_current)}")
print(f"\nDataFrame columns: {df_current.columns.tolist()}")
print("\nFirst 10 courses:")
df_current.head(10)

Detected Semester: Fall 2025
Number of courses: 105

DataFrame columns: ['Area', 'Course', 'Title', 'Lecturers', 'TAs']

First 10 courses:


| | Area | Course | Title | Lecturers | TAs |
|---|---|---|---|---|---|
| 0 | CS | 6.1000/A/B[6.0001+2] | Introduction to Programming and Computer Science | Andrew Wang(6.1000)Ana Bell(6.100A)John V. Gut... | |
| 1 | CS | 6.1010[6.009] | Fundamentals of Programming | Max Goldman | Hope DarganRobert C. MillerBruce Tidor |
| 2 | CS | 6.1040[6.170] | Software Studio | Daniel N. JacksonMitchell Gordon | |
| 3 | CS | 6.1120[6.818] | Dynamic Computer Language Engineering | Michael J. Carbin | |
| 4 | CS-AID | 6.1200J[6.042] | Mathematics for Computer Science | Zachary R. AbelErik D. DemaineRonitt Rubinfeld | |
| 5 | CS-AID | 6.1210[6.006] | Introduction to Algorithms | Brynmor ChapmanHenry Corrigan-GibbsSrinivas De... | |
| 6 | CS-AID | 6.1220J[6.046] | Design and Analysis of Algorithms | Srinivasan RaghuramanCharles E. LeisersonVirgi... | |
| 7 | CS | 6.1810[6.039] | Operating Systems Engineering | M. Frans KaashoekRobert T. MorrisNickolai B. Z... | |
| 8 | CS | 6.1850/2[6.052] | Computer Systems & Society | Katrina L. LaCurts | |
| 9 | CS-EE | 6.1910[6.004] | Computation Structures | Silvina Hanono WachmanMengjia Yan | |


## Test 2: Fetch Specific Semester (Spring 2026)

Next, pass an explicit semester and confirm that the returned semester label matches the request:

In [2]:
SEMESTER = 'Spring 2026'
df_spring, semester_spring = get_who_is_teaching_what(SEMESTER)

print(f"Requested Semester: {SEMESTER}")
print(f"Returned Semester: {semester_spring}")
print(f"Number of courses: {len(df_spring)}")
print(f"\nDataFrame columns: {df_spring.columns.tolist()}")
print("\nFirst 10 courses:")
df_spring.head(10)

Requested Semester: Spring 2026
Returned Semester: Spring 2026
Number of courses: 98

DataFrame columns: ['Area', 'Course', 'Title', 'Lecturers', 'TAs']

First 10 courses:


| | Area | Course | Title | Lecturers | TAs |
|---|---|---|---|---|---|
| 0 | CS | 6.1000/A/B[6.0001+2] | Introduction to Programming and Computer Science | Ana BellJohn V. GuttagMina Konakovic LukovicAn... | |
| 1 | CS | 6.1010[6.009] | Fundamentals of Programming | Adam Hartz | Michael J. CarbinHope Dargan |
| 2 | CS | 6.1020[6.031] | Software Construction | Max GoldmanMitchell GordonRobert C. Miller | |
| 3 | CS | 6.1060[6.172] | Software Performance Engineering | Charles E. LeisersonSaman P. AmarasingheNir N.... | |
| 4 | CS-AID | 6.1200J[6.042] | Mathematics for Computer Science | Zachary R. AbelJonathan KelnerAnand Natarajan | |
| 5 | EE-CS | 6.120A[6.042A] | Discrete Mathematics and Proofs for Computer S... | Muriel Medard | |
| 6 | CS-AID | 6.1210[6.006] | Introduction to Algorithms | Brynmor ChapmanYael Tauman KalaiWill Leiserson | |
| 7 | CS-AID | 6.1220J[6.046] | Design and Analysis of Algorithms | Srinivasan RaghuramanKuikui LiuJulian Shun | |
| 8 | CS | 6.1400J[6.045] | Automata Comput & Complexity | Dor Minzer | |
| 9 | CS | 6.1800[6.033] | Computer System Engineering | Katrina L. LaCurtsTim Kraska | |
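
The printed checks in Tests 1 and 2 could also be written as assertions, so that a regression fails loudly instead of depending on a visual scan of the output. The following is a minimal sketch under stated assumptions: `check_fetch_result` is a hypothetical helper, not part of the `eecs` package, and the expected column list is simply copied from the output above.

```python
# Hypothetical helper for illustration; not part of the eecs package.
# Column names are copied from the DataFrame output shown above.
EXPECTED_COLUMNS = ["Area", "Course", "Title", "Lecturers", "TAs"]


def check_fetch_result(columns, n_rows, returned_semester, requested_semester=None):
    """Raise AssertionError if a fetch result looks malformed."""
    assert list(columns) == EXPECTED_COLUMNS, f"unexpected columns: {list(columns)}"
    assert n_rows > 0, "no courses were returned"
    assert returned_semester, "semester label is empty"
    if requested_semester is not None:
        # An explicit request should come back with the same label.
        assert returned_semester == requested_semester, (
            f"asked for {requested_semester!r}, got {returned_semester!r}"
        )


# Values copied from the Test 2 output above.
check_fetch_result(EXPECTED_COLUMNS, 98, "Spring 2026", requested_semester="Spring 2026")
```

In the notebook itself, the call would presumably look like `check_fetch_result(df_spring.columns, len(df_spring), semester_spring, SEMESTER)`.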


In [3]:
# Test the get_aags() function
from eecs import get_aags
aags_classes = get_aags()
print(f"Found {len(aags_classes)} AAGS classes")
print("\nFirst 10 AAGS classes:")
for i, course in enumerate(aags_classes[:10], 1):
    print(f"{i}. {course}")

Found 163 AAGS classes

First 10 AAGS classes:
1. 16.420
2. 18.435
3. 2.111
4. 6.1852
5. 6.2092
6. 6.2222
7. 6.2532
8. 6.3102
9. 6.3702
10. 6.3722


## Test 3: Course Number Parser

Next, test the course number parser, which handles the transition from the old three-digit EECS subject numbers to the new four-digit ones, including the combined `new[old]` strings used on the portal:

In [None]:
# Test the course number parser
import importlib
import eecs_course_parser
importlib.reload(eecs_course_parser)
from eecs_course_parser import parse_course_number, is_new_format, normalize_course_number

# Test cases covering various formats
test_cases = [
    # Combo format: new[old]
    "6.1220J[6.046]",
    "6.036[6.036]",  # Same old and new
    "6.1000/A/B[6.0001+2]",  # Multiple subjects with old numbers

    # Simple new format
    "6.0001",
    "6.3450",
    "6.1220J",

    # Lettered subjects (unchanged)
    "6.UAR",
    "6.UAT",
    "6.THM",

    # Multiple subjects expansion
    "6.1000/A/B",
    "6.3450/A",
    "6.2000/A/B/C",
]

print("EECS Course Number Parser - Test Results")
print("=" * 60)

for i, test_case in enumerate(test_cases, 1):
    try:
        result = parse_course_number(test_case)
        is_new = is_new_format(result[0])  # result is always a list

        print(f"{i:2d}. Input:  '{test_case}'")
        print(f"    Output: {result}")
        print(f"    New format: {is_new}")
        print()

    except Exception as e:
        print(f"{i:2d}. Input:  '{test_case}'")
        print(f"    ERROR: {e}")
        print()

# Test normalization function
print("Normalization Tests:")
print("-" * 30)
norm_tests = ["6.1220J[6.046]", "6.1000/A/B[6.0001+2]"]
for test in norm_tests:
    try:
        normalized = normalize_course_number(test)
        print(f"'{test}' → {normalized}")
    except Exception as e:
        print(f"'{test}' → ERROR: {e}")

EECS Course Number Parser - Test Results
 1. Input:  '6.1220J[6.046]'
    Output: ['6.1220J']
    New format: True

 2. Input:  '6.036[6.036]'
    Output: ['6.036']
    New format: False

 3. Input:  '6.1000/A/B[6.0001+2]'
    Output: ['6.1000', '6.100A', '6.100B']
    New format: True

 4. Input:  '6.0001'
    Output: ['6.0001']
    New format: True

 5. Input:  '6.3450'
    Output: ['6.3450']
    New format: True

 6. Input:  '6.1220J'
    Output: ['6.1220J']
    New format: True

 7. Input:  '6.UAR'
    Output: ['6.UAR']
    New format: True

 8. Input:  '6.UAT'
    Output: ['6.UAT']
    New format: True

 9. Input:  '6.THM'
    Output: ['6.THM']
    New format: True

10. Input:  '6.1000/A/B'
    Output: ['6.1000', '6.100A', '6.100B']
    New format: True

11. Input:  '6.3450/A'
    Output: ['6.3450', '6.345A']
    New format: True

12. Input:  '6.2000/A/B/C'
    Output: ['6.2000', '6.200A', '6.200B', '6.200C']
    New format: True

Normalization Tests:
-----------------------------
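
The parsing behaviour exercised in Test 3 can be approximated in a few lines. This is a sketch only: `sketch_parse_course_number` and `sketch_is_new_format` are hypothetical names, written to reproduce the twelve parser results listed above, and the real `eecs_course_parser` may be implemented quite differently. It assumes that each `/X` variant replaces the last character of the base number, and that the old format is exactly three digits after the dot with an optional `J`.

```python
import re

# Hypothetical re-implementation for illustration; not the eecs_course_parser source.
_OLD_NUMBER_SUFFIX = re.compile(r"\[.*?\]")  # the "[old number]" annotation


def sketch_parse_course_number(raw: str) -> list[str]:
    """Expand a portal course string into individual subject numbers."""
    # Drop the "[old]" part: "6.1220J[6.046]" becomes "6.1220J".
    new_part = _OLD_NUMBER_SUFFIX.sub("", raw).strip()
    base, *variants = new_part.split("/")
    # Assumption: each variant replaces the last character of the base,
    # so "6.1000/A/B" expands to 6.1000, 6.100A, 6.100B.
    return [base] + [base[:-1] + variant for variant in variants]


def sketch_is_new_format(number: str) -> bool:
    """Assumption: 'N.DDD' (three digits, optional J) is old; anything else is new."""
    return re.fullmatch(r"\d+\.\d{3}J?", number) is None


for raw in ["6.1220J[6.046]", "6.036[6.036]", "6.1000/A/B[6.0001+2]", "6.UAR"]:
    parsed = sketch_parse_course_number(raw)
    print(raw, parsed, sketch_is_new_format(parsed[0]))
```

One known limitation of this sketch: a base number that ends in `J` and also carries `/X` variants would be expanded incorrectly, since the `J` would be the character replaced. None of the test cases above has that shape.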