## Monash Handbook and Prerequisites Documentation Project




### Determining prerequisites

I've written a script that scrapes prerequisites from `https://mscv.apps.monash.edu`, the service that validates a course map. Because it reports the prerequisites we're missing, we can simply submit up to 125 of those requests at once. That said, we still need a way to determine corequisites and prohibitions:
- For prohibitions, we can scrape the handbook and run a simple `[A-Z]{3}[0-9]{4}` regex check over the prohibitions section. The API will return all the prohibited units.
- Corequisites are trickier, but we should be able to simplify the process. For instance, we can regex check the coreq section and then leave those units out on the second pass for units that have a corequisite.
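
As a minimal sketch of the regex check described above, the snippet below pulls unit codes out of a block of text. The sample string and the `extract_unit_codes` helper are illustrative assumptions, not taken from the handbook or from the scraper itself.

```python
import re

# Three uppercase letters followed by four digits, e.g. MTH1030.
UNIT_CODE = re.compile(r"\b[A-Z]{3}[0-9]{4}\b")

def extract_unit_codes(section_text: str) -> list[str]:
    """Return unit codes in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for code in UNIT_CODE.findall(section_text):
        seen.setdefault(code, None)
    return list(seen)

# Hypothetical prohibitions text; the real handbook wording may differ.
sample = "Prohibitions: ENG1005, MTH1035 or MTH1030. See also ENG1005."
print(extract_unit_codes(sample))
```

The same helper could be pointed at the coreq section, since both sections list units by code.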

We now explore the data source. On my PC the script below takes roughly 12 to 13 seconds to retrieve the dependencies for 5200 units.

In [2]:
!python src/retrieve_requisites.py

12.097641468048096
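
The batching mentioned earlier (up to 125 units per request) might look roughly like the sketch below. The `chunked` helper and the placeholder unit codes are assumptions for illustration; the actual logic in `src/retrieve_requisites.py` is not shown here and may differ.

```python
from typing import Iterator

BATCH_SIZE = 125  # upper limit per request, as stated earlier

def chunked(items: list[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Placeholder codes standing in for the real list of 5200 units.
unit_codes = [f"UNIT{i:04d}" for i in range(5200)]
batches = list(chunked(unit_codes))
print(len(batches), len(batches[-1]))
```

With 5200 units this gives 42 batches, the last one holding the remaining 75 units.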


This yields a file named `unit_reqs_clean.json`. I've preprocessed the data and removed a fair bit of what the server sends because it's meaningless; however, we still need to inspect what remains.

In [1]:
import json

In [2]:
with open("unit_reqs_clean.json", "r") as file:
    unit_reqs = json.load(file)

Any given unit can have a list of prerequisites, corequisites, prohibitions and other requirements. There are 8 different messages that you may get for unit enrollment:

In [3]:
unique_messages = {item['title'] for sublist in unit_reqs.values() for item in sublist}
for msg in unique_messages:
    print(msg)

Missing corequisites
Have not enrolled in a unit
Have not passed enough units
Not enough passed credit points
Not enough enrolled credit points
Permission is required for this unit
Have not completed enough units
Prohibited unit


So let's go through them:

- Prohibited unit: You've enrolled in or completed a unit that prevents you from taking the current unit. For example, MTH1030 and ENG1005 prohibit each other. Note that the message lists every prohibited unit, e.g. enrolling in both ENG1005 and MTH1030 will also tell you that MTH1035 is prohibited.

- Have not enrolled in a unit: This one is unusual, as it only appears for 12 units. It may say to enrol in a list of units, but it really means you must have completed them as prerequisites. EAE2522 is one such example. Its format differs from the formats below.

- Have not completed enough units: This only appears for 3 units, all of which have the prefix APG. It seems to just be a completion requirement.

- Have not passed enough units: This is the normal message if you lack the prerequisites for a unit. Appears in most places.

- Not enough passed credit points: Some units simply require `x` credit points before you can enrol in them. Some mandate `y` credit points from faculty `z`. This appears less often, but there are still 360 occurrences.

- Not enough enrolled credit points: This only appears once, for EDF5019, but seems to be similar to the above.

- Missing corequisites: Corequisites are a special sort of prerequisite that can be taken either before the unit or concurrently with it. For instance, ENG1014 has ENG1005 as a corequisite.

- Permission is required for this unit: You need to contact someone in order to enrol in this unit. Fairly standard.
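
The per-message counts quoted above can be checked with a short tally. This is a sketch on a tiny hand-made sample that mirrors the shape of `unit_reqs` (a unit code mapped to a list of rules with a `title` key); the sample unit codes and placeholder descriptions are made up for illustration.

```python
from collections import Counter

# Hypothetical sample with the same shape as unit_reqs.
sample_reqs = {
    "AAA1001": [{"title": "Have not passed enough units", "description": "..."}],
    "AAA1002": [
        {"title": "Prohibited unit", "description": "..."},
        {"title": "Have not passed enough units", "description": "..."},
    ],
    "AAA1003": [{"title": "Permission is required for this unit", "description": "..."}],
}

# Count how many units each message title appears for (at most once per unit).
units_per_title = Counter(
    title
    for rules in sample_reqs.values()
    for title in {rule["title"] for rule in rules}
)

for title, count in units_per_title.most_common():
    print(f"{count:>4}  {title}")
```

Swapping `sample_reqs` for the loaded `unit_reqs` would give the real counts.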

We can now inspect the unique requirement descriptions. They need to be processed so that the required number and the units are returned.

In [12]:
unique_requirements = sorted({item['description'][:20] for sublist in unit_reqs.values() for item in sublist})

for requirement in unique_requirements:
    print(requirement)

Please enrol in 1 of
Please enrol in 10 o
Please enrol in 11 o
Please enrol in 12 o
Please enrol in 2 of
Please enrol in 3 of
Please enrol in 4 of
Please enrol in 5 of
Please enrol in 6 of
Please enrol in 7 of
Please enrol in 8 of
Please enrol in 9 of
Please enrol in AMU4
Please enrol in APG5
Please enrol in CDS2
Please enrol in EAE1
Please enrol in EAE2
Please enrol in FIT4
Please enrol in OHS1
Please enrol in PSY2
Please enrol in SDN2
You have already com
You have already enr
You must enrol in 72
You must pass 12 mor
You must pass 120 mo
You must pass 144 mo
You must pass 18 mor
You must pass 2 more
You must pass 24 mor
You must pass 30 mor
You must pass 36 mor
You must pass 42 mor
You must pass 48 mor
You must pass 6 more
You must pass 60 mor
You must pass 72 mor
You must pass 84 mor
You must pass 90 mor
You must pass 96 mor
You will need permis
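
One possible way to pull the required number and the unit codes out of these descriptions is sketched below. Only the 20-character prefixes are known from the output above, so the full sample strings, the `parse_description` helper and its return shape are all assumptions.

```python
import re

UNIT_CODE = re.compile(r"\b[A-Z]{3}[0-9]{4}\b")
N_OF = re.compile(r"^Please enrol in (\d+) of")
PASS_MORE = re.compile(r"^You must pass (\d+) more")

def parse_description(description: str) -> dict:
    """Classify a description by its prefix and extract the number and unit codes."""
    units = UNIT_CODE.findall(description)
    m = N_OF.match(description)
    if m:
        return {"kind": "n_of", "number": int(m.group(1)), "units": units}
    m = PASS_MORE.match(description)
    if m:
        return {"kind": "pass_more", "number": int(m.group(1)), "units": units}
    if description.startswith("Please enrol in"):
        # Prefixes like "Please enrol in EAE2" name units directly, with no count.
        return {"kind": "enrol_in", "number": None, "units": units}
    return {"kind": "other", "number": None, "units": units}

# Hypothetical full descriptions; only their first 20 characters match real data.
print(parse_description("Please enrol in 2 of these units: MTH1020, MTH1030 or ENG1005"))
print(parse_description("You must pass 24 more credit points"))
```

Matching on the prefix first keeps the "N of" and direct "enrol in" forms apart, which matters because the text notes that the latter has a different format.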


The goal is then to extract all 3 categories and place them into a prerequisites, corequisites and prohibitions data structure. Below is a prototype for processing all 8 rules, along with a potential data type:


In [None]:
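from dataclasses import dataclass, field

@dataclass
class Requisites:
    # Each entry: {'NumReq': int, 'units': list[str]}
    prerequisites: list[dict] = field(default_factory=list)
    permissionRequired: bool = False
    prohibitions: list[str] = field(default_factory=list)  # [MTH1020, PHS1030...]
    corequisites: list[str] = field(default_factory=list)  # Same as above
    creditPoints: int = 0  # 0 by default, 24 for MTH2132 and other special units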

In [None]:
unit_requisites = {unit:[] for unit in unit_reqs}

In [80]:
for unit in unit_requisites:

    for unit_rule in unit_reqs[unit]: # Go over each rule
        match unit_rule['title']:
            case "Prohibited unit":
                pass
            case "Have not enrolled in a unit":
                pass
            case "Have not completed enough units":
                pass
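            case "Have not passed enough units": # only implement this for now
                # Split on the first colon only, so a later colon cannot break the unpacking
                _, units = unit_rule['description'].split(":", 1)
                # "A, B or C" -> "A, B, C" -> ["A", "B", "C"]
                unit_requisites[unit].extend(units.strip().replace(" or", ",").split(", "))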
            case "Not enough passed credit points":
                pass
            case "Not enough enrolled credit points":
                pass 
            case "Missing corequisites":
                pass
            case "Permission is required for this unit":
                pass
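
As a rough sketch of how some of the remaining cases might be filled in, the snippet below handles prohibitions, corequisites and the permission flag by scanning each description for unit codes. The `apply_rule` helper, the plain-dict record and the sample descriptions are hypothetical; the real descriptions may need more careful parsing.

```python
import re

UNIT_CODE = re.compile(r"\b[A-Z]{3}[0-9]{4}\b")

def new_record() -> dict:
    """Empty per-unit record mirroring some fields of the prototype data type."""
    return {"prohibitions": [], "corequisites": [], "permissionRequired": False}

def apply_rule(record: dict, rule: dict) -> None:
    """Update `record` in place from one rule; unhandled titles are ignored."""
    title = rule["title"]
    codes = UNIT_CODE.findall(rule["description"])
    if title == "Prohibited unit":
        target = record["prohibitions"]
    elif title == "Missing corequisites":
        target = record["corequisites"]
    elif title == "Permission is required for this unit":
        record["permissionRequired"] = True
        return
    else:
        return  # other titles are left for later
    for code in codes:
        if code not in target:  # avoid duplicates across repeated rules
            target.append(code)

# Hypothetical rules for one unit; real descriptions may be worded differently.
sample_rules = [
    {"title": "Prohibited unit", "description": "You have already enrolled in MTH1030, MTH1035"},
    {"title": "Missing corequisites", "description": "Please enrol in 1 of: ENG1005"},
    {"title": "Permission is required for this unit", "description": "You will need permission"},
]

record = new_record()
for rule in sample_rules:
    apply_rule(record, rule)
print(record)
```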