It can parse the XML files provided on RPI's SIS system and provides a primitive object-oriented API layer to access the course information.
Also, it provides rudimentary method to computing schedules using constraints.
pip install RPICourses
Then you can import it to your python scripts:
from rpi_courses import CourseCatalog, list_sis_files
list_sis_files accepts an optional URL argument, the default url is
assumed to be "http://sis.rpi.edu/reg/". This function expects to
read an apache-style file listing page. It scrapes all the files listed
there that end with a xml file extension and returns the full URLs to those
>>> files = list_sis_files() >>> files [u'http://sis.rpi.edu/reg/rocs/201001.xml', u'http://sis.rpi.edu/reg/rocs/201005.xml', u'http://sis.rpi.edu/reg/rocs/201009.xml', u'http://sis.rpi.edu/reg/rocs/201101.xml']
CourseCatalog is the class that parses and stores all the course info.
But you don't even need to touch that at all! Just use the static methods:
>>> c = CourseCatalog.from_url(files[-1]) # wait a bit... and then you own the data, muhahaha!
CourseCatalog.from_stream to read from an file-handle-like
CourseCatalog.from_file to read from a given file path; and
CourseCatalog.from_xml_str to read from a string.
Each one of the static methods will instantiate a BeautifulStoneSoup instance behind the curtains an.
Now there are the following properties to access after parsing:
timestampis the raw timestamp (int) when the xml file was last updated.
datetimeis the python Datetime object of the raw timestamp.
yearthe year the xml course catalog refers to.
monththe starting month of the semester the course catalog refers to.
semesterthe english representation of the semester this catalog refers to.
namethe SEMESTER and YEAR as a string.
crosslistingsa dict mapping a CRN to it's associated CRNs. Related CRNs
refer to similar classes (ie - same class in different dept).
coursesa dict mapping a string of (name, dept, code) to a Course object.
With the most important being the Course object with contains:
namethe name of the course.
deptthe department code that provides this course.
numthe course number the department provides. (e.g. dept + num = "PHYS 2100")
credthe credit range the course provides.
sectionsa collection of sections this course has.
available_sectionsa collection of sections that are still availble (has seats).
There is also a
grade_type property, which seems unused. For sections have the following
crnthe CRN this section refers to.
seats_takenthe number of seats already taken.
seats_totalthe number of seats total for this section..
numthe section number.
periodsthe time periods this section meets.
notesa tuple of notes about the section.
Finally, the periods are instanced that have the following properties:
typethe type of session.
instructorthe instructor that teaches/oversees this period
startthe starting time of the period. Times are in military integer format (e.g. - 2100)
endthe ending time of the period. Times are in miliary integer format (e.g. - 2100)
locationthe location of the period.
int_daysthe collection of days (as integers) which the period meets. 0 is Monday.
But there are also additional computed properties which aid in analysis of a period:
time_rangereturns a tuple of (start, end) properties
tbareturns True if the course's time range is not yet specified (To Be Announced)
is_lecturereturns True if the period is a lecture time slot.
is_studioreturns True if the period is a studio time slot.
is_labreturns True if the period is a lab time slot.
is_testing_periodreturns True if the period is test-taking time slot.
is_recitationreturns True if the period is a recitation time slot.
daysreturns a tuple of english-friendly names of int_days.
Each object has a
conflicts_with method to determine of another like object has a time
Besides just fetching courses, you can also generate schedules:
from rpi_schedule import compute_schedules # get selected courses # returns a collection of dictionaries whose keys=courses and values=sections compute_schedules(courses)
Alternatively, a you can restrict the time ranges available to compute by passing in an iterable of TimeRange objects as the second argument:
from rpi_schedule import compute_schedules, TimeRange # ensure the schedule has 12pm - 2pm open. compute_schedules(courses, (TimeRange(start=1200, end=1400, days=(0,1)),))
Underneath the hood,
compute_schedule is a simple wrapper to the Scheduler object:
def compute_schedules(courses, excluded_times=(), free_sections_only=True, problem=None, return_generator=False): s = Scheduler(free_sections_only, problem) s.exclude_times(*tuple(excluded_times)) return s.find_schedules(courses, return_generator)
If possible, provide some way to access the course requirements for every major. This would require a massive undertaking.
The current course solver is a naive implementation. Optimize it by providing more detailed constraints and apply better solver algorithms.
The solver library has been moved to a `separate library`_.
There are a lot of data that can be extracted from the section.notes attribute that could be placed into their own properties for easier & abstracted access.
Ideally, the notes that are properly analyzed should be removed from the notes property.
Prerequisites and Co-requisites
Automatically detect & lookup course prereqs, which are detailed in section NOTES:
<NOTE>PRE-REQ: PHYS 23330 & MATH 4600</NOTE> <NOTE>PRE-REQ: PHYS 1100 OR 1150 AND PHYS 1200 OR 1250</NOTE> <NOTE>PRE-REQ: PHYS 2110 OR PHYS 2510, AND MATH 2010 AND MATH 2400</NOTE> <NOTE>PRE-REQ: STSS 2300 OR PERMISSION OF INSTRUCTOR</NOTE> <NOTE>PRE-REQ: PHYS 2330 AND MATH 4600</NOTE> <NOTE>PRE-REQ: BIOL 4620 AND BIOL 4760 OR CHEM 4760 OR BCBP 4760</NOTE> <NOTE>PRE-REQ: CSCI 2300</NOTE> <NOTE>PRE-REQ; INTRODUCTORY COMM OR SOCIAL SCIENCE COURSE</NOTE> <NOTE>PRE-REQ: ENGR 2250 & ENGR 2530 & CIVL 2670</NOTE> <NOTE>PRE-REQ: ANY FILM COURSE OR PERMISION OF INSTRUCTOR</NOTE> <NOTE>PRE-REQ: MANE 4480</NOTE>
This text needs to be parsed, but relaxed enough to gloss over inconsistencies of the text provided. The API should provide properties in section and course objects to see course prereqs, the course object needs to filter out duplicates that the collective sections would provide.
Prereqs can span multiple lines:
<NOTE>PRE-REQ: PHYS 1100 OR 1150 AND PHYS 1200 OR 1250</NOTE> <NOTE>AND CSCI 1100</NOTE>
Co-requisites are similar to prereqs:
There are a few prereqs that is different:
<NOTE>ANY 1000/2000 LEVEL WRITING COURSE</NOTE> <NOTE>PRE-REQ: ANY WRIT COURSE</NOTE> <NOTE>PRE-REQ: ONE WRIT OR COMM COURSE</NOTE><NOTE>OR ONE COMM INTENSIVE COURSE</NOTE>
"Meets with" Courses
Automatically detect and lookup when a course/section meets with X course:
<NOTE>MEETS WITH PHIL 2961 / COGS 2960</NOTE> <NOTE>MEETS WITH ERTH 4690/ ENVE 4110</NOTE> <NOTE>MEETS WITH PSYC 4967</NOTE> <NOTE>MEETS WITH COGS 4960 & CSCI 4969</NOTE> <NOTE>MEETS WITH ARTS 4010 & ITWS 4961</NOTE> <NOTE>MEETS WITH BIOL 4710/01</NOTE> <NOTE>MEETS WITH BIOL 4770, CHEM 4770</NOTE> <NOTE>MEETS WITH BCBP 4640 & BIOL 4640/6640</NOTE> <NOTE>MEETS WITH MANE 4750/6830</NOTE> <NOTE>MEETS WITH CSCI 6960, ITWS 4962/6961, ERTH 4963/6963</NOTE> <NOTE>MEETS WITHENGR 4100, ITWS 4300/6300</NOTE>
The good part is that this format is pretty consistent, but harder to parse out the specific course(s). Since more than one course can be listed, the API should support more than one course. A section number may be provided to the other course it meets with.
Certain courses/sections are noted to fulfill a given requirement:
<NOTE>COMMUNICATION INTENSIVE</NOTE> <NOTE>FULFILLS COMM INTENSIVE REQUIREMENT</NOTE> <NOTE>COMM INTENSIVE</NOTE> <NOTE>FULFILLS EMAC THESIS</NOTE> <NOTE>FULFILLS EMAC THESIS REQUIREMENT</NOTE>
This is simple to check, as all are marked identically. There may be an edge case where some (but not all) sections contain this note -- then is a course communication intensive?
Some courses are restricted by majors or other requirements:
<NOTE>RESTRICTED TO ARCH MAJORS</NOTE> <NOTE>OPEN TO ALL MAJORS EXCEPT ARCH</NOTE> <NOTE>RESTRICTED TO EART, EMAC, GSAS MAJORS, OTHERS 12/13</NOTE> <NOTE>RESTRICTED TO EMAC, COMM, COMM-IT MAJORS</NOTE> <NOTE>RESTRICTED TO SENIOR CHME MAJORS</NOTE> <NOTE>RESTRICTED TO HCIN,TCOM, CMRT, ITWS MAJORS</NOTE> <NOTE>RESTRICTED TO IT MAJORS</NOTE> <NOTE>RESTRICTED TO COMM, ITWS & EMAC MAJORS</NOTE> <NOTE>RESTRICTED TO EMAC, EART AND ITWS MAJORS</NOTE>
These are varied and hard to analysis without matching the direct strings. Be careful that some sections make it multi-lined:
<NOTE>RESTRICTED TO EMAC, COMM, COMM-IT, EART,GSAS</NOTE> <NOTE>DSIS, IT-ARTS MAJORS</NOTE>
Certain sections are available only to particular students in a particular region:
<NOTE>INDIA STUDENTS ONLY</NOTE> <NOTE>NYC STUDENTS ONLY</NOTE> <NOTE>NEW YORK CITY STUDENTS ONLY</NOTE> <NOTE>INDIA ARCH STUDENTS ONLY</NOTE> <NOTE>INDIA PROGRAM STUDENTS ONLY</NOTE> <NOTE>RESTRICTED TO BIAM STUDENTS</NOTE>
The ones listed above are the only ones found at the time of writing (Jul 25, 2011), but auto-detecting others kinds shouldn't be too hard to do. There seems to be only one of these per section/course -- so a property that defines this only restriction will suffice.
Wait-Listed / By Permission
This course is waitlisted and requires contacting a designated person:
<NOTE>CONTACT E. LARGE (LARGEE@RPI.EDU) TO BE PUT ON WAITLIST</NOTE> <NOTE>CONTACT ELIZABETH LARGE (LARGEE@RPI.EDU) TO BE ADDED</NOTE><NOTE>TO WAIT LIST</NOTE>
This seems pretty rare -- should wait listed be implemented? Alternatively a course may be blocked until approved by some process:
<NOTE>BY AUDITION ONLY</NOTE> <NOTE>ENROLLMENT BY PERMISSION OF INSTRUCTOR</NOTE> <NOTE>PERMISSION OF INSTRUCTOR REQUIRED</NOTE> <NOTE>PERMISSION OF INSTRUCTOR</NOTE>
Remember that the permission of instructor may also be in a PRE-REQS note:
<NOTE>PRE-REQ; INTRODUCTORY COMM OR SOCIAL SCIENCE COURSE</NOTE><NOTE>OR PERMISSION OF INSTRUCTOR</NOTE>
Some courses are available on a particular semester:
<NOTE>COURSE TAUGHT 2ND HALF OF SEMESTER</NOTE> <NOTE>COURSE TAUGHT SECOND HALF OF SEMESTER</NOTE> <NOTE>COURSE TAUGHT FIRST HALF OF SEMESTER</NOTE>
This are listed under the COMMUNICATION INTERNSHIP course:
<NOTE>ORGANIZATIONAL MEETING WED 1/26</NOTE> <NOTE>ORGANIZATIONAL MEETING WED. 1/26/11</NOTE>
This is listed under the INTERFACE DESIGN course:
<NOTE>KNOWLEDGE OF AUTHORING SOFTWARE FOR</NOTE> <NOTE>MULTIMEDIA OR WEB DEVELOPMENT</NOTE>
<NOTE>KNOWLEDGE OF INTERACTICE AUTHORING SOFTWARE</NOTE>
Under HUMAN-MEDIA INTERACTION:
<NOTE>COUNTS AS ADVANCED HCI TOPICS</NOTE>
What's the point of a location for this section? This is noted in COMPUTER AIDED MACHINE II:
<NOTE>COURSE WILL MEET IN JEC 2323</NOTE>
I don't know what this means (noted under INVENTOR'S STUDIO; SOLAR DEV. & ENERGY RENEW.; and RADIOLOGICAL ENGINEERING):
<NOTE>SENIOR STANDING</NOTE> <NOTE>JUNIOR OR SENIOR STANDING</NOTE> <NOTE>THIS SECTION RESTRICTED TO SENIORS</NOTE> <NOTE>ANY 1000/2000 LEVEL WRITING COURSE</NOTE><NOTE>OR JR/SR STATUS</NOTE>
Course Continuation? Filed under SENIOR DESIGN PROJECT:
<NOTE>CONTINUATION OF MANE 4380</NOTE>
Past grade requirements (SPACECRAFT ATTITUDE DYNAMICS):
<NOTE>STUDENTS SHOULD HAVE EARNED AT LEAST A 'B+' IN</NOTE> <NOTE>MANE 4100 AND MANE 4170 TO REGISTER FOR THIS COURSE</NOTE>
Specific dates the course meets (GLOBAL BUSINESS & SOCIAL RESPO):
<NOTE>CLASS MEETS ON 12/13, 12/10, 1/3, 1/10 & 1/17</NOTE>