This project scrapes all the relevant information for the courses offered for Master's in CS at NUS and dumps it into a PDF.
There were number of web pages that needed to be traversed for going through all course information and stuff. I wanted all this info at once place so that I don't have to navigate between multiple chrome tabs and more imortantly I can share the info with my friends for discussion.
- Requests desired page -
https://mysoc.nus.edu.sg/~postgd/LecturePreview.html
that contain master list of courses that'll be offered this sem. - Parse html obtained from above and get intermediate IVLE links for each course -
Eg. -https://ivle.nus.edu.sg/lms/list_course.aspx?code=CS4211&acadyear=2017/2018
- Store all these intermediate links in a file -
ivle_links.txt
- Read
ivle_links.txt
and extract final url for each course that contains all the information. Eg. -https://ivle.nus.edu.sg/Module/Student/default.aspx?CourseID=BE4CE92A-0E2C-4160-91BB-F26912239937&ClickFrom=StuViewBtn
- Store all course final links to a new file -
final_links.txt
- Read all final links from
final_links.txt
- Extract the most sane parent div tag and all its relevant content
- Dump div tags of all courses to a master html file
Finally, convert master html file to pdf via pandoc -
$ pandoc NUS-SOC-Courses.html -o NUS-SOC-Courses.pdf
- Prettify pdf using pandoc -
- Use a latex template as used in geeks-pdf
- Incorporate TOC
- Start each course from new page
- Find a way to merge course information and respective slides - Should be doable..!!
- On the IVLE portal, information does not exist for some of the modules,check with seniors.
- If information for some course does not exist, fetch it from course catalogue? - Would be tough anyway.
- Code Improvements