Course Catalog Parser

Uses Beautiful Soup to scrape the courses from Bard College course catalogs into JSON files.

At the moment, the parser program, app.py, takes around 30 seconds to parse a semester's worth of courses. The layout of Bard's course catalog is very inconsistent across semesters. The parser therefore has to distinguish catalogs that separate dates and times within tables from those that do not, and catalogs that use the two-character distributions naming system (which evidently started in Fall 2016) from those that use the four-character system.
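As a rough illustration of the two-character versus four-character distinction, a code's length is enough to tell the two naming systems apart. This is a minimal sketch, not the logic in app.py: the function names `distribution_system` and `page_systems` are hypothetical, and the codes `"XY"` and `"WXYZ"` are placeholders rather than real Bard distribution codes.

```python
def distribution_system(code):
    """Classify a distribution code by its length.

    Assumption: two characters means the new system, four the old one.
    Anything else is reported as "unknown" rather than guessed at.
    """
    code = code.strip()
    if len(code) == 2:
        return "new"
    if len(code) == 4:
        return "old"
    return "unknown"


def page_systems(codes):
    """Return the set of naming systems found among a page's codes."""
    return {distribution_system(code) for code in codes} - {"unknown"}


# Placeholder codes, not real Bard distributions.
print(distribution_system("XY"))       # new
print(distribution_system("WXYZ"))     # old
print(page_systems(["XY", "WXYZ"]))    # contains both "new" and "old"
```

A page whose set contains both `"new"` and `"old"` mixes the two systems and would need special handling.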

This project was partially inspired by @sabo's Bard People's Insurrectionary Course Catalog.

P.S. Sorry for my ridiculous list comprehensions.

TODO

There is currently an issue with inconsistent table layouts within individual course list pages.
- For example, see the mathematics department course list for Spring 2018.
- Because the parser cannot automatically distinguish course list pages that feature both the new and old distributions naming systems from those that do not, pages such as this one cannot be parsed.
- To fix this, it should be possible to programmatically count the number of columns in each table for each course in each course list page, thus telling the parser which courses use both or neither of the distributions naming systems.
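
The column-counting idea above could be sketched roughly as follows. This is an untested-against-Bard sketch under stated assumptions: it uses the standard library's `html.parser` instead of Beautiful Soup so that it runs without extra dependencies, the names `TableColumnCounter` and `count_columns` are hypothetical, and the sample HTML (course names, times, and the placeholder codes `XY` and `WXYZ`) is made up rather than taken from a real catalog page.

```python
from html.parser import HTMLParser


class TableColumnCounter(HTMLParser):
    """Record the widest row (in cells) of every top-level <table>."""

    def __init__(self):
        super().__init__()
        self.column_counts = []  # one entry per table, in document order
        self._depth = 0          # current <table> nesting depth
        self._row_cells = 0      # cells seen so far in the current row
        self._widest = 0         # widest row seen in the current table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self._widest = 0
        elif tag == "tr" and self._depth == 1:
            self._row_cells = 0
        elif tag in ("td", "th") and self._depth == 1:
            # Honour colspan so a merged cell counts as several columns.
            span = dict(attrs).get("colspan") or "1"
            self._row_cells += int(span) if span.isdigit() else 1
            self._widest = max(self._widest, self._row_cells)

    def handle_endtag(self, tag):
        if tag == "table" and self._depth > 0:
            if self._depth == 1:
                self.column_counts.append(self._widest)
            self._depth -= 1


def count_columns(html):
    """Return the column count of each top-level table in the HTML."""
    parser = TableColumnCounter()
    parser.feed(html)
    parser.close()
    return parser.column_counts


# Made-up sample: one table with a single distribution column, one with two.
sample = """
<table><tr><td>MATH 101</td><td>Mon 10:00</td><td>XY</td></tr></table>
<table><tr><td>MATH 102</td><td>Tue 10:00</td><td>XY</td><td>WXYZ</td></tr></table>
"""
print(count_columns(sample))  # [3, 4]
```

Comparing each table's count against the expected width for a single naming system would then flag the courses that carry an extra distributions column. Which widths correspond to which layout is something that would have to be checked against the actual catalog pages.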

Related Repositories:

This was intended as a Bard College Senior Project, but I decided to make this instead.

Repository files:
- README.md
- app.py
- bard_courses.json
- urls_new_distributions.txt
- urls_new_distributions.txt_output.json
- urls_old_distributions.txt
- urls_old_distributions.txt_output.json
- urls_spring_2016.txt
- urls_spring_2016.txt_output.json

segalgouldn/course-catalog-parser
