
Harvard Gems

Screenshot of the Harvard Gems website

Web scraper, data analysis and website for the best classes at Harvard.

If you found it useful, you can

"Buy Me A Coffee"

Further analytics

Course ratings correlate well with recommendation scores.

Course score vs recommendation score graph

Course ratings also correlate well with lecturer scores, but with more scatter.

Course score vs lecturer score graph

Sentiment analysis of the course comments also agrees well with the average course rating.

Course score vs sentiment score graph

Most high-scoring courses have low workload.

Course score vs workload score graph

Harvard classes tend to have high ratings. It is rare for a class to get a low score.

Histogram of the courses by rating

Most Harvard classes demand around 5 hours of work per week outside of class, though the distribution is skewed, so some classes have a much higher workload.

Histogram of the courses by workload hours

There is little correlation between the number of students in a class and its score.

Course score vs number of students graph

More analysis and the code for the graphs can be found in this Colab notebook. A copy of the notebook is also available in this repo as course_ratings_analysis.ipynb. Remember to upload verbose_course_ratings.csv if you want to tinker with it.
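
If you only want a quick look at the numbers without opening the notebook, the correlations above can be reproduced in a few lines of pandas. This is a minimal sketch, not the notebook's code: the column names below are assumptions, so check the header of verbose_course_ratings.csv for the names actually used.

```python
# Minimal sketch: reproduce the correlation checks with pandas.
# The column names are assumptions; inspect the CSV header (df.columns)
# for the names actually used in verbose_course_ratings.csv.
import pandas as pd

df = pd.read_csv("verbose_course_ratings.csv")

# Pearson correlation between course rating and recommendation score
print(df["course_score"].corr(df["recommendation_score"]))

# Correlation matrix across the score columns of interest
cols = ["course_score", "recommendation_score", "lecturer_score", "workload_score"]
print(df[cols].corr())
```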

Website

The code for the website can be found at this repo. The current repo is for the scraping and analytics.

Installation

pip install -r requirements.txt

Usage

You probably don't need to follow the steps below, since the results are already available in verbose_course_ratings.csv, but here is a step-by-step guide on how to create that CSV from scratch.

  1. Download the webpage from the link in scrapper.py as an HTML file named QReports.html. Run scrapper.py to scrape the links to the QGuides for each course. The links will be stored in courses.csv.
  2. Visit any QGuide link scraped into courses.csv to get the cookie (see the code of downloader.py and search for secret_cookie) and paste it into a new file named secret_cookie.txt. Run downloader.py to download all the QGuides using the links scraped in the previous step. The QGuides will be stored in the QGuides folder (see the download sketch after this list for the general idea).
  3. Run analyzer.py to generate course_ratings.csv.
  4. Now we need to add details like the divisional requirement and whether a class fulfils Quantitative Reasoning with Data (QRD), but most importantly whether the class is offered in Fall 2024 (the QGuides are for Fall 2023). Run myharvarddriver.py, which uses Selenium to get these details from my.harvard.edu. Depending on your machine, you might need more setup to use Selenium, so check out the official guide. The webpage for each class will be stored as an HTML file in the myharvard folder. This is the longest step (around 1.5 hours); I usually leave it running overnight. The Selenium sketch after this list shows the general shape of the loop.
  5. New in Fall 2024, some classes have sections to be chosen during registration, like CHNSE 130 and EXPOS 40. Run rescrape.py to handle these cases, which require an additional click.
  6. Process these webpages to get the data by running append_details.py. This will generate verbose_course_ratings.csv as required.
  7. Start a Jupyter notebook session (jupyter notebook) and open course_ratings_analysis.ipynb. Running it will generate the graphs above and the data in output_data. Follow along in the notebook and play around!
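
For readers curious what step 2 amounts to, here is a hedged sketch of downloading the QGuides with the saved cookie. It is not the repo's downloader.py: the raw Cookie header and the courses.csv column name are assumptions, so check the real script for the details.

```python
# Hedged sketch of the QGuide download step (not the actual downloader.py).
# Assumptions: the cookie is sent as a raw "Cookie" header, and courses.csv
# has a column named "qguide_url".
import csv
import pathlib
import requests

cookie = pathlib.Path("secret_cookie.txt").read_text().strip()
out_dir = pathlib.Path("QGuides")
out_dir.mkdir(exist_ok=True)

with open("courses.csv", newline="") as f:
    for i, row in enumerate(csv.DictReader(f)):
        url = row["qguide_url"]  # assumed column name
        resp = requests.get(url, headers={"Cookie": cookie}, timeout=30)
        resp.raise_for_status()
        (out_dir / f"qguide_{i}.html").write_text(resp.text, encoding="utf-8")
```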
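
And here is the general shape of the Selenium loop in step 4, again only a sketch: the my.harvard search URL and the courses.csv column name are placeholders, and the real myharvarddriver.py uses its own selectors and waits.

```python
# Hedged sketch of the my.harvard scraping loop (not the actual myharvarddriver.py).
# The search URL pattern and the "course_code" column are hypothetical placeholders.
import csv
import pathlib
import time
from selenium import webdriver

driver = webdriver.Chrome()  # may need extra driver setup; see the official Selenium guide
out_dir = pathlib.Path("myharvard")
out_dir.mkdir(exist_ok=True)

with open("courses.csv", newline="") as f:
    for row in csv.DictReader(f):
        code = row["course_code"]  # assumed column name
        driver.get(f"https://my.harvard.edu/search?q={code}")  # placeholder URL pattern
        time.sleep(5)  # crude wait for the page to render; explicit waits are more robust
        safe_name = code.replace(" ", "_").replace("/", "-")
        (out_dir / f"{safe_name}.html").write_text(driver.page_source, encoding="utf-8")

driver.quit()
```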

About

Web scraper, data analytics and sentiment analysis of Harvard FAS classes from my.harvard and the Q Guide, for the Harvard Gems website.
