Ofsted Report Scraper
Download and inspect Ofsted reports for keywords. This code will:
- Download a list of schools
- Download a list of reports associated with those schools
- Download a subset of those reports
- Convert .pdf reports to .txt
- Parse .txt for keywords using regular expressions
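The final step in the list above, searching converted text for keywords with regular expressions, might look roughly like the sketch below. The keyword list and the count_keywords helper are hypothetical names used for illustration only; they are not taken from this repository.

```ruby
# Hypothetical keyword list; the real keywords are whatever you configure.
KEYWORDS = ["safeguarding", "science", "outstanding"].freeze

# One case-insensitive pattern matching any keyword as a whole word.
# Regexp.escape guards against keywords containing regex metacharacters.
KEYWORD_PATTERN = /\b(?:#{KEYWORDS.map { |k| Regexp.escape(k) }.join("|")})\b/i

# Count how often each keyword appears in a report's text.
# Returns a hash of lowercased keyword => number of occurrences.
def count_keywords(text)
  counts = Hash.new(0)
  text.scan(KEYWORD_PATTERN) { |match| counts[match.downcase] += 1 }
  counts
end

sample = "Safeguarding is effective. Science teaching is outstanding; " \
         "safeguarding records are thorough."
count_keywords(sample).each { |keyword, n| puts "#{keyword}: #{n}" }
```

Matching on word boundaries avoids counting a keyword that only appears inside a longer word, at the cost of missing plurals unless they are listed separately.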
git clone https://github.com/jdkram/ofsted-report-scraper
gem install bundler
- Edit task.rb: specify school types, report types etc.
- Run with ruby task.rb (or caffeinate ruby task.rb to keep the machine awake for long downloads).
Please note that the scraping steps (including scrape_school_pages) don't currently handle being interrupted well, as they don't record their progress.
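To illustrate what recording progress could mean here, the following is a minimal sketch of a resumable loop. It is not how the project works today: the progress file name and the load_completed and each_pending helpers are all hypothetical.

```ruby
require "set"

# Hypothetical progress file; not part of the original project.
PROGRESS_FILE = "completed_ids.txt"

# Load the IDs that a previous run already finished (empty set on first run).
def load_completed(path = PROGRESS_FILE)
  return Set.new unless File.exist?(path)
  Set.new(File.readlines(path, chomp: true))
end

# Yield each ID that has not been processed yet, and append it to the
# progress file only after the block returns without raising.
def each_pending(ids, path = PROGRESS_FILE)
  done = load_completed(path)
  File.open(path, "a") do |log|
    ids.each do |id|
      next if done.include?(id.to_s)
      yield id
      log.puts(id)
      log.flush # write immediately so an interruption loses at most one item
    end
  end
end
```

Calling each_pending(ids) { |id| ... } again after an interruption skips every ID already written to the file and carries on with the rest.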
The scraper calls sleep rand(0.1..0.6) (a random time between 0.1 and 0.6 seconds) between requests to ease the request rate on the Ofsted site.
download_report_pdfs sleeps for a slightly longer 1-2 seconds, for no particular reason other than that this step tends to involve a large number of consecutive requests.
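The delays described above can be sketched as a small helper. The polite_pause name, the two constants, and the placeholder page list are assumptions for illustration; in particular, the exact range used for PDF downloads is assumed here to be 1.0..2.0.

```ruby
# Sleep for a random duration (in seconds) drawn from the given range,
# and return the duration so callers can log it.
def polite_pause(range)
  delay = rand(range)
  sleep(delay)
  delay
end

PAGE_DELAY = 0.1..0.6 # between school page requests
PDF_DELAY  = 1.0..2.0 # between PDF downloads (assumed range)

# Hypothetical placeholders standing in for real page URLs.
%w[page-1 page-2].each do |page|
  # ... fetch the page here ...
  waited = polite_pause(PAGE_DELAY)
  puts format("fetched %s, waited %.2fs", page, waited)
end
```

Randomising the delay, rather than sleeping for a fixed interval, spreads requests out less predictably while keeping the average rate low.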