This repo provides Python/Selenium code to scrape data from the Times website.
The scraper is fairly hard-coded, so the scraped data is also included in the repo.
The code requires a working Python installation with Selenium installed, plus the driver for your browser (e.g. geckodriver for Firefox or chromedriver for Chrome).
The code was tested against the webpage as it was on 30 May 2020, with Python 3.6.9, Selenium (Python bindings) 3.141.0, and numpy 1.18.1 (used only for its ceil function).
Run with
python ScrapeData.py
or
./run.sh
Both support the options --csv [string] and --headless. The first takes the name of the CSV file you want to save the data to; the second launches Selenium without opening a browser window (otherwise you'll watch the scraper in action).
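For readers adapting the script, the two flags can be sketched with the standard library's argparse module. This is a hypothetical illustration, not the code in ScrapeData.py: the function name parse_args and the default CSV name data.csv are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser for the --csv and --headless options.

    The real ScrapeData.py may define these differently; the default
    CSV name below is an assumption for illustration only.
    """
    parser = argparse.ArgumentParser(description="Scrape data from the Times website.")
    parser.add_argument(
        "--csv",
        type=str,
        default="data.csv",  # assumed default, not taken from the repo
        help="name of the CSV file to save the scraped data to",
    )
    parser.add_argument(
        "--headless",
        action="store_true",  # flag present -> True, absent -> False
        help="run the browser without opening a visible window",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--csv", "times.csv", "--headless"])
    print(args.csv, args.headless)  # times.csv True
```

In Selenium 3.x, a flag like --headless is typically honoured by adding the browser's headless argument to its options object (for example webdriver.ChromeOptions().add_argument("--headless")) before creating the driver; how this repo's script does it is not shown here.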
- Sometimes the webpage glitches or takes too long to load, so Selenium fails to click the "I Agree [to cookies]" button. This shows up as:
selenium.common.exceptions.ElementNotInteractableException: Message: Element <button class="message-component message-button no-children"> could not be scrolled into view
Re-running the script usually works.
- If the text of the webpage changes, the scraper will likely break.
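Since re-running usually fixes the cookie-button error, one option is to retry the failing step automatically. Below is a minimal, hypothetical helper (retry, attempts and delay are illustrative names, not part of the repo); in the scraper you would pass selenium.common.exceptions.ElementNotInteractableException as the exception type.

```python
import time


def retry(func, exceptions, attempts=3, delay=2.0):
    """Call func() until it succeeds, retrying on the given exception type(s).

    Hypothetical helper: `exceptions` can be one exception class or a tuple,
    e.g. selenium.common.exceptions.ElementNotInteractableException.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)  # give the page a little longer to load
```

An alternative worth trying is to wait explicitly for the button with Selenium's WebDriverWait and expected_conditions.element_to_be_clickable before clicking it, rather than retrying the whole run.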
Example Notebook to get started