harveydevereux/TimesRichList2020

Times Rich List 2020 Python Scraper

This repo provides Python-Selenium code to scrape the data from the Times website.

The scraper is fairly hard-coded to the current page, so the scraped data is included in the repo.

Setup

The code requires a working Python installation with Selenium installed (plus the driver for your browser).

The code worked with the webpage as of 30 May 2020, using Python 3.6.9, Selenium (Python) 3.141.0, and numpy 1.18.1 (needed only for its ceil function).

Running/Options

Run with

python ScrapeData.py

or

./run.sh

Both support the options --csv [string] and --headless. The first takes the name of the CSV file to save the data to; the second launches Selenium without opening a browser window (otherwise you'll watch the scraper in action).
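For readers curious how two options like these are typically wired up, here is a minimal sketch using argparse. It is not the code in ScrapeData.py: the function names build_parser and make_driver, the default file name, and the choice of Firefox are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the two documented options (a sketch, not the repo's code)."""
    parser = argparse.ArgumentParser(description="Scrape the Sunday Times Rich List")
    # --csv takes the output file name; the default here is a hypothetical choice.
    parser.add_argument("--csv", type=str, default="rich_list.csv",
                        help="name of the CSV file to save the data as")
    # --headless is a plain flag: present means True, absent means False.
    parser.add_argument("--headless", action="store_true",
                        help="run the browser without opening a window")
    return parser


def make_driver(headless):
    """Create a Firefox driver, optionally headless (assumes Firefox and geckodriver)."""
    # Imported lazily so the argument parsing above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    if headless:
        # Firefox's own command-line switch for running without a window.
        options.add_argument("-headless")
    return webdriver.Firefox(options=options)
```

A script built this way would call build_parser().parse_args() once at start-up, then pass args.headless to make_driver and args.csv to whatever writes the output.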

Caveats

  • Sometimes the webpage glitches or takes too long to load, so Selenium cannot find the "I Agree [to cookies]" button. This shows up as selenium.common.exceptions.ElementNotInteractableException: Message: Element <button class="message-component message-button no-children"> could not be scrolled into view. Re-running usually works.
  • If the text on the webpage changes, the scraper will likely break.
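Since re-running usually clears the cookie-button failure, the same idea can be applied inside a script by retrying the failing step a few times. The helper below is a generic sketch, not something the repo provides; the name retry and its parameters are hypothetical.

```python
import time


def retry(action, exceptions=(Exception,), attempts=3, delay=2.0):
    """Call action() until it succeeds or attempts run out, then re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                # Out of attempts: let the caller see the original exception.
                raise
            # Give the page some time to finish loading before trying again.
            time.sleep(delay)
```

With Selenium, one would pass exceptions=(ElementNotInteractableException,), imported from selenium.common.exceptions, together with a function that locates and clicks the "I Agree" button.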

Data Analysis

Example Notebook to get started

The Wealth Distribution

[Figure: the wealth distribution]

Top 10 Sectors by Median Wealth

[Figure: top 10 sectors by median wealth]

Top 10 Sectors By Total Wealth

[Figure: top 10 sectors by total wealth]
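The two sector rankings above can be reproduced from a CSV with only the standard library. This is a rough sketch rather than the notebook's code: the column names sector and wealth are assumptions about the saved CSV, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def top_sectors(rows, n=10):
    """Return (top n by median wealth, top n by total wealth) as lists of (sector, value)."""
    by_sector = defaultdict(list)
    for row in rows:
        # "sector" and "wealth" are hypothetical column names.
        by_sector[row["sector"]].append(float(row["wealth"]))

    medians = {sector: median(values) for sector, values in by_sector.items()}
    totals = {sector: sum(values) for sector, values in by_sector.items()}

    def rank(scores):
        # Largest value first, keeping only the top n sectors.
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

    return rank(medians), rank(totals)


# Made-up rows for illustration only; not taken from the Rich List.
sample = io.StringIO(
    "name,sector,wealth\n"
    "A,Property,100\n"
    "B,Property,300\n"
    "C,Retail,250\n"
)
by_median, by_total = top_sectors(csv.DictReader(sample))
```

To run it on real data, replace the sample with an open file handle for the CSV written by the scraper and adjust the column names to match.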

About

Python-Selenium script to extract the data from the Sunday Times Website [https://www.thetimes.co.uk/sunday-times-rich-list]
