Scraping political speeches

This is a repository of scripts and data required to replicate various datasets of political speeches, including:

  • Hansard in full: Parliamentary speeches matched to individual MPs, with party and biographical information, using the TheyWorkForYou database.
  • Hansard maiden speeches: maiden speeches given by new British MPs from 1945 onwards, using debate transcripts scraped from Hansard - includes speech content and date.
  • Conference speeches: leaders' speeches at party conferences, available at BritishPoliticalSpeech.org, scraped into a dataframe and cleaned - includes speech content, politicians' names, party, year, location, commentary on the speech, and tags.
  • Manifesto forewords: uses the Manifesto Project archive to extract forewords to British election manifestos - written by party leaders from Labour, Conservatives, Lib Dems, SNP, and UKIP/Brexit Party - from 1983 to present.
  • Local election leaflets: based on the Election Leaflets archive, using optical character recognition to convert images of local election leaflets to plain text, then annotating with scraped data about constituencies, parties, and candidates. [Under development]
  • Party press releases: press releases scraped from party websites for Labour and the Conservatives. [Under development]
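
As a rough illustration of the matching step described for the Hansard dataset, the sketch below attaches party and name information to speeches by a shared person identifier. It is not taken from the repository's scripts: the `Speech` class, the `match_speeches` function, the field names, and the sample records are all hypothetical, and the real TheyWorkForYou data may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class Speech:
    # Hypothetical fields; the real data may use different names.
    person_id: str
    date: str
    text: str


def match_speeches(speeches, members):
    """Attach name and party to each speech via person_id.

    `members` maps person_id -> dict with "name" and "party".
    Speeches with no matching member keep None for those fields.
    """
    rows = []
    for speech in speeches:
        member = members.get(speech.person_id, {})
        rows.append({
            "person_id": speech.person_id,
            "date": speech.date,
            "text": speech.text,
            "name": member.get("name"),
            "party": member.get("party"),
        })
    return rows


# Made-up records, for illustration only.
members = {"p1": {"name": "Example Member", "party": "Example Party"}}
speeches = [
    Speech("p1", "1945-08-16", "Mr. Speaker, ..."),
    Speech("p9", "1950-03-07", "I rise to ..."),
]
rows = match_speeches(speeches, members)
```

Keeping unmatched speeches with empty metadata, rather than dropping them, makes gaps in the member lookup visible in the resulting table.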