
A repository of scripts and data required to build datasets of British political language - including contributions to Hansard, Prime Minister's Questions, leaders' speeches to conferences, manifesto forewords, local election leaflets, and more.


nrbailey/scraping-political-speeches


Scraping political speeches

This is a repository of scripts and data required to replicate various datasets of political speeches, including:

  • Hansard in full: Parliamentary speeches matched to individual MPs with party and biographic information, using the TheyWorkForYou database.
  • Hansard PMQs: maiden speeches given by new British MPs from 1945 onwards, using debate transcripts scraped from Hansard - includes speech content and date.
  • Conference speeches: leaders' speeches at party conferences, available at BritishPoliticalSpeech.org, scraped into a dataframe and cleaned - includes speech content, politicians' names, party, year, location, commentary on the speech, and tags.
  • Manifesto forewords: uses the Manifesto Project archive to extract forewords to British election manifestos - written by party leaders from Labour, Conservatives, Lib Dems, SNP, and UKIP/Brexit Party - from 1983 to present.
  • Local election leaflets: based on the Election Leaflets archive, using optical character recognition to convert images of local election leaflets to plain text, then annotating with scraped data about constituencies, parties, and candidates. [Under development]
  • Party press releases: press releases scraped from party websites for Labour and the Conservatives. [Under development]
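
As an illustration of the "scraped into a dataframe and cleaned" step described for the conference speeches, here is a minimal sketch of turning one speech page into a structured row. It is not the code used in this repository. The HTML layout, the class names (`speaker`, `party`, `year`, `speech-content`), the sample page, and the `parse_speech` helper are all hypothetical and chosen only for the example; the real pages will be laid out differently. It uses only the Python standard library, so the rows could then be passed to whatever dataframe library is preferred.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; skipped so the open-element stack stays balanced.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class SpeechPageParser(HTMLParser):
    """Collect text found inside elements carrying one of the FIELDS class names.

    The class names are assumptions made for this sketch, not a real site's markup.
    """

    FIELDS = ("speaker", "party", "year", "speech-content")

    def __init__(self):
        super().__init__()
        self.chunks = {name: [] for name in self.FIELDS}
        self._stack = []  # one entry per open element: a field name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        field = next((c for c in classes if c in self.FIELDS), None)
        self._stack.append(field)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute the text to the innermost enclosing element that is a known field.
        field = next((f for f in reversed(self._stack) if f), None)
        if field:
            self.chunks[field].append(data)


def parse_speech(html):
    """Return one cleaned record (a dict) for a single speech page."""
    parser = SpeechPageParser()
    parser.feed(html)
    parser.close()
    # Join text chunks with spaces, then collapse runs of whitespace.
    record = {k: " ".join(" ".join(v).split()) for k, v in parser.chunks.items()}
    record["year"] = int(record["year"]) if record["year"].isdigit() else None
    return record


# Invented sample page, used only to exercise the parser.
SAMPLE_PAGE = """
<div class="speech">
  <h2 class="speaker">Jane Example</h2>
  <span class="party">Example Party</span>
  <span class="year">1997</span>
  <div class="speech-content"><p>Friends,</p><p>thank you   all.</p></div>
</div>
"""

if __name__ == "__main__":
    print(parse_speech(SAMPLE_PAGE))
```

Joining text chunks with a space keeps paragraphs from running together, at the cost of occasionally inserting a space where inline markup splits a word. A list of such records, one per page, is the kind of table that the cleaning and annotation steps above would then operate on.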
