CoNLLOutputter.java
ConLLSetup.java
README.md
codebook.csv
early-hansard-parser.py
emotion-final-q.csv
emotion-final-y.csv
emotion-main-models.R
lexicon-generator.R
lexicon-join.py
lexicon-polarity.csv
looper.c
looper.so
millbank-scraper.py
modern-hansard-parser.py
movie-classifier.py
remove-decorum-words.sh
valence-shifter.R


Measuring Emotion in Parliamentary Debates with Automated Textual Analysis

Supporting Scripts and Lexicon

This page contains the scripts, data files, and final lexicon used for a study of emotional polarity in the British House of Commons. The file lexicon-polarity.csv is an emotional polarity lexicon trained on the entire British Hansard for the period 1909-2013. It can be used as an off-the-shelf lexicon for studying sentiment in political texts, with the benefit of being adapted to the domain under study and robust to the evolution of language over the past century.
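As a rough illustration of off-the-shelf use, the sketch below loads a polarity lexicon from CSV and averages the scores of the matching words in a text. The column names (`word`, `polarity`), the sample rows, and the averaging rule are all assumptions made for this example; check the header of lexicon-polarity.csv and the study itself for the actual format and scoring procedure. The sketch targets Python 3 rather than the Python 2.7 used by the scripts in this repository.

```python
import csv
import io
import re

# Hypothetical excerpt: the real column names and scores in
# lexicon-polarity.csv may differ from these made-up rows.
SAMPLE_LEXICON_CSV = """word,polarity
excellent,0.8
welcome,0.5
failure,-0.7
crisis,-0.6
"""


def load_lexicon(handle, word_col="word", score_col="polarity"):
    """Read a word -> polarity mapping from an open CSV file handle."""
    return {row[word_col]: float(row[score_col]) for row in csv.DictReader(handle)}


def polarity_score(text, lexicon):
    """Average polarity of the lexicon words found in `text` (0.0 if none match)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


lexicon = load_lexicon(io.StringIO(SAMPLE_LEXICON_CSV))
speech = "This policy is an excellent response to the crisis."
print(round(polarity_score(speech, lexicon), 3))
```

To score real text, pass an open file instead of the in-memory sample, for instance `load_lexicon(open("lexicon-polarity.csv", newline=""))`, adjusting `word_col` and `score_col` to the actual header.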

Details regarding the methodology appear in the text of the study. The raw corpus of parliamentary debates can be accessed here: http://search.politicalmashup.nl/

If using these materials, please cite the study as follows:

Rheault, Ludovic, Kaspar Beelen, Christopher Cochrane and Graeme Hirst. 2016. "Measuring Emotion in Parliamentary Debates with Automated Textual Analysis". PLoS ONE 11(12): e0168843.

@Article{RHE16,
  Title = {{M}easuring {E}motion in {P}arliamentary {D}ebates with {A}utomated {T}extual {A}nalysis},
  Author = {Ludovic Rheault and Kaspar Beelen and Christopher Cochrane and Graeme Hirst},
  Journal = {PLoS ONE},
  Year = {2016},
  Volume = {11},
  Number = {12},
  Pages = {e0168843},
}

The following list describes the purpose of each script and data file.

Scripts

early-hansard-parser.py - Python 2.7 - A script to parse XML files of the early Hansard volumes from the UK Parliament.

millbank-scraper.py - Python 2.7 - A script to scrape the Millbank Systems website and retrieve Hansard volumes missing from the UK Parliament archives.

modern-hansard-parser.py - Python 2.7 - A script to parse XML files of the modern Hansard (post-1936), in the Political Mashup format.

CoNLLSetup.java - Java 8 - A custom class to use the Stanford CoreNLP library (requires CoNLLOutputter.java).

remove-decorum-words.sh - Bash 4.3 - A Perl-based shell script to remove expressions required by the decorum of the House (e.g. "The Right Honourable").

valence-shifter.R, looper.so - C, R 3.2 - An R wrapper to add a valence-shifting variable to the CoNLL corpus, using C for speed.

lexicon-generator.R - R 3.2 - An R script to generate domain-specific lexicons based on the word vectors obtained using the GloVe program.

lexicon-join.py - Python 2.7 - A script to perform fast SQL-type join operations on the corpus and compute polarity scores by quarter and year.

movie-classifier.py - Python 2.7 - A script to assess the accuracy of machine learning models based on the movie reviews dataset.

emotion-main-models.R - R 3.2 - An R script to compute graphs and empirical models.
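To give a sense of the join-and-aggregate step described for lexicon-join.py, the sketch below matches corpus tokens against a lexicon with a hash lookup (the equivalent of an SQL inner join) and averages the matched scores by quarter or by year. It is not the code from lexicon-join.py: the function names, the sample rows, and the use of a simple mean are assumptions made for illustration, and the sketch targets Python 3.

```python
from collections import defaultdict
from datetime import date

# Hypothetical lexicon entries and corpus rows, for illustration only.
lexicon = {"excellent": 0.8, "failure": -0.7}
corpus = [
    (date(1950, 2, 14), "excellent"),
    (date(1950, 3, 1), "failure"),
    (date(1950, 3, 2), "the"),  # not in the lexicon, dropped by the join
    (date(1950, 7, 9), "excellent"),
]


def quarter_of(day):
    """Map a date to a (year, quarter) pair, with quarters numbered 1 to 4."""
    return (day.year, (day.month - 1) // 3 + 1)


def polarity_by_period(rows, lexicon, key=quarter_of):
    """Inner-join tokens with the lexicon, then average the scores per period."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, token in rows:
        score = lexicon.get(token)
        if score is None:
            continue  # token has no lexicon entry
        period = key(day)
        totals[period] += score
        counts[period] += 1
    return {period: totals[period] / counts[period] for period in sorted(totals)}


by_quarter = polarity_by_period(corpus, lexicon)
by_year = polarity_by_period(corpus, lexicon, key=lambda day: day.year)
print(by_quarter)
print(by_year)
```

Keeping the lexicon in a dictionary makes each lookup constant-time, so the join is a single pass over the corpus rather than a nested comparison of every token against every lexicon entry.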

Datasets

emotion-final-y.csv - Final dataset (yearly, normalized variables).

emotion-final-q.csv - Final dataset (quarterly, normalized variables).

codebook.csv - Description of variables in yearly and quarterly datasets.

lexicon-polarity.csv - The domain-specific polarity lexicon (4200 words).