Hansard Speeches and Sentiment

Repository for a public dataset of speeches in the Hansard. The dataset provides information on each speech of ten words or longer made in the House of Commons between 1980 and 2016, including the speaking MP, their party, gender and age at the time of the speech. The dataset also includes all speeches of ten words or longer made from 1936 to 1980, for a total of 4,212,134 speeches and 773,585,770 words. More information on the dataset is available here. The dataset itself can be accessed through Zenodo.

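The speeches have been classified for sentiment using a total of six libraries: five accessed through the R packages syuzhet, lexicon and sentimentr, and one taken from the paper Measuring Emotion in Parliamentary Debates with Automated Textual Analysis. All six scores were calculated using the method from the sentimentr package. The libraries are: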

The AFINN library by Finn Årup Nielsen, labelled afinn. The AFINN library was accessed through the syuzhet package.
The Opinion Mining, Sentiment Analysis and Opinion Spam Detection dataset by Bing Liu, Minqing Hu and Junsheng Cheng, labelled bing. The Bing library was accessed through the syuzhet package.
The NRC Word-Emotion Association Lexicon by Saif M. Mohammad, labelled nrc. The NRC library was accessed through the syuzhet package.
The Sentiwords dataset, created by Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. The Sentiwords library was accessed through the lexicon package.
The Hu & Liu dataset, by Minqing Hu and Bing Liu, labelled Hu. The Hu & Liu library was accessed through the sentimentr package.
A modified version of the unnamed lexicon from the paper Measuring Emotion in Parliamentary Debates with Automated Textual Analysis, labelled rheault. As the method in sentimentr does not distinguish between the lexical categories of a word that can occupy more than one, I used the average polarity score assigned to such words.
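
The averaging step described for the rheault library can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the R code used in this repository; the function name `collapse_lexicon` and the example entries are hypothetical and are not taken from the actual lexicon.

```python
from collections import defaultdict
from statistics import mean


def collapse_lexicon(entries):
    """Return one polarity score per word by averaging across lexical categories.

    `entries` is an iterable of (word, lexical_category, polarity) tuples.
    """
    scores = defaultdict(list)
    for word, _category, polarity in entries:
        # The category is ignored: all scores for the same word are pooled.
        scores[word.lower()].append(polarity)
    return {word: mean(values) for word, values in scores.items()}


# Hypothetical entries for illustration only.
entries = [
    ("fine", "adjective", 0.5),
    ("fine", "noun", -0.25),
    ("support", "verb", 0.75),
]

lexicon = collapse_lexicon(entries)
```

Here "fine" appears under two categories, so it receives the mean of its two scores, while "support" keeps its single score unchanged.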

Notes

The data used to create this dataset was taken from the parlparse project operated by They Work For You and supported by mySociety.

The dataset is licensed under a Creative Commons Attribution 4.0 International License.

The code included in this repository is licensed under an MIT license.

Please contact me or open an issue here if you find any errors in the dataset. The integrity of the public Hansard record is questionable at times, and while I have improved it, the data is presented 'as is'.

New in 2.4.3

"Julia Dockerill" changed to "Julia Lopez" to reflect the MP's name change.

