Dictionaries for sentiment analysis

SentimentDictionaries

This library provides domain-specific dictionaries for sentiment analysis. Each dictionary consists of words that statistically feature a positive or negative polarity in movie reviews or financial filings. The dictionaries are extracted from two corpora: IMDb movie reviews and U.S. regulated Form 8-K filings. Details are available in the following reference.

  • Pröllochs, Feuerriegel and Neumann (2018): Statistical Inferences for Polarity Identification in Natural Language, Working Paper, Chair for Information Systems Research, University of Freiburg, Germany.

Details

This library contains the following dictionary resources in CSV format.

  • Movie reviews dictionary: This dictionary contains words that feature a positive or negative connotation in IMDb movie reviews (DictionaryIMDB.csv).
  • Financial filings dictionary: This dictionary contains words that feature a positive or negative connotation in U.S. regulated 8-K filings (Dictionary8K.csv).

The individual columns of each dictionary are as follows:

  • Words: This column lists the individual dictionary entries. We provide stems instead of complete words as stemming is part of the document preprocessing.
  • Scores: This column denotes the polarity score of each entry.
  • Idf: This column denotes the inverse document frequency (idf) of each entry.
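To illustrate how these three columns might be used together, the following is a minimal sketch in base R. It builds a toy dictionary with made-up values (not taken from either CSV file) and scores a pre-stemmed document by summing the polarity scores of the stems it contains. The helper names `score_document` and `score_document_idf` are hypothetical, and the idf-weighted variant is only one possible weighting, assumed here for illustration; it is not necessarily the scoring rule used in the reference above.

```r
# Toy dictionary with the three documented columns. The values are invented
# for illustration and do NOT come from Dictionary8K.csv or DictionaryIMDB.csv.
# A real dictionary could be loaded with something like
# read.csv("Dictionary8K.csv"), assuming a comma-separated file with a header.
dict <- data.frame(
  Words  = c("good", "bad", "excel"),
  Scores = c(0.5, -0.75, 0.25),
  Idf    = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sum the polarity scores of all stems that appear in
# the dictionary. Stems without a dictionary entry contribute nothing.
score_document <- function(stems, dict) {
  idx <- match(stems, dict$Words)
  sum(dict$Scores[idx], na.rm = TRUE)
}

# Hypothetical variant (an assumption, not the documented method): weight
# each matched score by the idf of its entry before summing.
score_document_idf <- function(stems, dict) {
  idx <- match(stems, dict$Words)
  sum(dict$Scores[idx] * dict$Idf[idx], na.rm = TRUE)
}

# The document must already be stemmed, because the dictionary lists stems.
stems <- c("movi", "good", "excel", "bad", "good")

doc_score     <- score_document(stems, dict)      # 0.5 + 0.25 - 0.75 + 0.5
doc_score_idf <- score_document_idf(stems, dict)  # 0.5 + 0.75 - 1.5  + 0.5
```

With these toy values the unweighted score is 0.5 and the idf-weighted score is 0.25. The stem "movi" has no dictionary entry and is ignored in both cases.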

In addition, this library contains the following datasets that were used to generate the above dictionaries.

  • Movie reviews dataset: This dataset contains 5006 IMDb movie reviews together with their ratings (Dataset_IMDB.csv).
  • Financial filings dataset: This dataset contains daily stock market returns and filing paths for 76716 U.S. regulated 8-K filings (Dataset_8K.csv).

Usage in R

We also provide both dictionaries as a package for the statistical software R. You can install SentimentDictionaries from GitHub with:

# install.packages("devtools")
devtools::install_github("nproellochs/SentimentDictionaries", subdir = "R-package")

Both dictionaries can be easily used in combination with the SentimentAnalysis R package.

License

SentimentDictionaries is released under the MIT License.

Copyright (c) 2018 Nicolas Pröllochs & Stefan Feuerriegel