SentimentDictionaries

This library provides domain-specific dictionaries for sentiment analysis. Each dictionary consists of words that statistically feature a positive or negative polarity in movie reviews or financial filings. The dictionaries are extracted from two corpora: IMDb movie reviews and U.S. regulated Form 8-K filings. Details are available in the following reference.

  • Pröllochs, Feuerriegel and Neumann (2018): Statistical Inferences for Polarity Identification in Natural Language, Working Paper, Chair for Information Systems Research, University of Freiburg, Germany.

Details

This library contains the following dictionary resources in CSV format.

  • Movie reviews dictionary: This dictionary contains words that feature a positive or negative connotation in IMDb movie reviews (DictionaryIMDB.csv).
  • Financial filings dictionary: This dictionary contains words that feature a positive or negative connotation in U.S. regulated 8-K filings (Dictionary8K.csv).

The individual columns of each dictionary are as follows:

  • Words: This column lists the individual dictionary entries. We provide stems instead of complete words as stemming is part of the document preprocessing.
  • Scores: This column denotes the polarity score of each entry.
  • Idf: This column denotes the inverse document frequency (idf) of each entry.
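As a minimal sketch of working with this column layout in R, the snippet below reads a dictionary file and splits it by the sign of the polarity score. It assumes the CSV files are comma-separated with a header row named exactly Words, Scores and Idf, and that a positive score marks positive polarity. To stay runnable on its own, it first writes a small toy file; the four entries and their values are made up for illustration and are not taken from the real dictionaries.

```r
# Toy entries for illustration only; NOT values from the real dictionaries.
toy <- data.frame(
  Words  = c("good", "great", "bad", "poor"),
  Scores = c(0.02, 0.03, -0.04, -0.01),
  Idf    = c(1.2, 2.5, 1.8, 3.1),
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# With the real files, point csv_path at Dictionary8K.csv or DictionaryIMDB.csv
# (assumed here to be comma-separated with a header row).
dict <- read.csv(csv_path, stringsAsFactors = FALSE)

# Split the entries by the sign of their polarity score.
positive <- dict[dict$Scores > 0, ]
negative <- dict[dict$Scores < 0, ]

# Order the entries from strongest to weakest absolute polarity.
ranked <- dict[order(abs(dict$Scores), decreasing = TRUE), ]
print(ranked)
```

Because the entries are stems, any text matched against the dictionary would need to be stemmed in the same way first.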

In addition, this library contains the following datasets that were used to generate the above dictionaries.

  • Movie reviews dataset: This dataset contains 5006 IMDb movie reviews together with their ratings (Dataset_IMDB.csv).
  • Financial filings dataset: This dataset contains daily stock market returns and filing paths for 76716 U.S. regulated 8-K filings (Dataset_8K.csv).

Usage in R

We also provide both dictionaries as a package for the statistical software R. You can install SentimentDictionaries from GitHub with:

# install.packages("devtools")
devtools::install_github("nproellochs/SentimentDictionaries", subdir = "R-package")

Both dictionaries can be easily used in combination with the SentimentAnalysis R package.
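One way this combination might look is sketched below. It builds a weighted dictionary with `SentimentDictionaryWeighted()` from the SentimentAnalysis package and applies it with `analyzeSentiment()` using the `ruleLinearModel` rule. The entries, values, example documents and the rule name "Toy" are made up for illustration; with the real data you would pass the Words, Scores and Idf columns of one of the dictionaries instead. The sketch does not use the data objects shipped in the R package, as their names are not documented here.

```r
library(SentimentAnalysis)

# Toy entries for illustration only; NOT values from the real dictionaries.
toy <- data.frame(
  Words  = c("good", "great", "bad", "poor"),
  Scores = c(0.02, 0.03, -0.04, -0.01),
  Idf    = c(1.2, 2.5, 1.8, 3.1),
  stringsAsFactors = FALSE
)

# Build a weighted dictionary from the three columns.
dict <- SentimentDictionaryWeighted(toy$Words, toy$Scores, toy$Idf)

# Hypothetical example documents.
docs <- c("A good film with a great cast",
          "A bad plot and poor acting")

# Score each document with the rule for weighted dictionaries.
# analyzeSentiment() stems the documents by default, which matches
# the stemmed dictionary entries.
sentiment <- analyzeSentiment(
  docs,
  rules = list("Toy" = list(ruleLinearModel, dict))
)
print(sentiment)
```

The result holds one row per document and one column per named rule.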

License

SentimentDictionaries is released under the MIT License.

Copyright (c) 2018 Nicolas Pröllochs & Stefan Feuerriegel
