
Commit
Updated README and vignette
sfeuerriegel committed Jun 5, 2017
1 parent 3220b91 commit cd5e4a8
Showing 3 changed files with 39 additions and 94 deletions.
36 changes: 9 additions & 27 deletions README.Rmd
@@ -17,7 +17,7 @@ knitr::opts_chunk$set(
[![Build Status](https://travis-ci.org/sfeuerriegel/SentimentAnalysis.svg?branch=master)](https://travis-ci.org/sfeuerriegel/SentimentAnalysis)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/SentimentAnalysis)](https://cran.r-project.org/package=SentimentAnalysis)

**SentimentAnalysis** performs a **sentiment analysis** of textual contents in R. This implementation utilizes various existing dictionaries, such as General Inquirer, Harvard IV or Loughran-McDonald. Furthermore, it can also create customized dictionaries. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.
**SentimentAnalysis** performs a **sentiment analysis** of textual contents in R. This implementation utilizes various existing dictionaries, such as QDAP, Harvard IV or Loughran-McDonald. Furthermore, it can also create customized dictionaries. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.

## Overview

@@ -34,31 +34,13 @@ The most important functions in **SentimentAnalysis** are:
To see examples of these functions in use, check out the help pages, the demos and the vignette.


## Installation

Using the **devtools** package, you can easily install the latest development version of **SentimentAnalysis** with

```{r,eval=FALSE}
install.packages("devtools")
# Option 1: download and install latest version from ‘GitHub’
devtools::install_github("sfeuerriegel/SentimentAnalysis")
# Option 2: install directly from bundled archive
# devtools::install_local("SentimentAnalysis_1.1.0.tar.gz")
```

Notes:

* In the case of option 2, you have to specify the path either to the directory of **SentimentAnalysis** or to the bundled archive **SentimentAnalysis_1.0.0.tar.gz**

* A CRAN version has not yet been released.

## Usage

This section shows the basic functionality of how to perform a sentiment analysis. First, load the corresponding package **SentimentAnalysis**.
This section shows the basic functionality of how to perform a sentiment analysis. First, install the package from CRAN. Then load the corresponding package **SentimentAnalysis**.

```{r, message=FALSE}
# install.packages("SentimentAnalysis")
library(SentimentAnalysis)
```

@@ -89,17 +71,17 @@ documents <- c("Wow, I really like the new light sabers!",
# Analyze sentiment
sentiment <- analyzeSentiment(documents)
# Extract dictionary-based sentiment according to the Harvard-IV dictionary
sentiment$SentimentGI
# Extract dictionary-based sentiment according to the QDAP dictionary
sentiment$SentimentQDAP
# View sentiment direction (i.e. positive, neutral and negative)
convertToDirection(sentiment$SentimentGI)
convertToDirection(sentiment$SentimentQDAP)
response <- c(+1, +1, +1, -1, 0, -1)
compareToResponse(sentiment, response)
# Optional visualization: plotSentimentResponse(sentiment$SentimentGI, response)
# Optional visualization: plotSentimentResponse(sentiment$SentimentQDAP, response)
```
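
An editorial aside on the chunk above: `convertToDirection()` behaves as if it bins the continuous scores by sign. A rough sketch of that logic (a simplified illustration under that assumption, not the package's actual implementation):

```r
# Simplified sign-based binning, mimicking what convertToDirection() appears
# to do; the real routine ships with SentimentAnalysis.
toDirection <- function(scores) {
  labels <- ifelse(scores > 0, "positive",
                   ifelse(scores < 0, "negative", "neutral"))
  factor(labels, levels = c("negative", "neutral", "positive"))
}

toDirection(c(0.33, -0.67, 0))  # positive negative neutral
```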

## Dictionary generation
@@ -114,4 +96,4 @@ The approach utilizes LASSO regularization to extract words from documents that

**SentimentAnalysis** is released under the [MIT License](https://opensource.org/licenses/MIT)

Copyright (c) 2016 Stefan Feuerriegel & Nicolas Pröllochs
Copyright (c) 2017 Stefan Feuerriegel & Nicolas Pröllochs
39 changes: 10 additions & 29 deletions README.md
@@ -5,7 +5,7 @@ Sentiment Analysis

[![Build Status](https://travis-ci.org/sfeuerriegel/SentimentAnalysis.svg?branch=master)](https://travis-ci.org/sfeuerriegel/SentimentAnalysis) [![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/SentimentAnalysis)](https://cran.r-project.org/package=SentimentAnalysis)

**SentimentAnalysis** performs a **sentiment analysis** of textual contents in R. This implementation utilizes various existing dictionaries, such as General Inquirer, Harvard IV or Loughran-McDonald. Furthermore, it can also create customized dictionaries. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.
**SentimentAnalysis** performs a **sentiment analysis** of textual contents in R. This implementation utilizes various existing dictionaries, such as QDAP, Harvard IV or Loughran-McDonald. Furthermore, it can also create customized dictionaries. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.

Overview
--------
@@ -22,33 +22,14 @@ The most important functions in **SentimentAnalysis** are:

To see examples of these functions in use, check out the help pages, the demos and the vignette.

Installation
------------

Using the **devtools** package, you can easily install the latest development version of **SentimentAnalysis** with

``` r
install.packages("devtools")

# Option 1: download and install latest version from ‘GitHub’
devtools::install_github("sfeuerriegel/SentimentAnalysis")

# Option 2: install directly from bundled archive
# devtools::install_local("SentimentAnalysis_1.1.0.tar.gz")
```

Notes:

- In the case of option 2, you have to specify the path either to the directory of **SentimentAnalysis** or to the bundled archive **SentimentAnalysis\_1.0.0.tar.gz**

- A CRAN version has not yet been released.

Usage
-----

This section shows the basic functionality of how to perform a sentiment analysis. First, load the corresponding package **SentimentAnalysis**.
This section shows the basic functionality of how to perform a sentiment analysis. First, install the package from CRAN. Then load the corresponding package **SentimentAnalysis**.

``` r
# install.packages("SentimentAnalysis")

library(SentimentAnalysis)
```

@@ -81,12 +62,12 @@ documents <- c("Wow, I really like the new light sabers!",
# Analyze sentiment
sentiment <- analyzeSentiment(documents)

# Extract dictionary-based sentiment according to the Harvard-IV dictionary
sentiment$SentimentGI
#> [1] 0.3333333 0.5000000 0.5000000 -0.6666667 0.0000000 -0.6000000
# Extract dictionary-based sentiment according to the QDAP dictionary
sentiment$SentimentQDAP
#> [1] 0.3333333 0.5000000 0.5000000 -0.3333333 0.0000000 -0.4000000

# View sentiment direction (i.e. positive, neutral and negative)
convertToDirection(sentiment$SentimentGI)
convertToDirection(sentiment$SentimentQDAP)
#> [1] positive positive positive negative neutral negative
#> Levels: negative neutral positive

@@ -179,7 +160,7 @@ compareToResponse(sentiment, response)
#> avg.sentiment.pos.response 0.08333333 0.4166667
#> avg.sentiment.neg.response 0.36666667 0.0000000

# Optional visualization: plotSentimentResponse(sentiment$SentimentGI, response)
# Optional visualization: plotSentimentResponse(sentiment$SentimentQDAP, response)
```

Dictionary generation
@@ -196,4 +177,4 @@ License

**SentimentAnalysis** is released under the [MIT License](https://opensource.org/licenses/MIT)

Copyright (c) 2016 Stefan Feuerriegel & Nicolas Pröllochs
Copyright (c) 2017 Stefan Feuerriegel & Nicolas Pröllochs
58 changes: 20 additions & 38 deletions vignettes/SentimentAnalysis.Rmd
@@ -14,7 +14,7 @@ vignette: >
%\VignetteEncoding{UTF-8}
---

The `SentimentAnalysis` package introduces a powerful toolchain facilitating the sentiment analysis of textual contents in R. This implementation utilizes various existing dictionaries, such as General Inquirer, Harvard IV and Loughran-McDonald. Furthermore, it can also create customized dictionaries. The latter function uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable. Finally, all methods can be easily compared using built-in evaluation routines.
The `SentimentAnalysis` package introduces a powerful toolchain facilitating the sentiment analysis of textual contents in R. This implementation utilizes various existing dictionaries, such as QDAP, Harvard IV and Loughran-McDonald. Furthermore, it can also create customized dictionaries. The latter function uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable. Finally, all methods can be easily compared using built-in evaluation routines.

# Introduction

@@ -48,27 +48,10 @@ In the process of performing sentiment analysis, one must convert the running te

Even though sentiment analysis has received great traction lately, the available tools are not yet living up to the needs of researchers. The `SentimentAnalysis` package is intended to partially close this gap and offer capabilities that most research demands.

## Installation

Using the `devtools` package, you can easily install the latest development version of `SentimentAnalysis` with

```{r,eval=FALSE}
library(devtools)
# Option 1: download and install latest version from GitHub
install_github("sfeuerriegel/SentimentAnalysis")
# Option 2: install directly from bundled archive
install_local("SentimentAnalysis_1.1.0.tar.gz")
```

Note: You have to specify the path either to the directory of `SentimentAnalysis` or to the bundled archive `SentimentAnalysis_1.1.0.tar.gz`.

## Package loading

Afterwards, one merely needs to load the `SentimentAnalysis` package as follows. This section shows the basic functionality to crawl for ad hoc filings. The following lines extract the ad hoc disclosure that was published most recently.
First, simply install the package `SentimentAnalysis` from CRAN. Afterwards, one merely needs to load the `SentimentAnalysis` package as follows.

```{r}
# install.packages("SentimentAnalysis")
library(SentimentAnalysis)
```

@@ -77,7 +60,7 @@ library(SentimentAnalysis)
```{r}
# Analyze a single string to obtain a binary response (positive / negative)
sentiment <- analyzeSentiment("Yeah, this was a great soccer game for the German team!")
convertToBinaryResponse(sentiment)$SentimentGI
convertToBinaryResponse(sentiment)$SentimentQDAP
```

```{r}
@@ -92,19 +75,19 @@ documents <- c("Wow, I really like the new light sabers!",
# Analyze sentiment
sentiment <- analyzeSentiment(documents)
# Extract dictionary-based sentiment according to the Harvard-IV dictionary
sentiment$SentimentGI
# Extract dictionary-based sentiment according to the QDAP dictionary
sentiment$SentimentQDAP
# View sentiment direction (i.e. positive, neutral and negative)
convertToDirection(sentiment$SentimentGI)
convertToDirection(sentiment$SentimentQDAP)
response <- c(+1, +1, +1, -1, 0, -1)
compareToResponse(sentiment, response)
compareToResponse(sentiment, convertToBinaryResponse(response))
plotSentimentResponse(sentiment$SentimentGI, response)
plotSentimentResponse(sentiment$SentimentQDAP, response)
```
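
A side note on the evaluation step above: conceptually, `compareToResponse()` checks how well the scores track the response. A hand-rolled correlation conveys the idea (a sketch with assumed toy scores, not the function's full output):

```r
# Toy illustration: correlation between sentiment scores and a response —
# the kind of agreement statistic compareToResponse() reports (among others)
scores <- c(0.33, 0.5, 0.5, -0.33, 0, -0.4)  # assumed example scores
response <- c(+1, +1, +1, -1, 0, -1)
cor(scores, response)
```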

The `SentimentAnalysis` package works very cleverly and neatly here in order to remove the effort
@@ -137,22 +120,22 @@ We provide examples in the following.
documents <- c("This is good",
"This is bad",
"This is inbetween")
convertToDirection(analyzeSentiment(documents)$SentimentGI)
convertToDirection(analyzeSentiment(documents)$SentimentQDAP)
```

### Document-term matrix

```{r}
library(tm)
corpus <- VCorpus(VectorSource(documents))
convertToDirection(analyzeSentiment(corpus)$SentimentGI)
convertToDirection(analyzeSentiment(corpus)$SentimentQDAP)
```

### Corpus object

```{r}
dtm <- preprocessCorpus(corpus)
convertToDirection(analyzeSentiment(dtm)$SentimentGI)
convertToDirection(analyzeSentiment(dtm)$SentimentQDAP)
```

Since the package can work directly with a document-term matrix, this allows one to use customized preprocessing operations in the first place. Afterwards, one can utilize the `SentimentAnalysis` package for the computation of sentiment scores. For instance, one can replace the stopwords with those from a different list, or even perform tailored synonym merging, among other options. By default, the package uses the built-in routines `transformIntoCorpus()` to convert the input into a `Corpus` object and `preprocessCorpus()` to convert it into a `DocumentTermMatrix`.
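
To make the remark about customized preprocessing concrete, a minimal sketch using the standard `tm` API might look as follows (an illustration; the specific transformations chosen here are only an example):

```r
library(tm)
library(SentimentAnalysis)

docs <- c("This is good", "This is bad")
corpus <- VCorpus(VectorSource(docs))

# Tailored preprocessing in place of the built-in preprocessCorpus()
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
dtm <- DocumentTermMatrix(corpus)

# The customized document-term matrix feeds directly into analyzeSentiment()
sentiment <- analyzeSentiment(dtm)
```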
@@ -161,26 +144,25 @@ Since the package can work directly with a document-term matrix, this allows one

The `SentimentAnalysis` package entails three different dictionaries:

* Harvard-IV dictionary as used in the General Inquirer program
* Harvard-IV dictionary

* Henry's Financial dictionary [@Henry.2008]

* Loughran-McDonald Financial dictionary [@Loughran.2011]

* QDAP dictionary from the package [`qdapDictionaries`](https://cran.r-project.org/package=qdapDictionaries)

All of them can be manually inspected and even accessed as follows:

```{r}
# Make dictionary available in the current R environment
data(DictionaryGI)
data(DictionaryHE)
# Display the internal structure
str(DictionaryGI)
str(DictionaryHE)
# Access dictionary as an object of type SentimentDictionary
dict.GI <- loadDictionaryGI()
dict.HE <- loadDictionaryHE()
# Print summary statistics of dictionary
summary(dict.GI)
data(DictionaryHE)
str(DictionaryHE)
summary(dict.HE)
data(DictionaryLM)
str(DictionaryLM)
@@ -272,7 +254,7 @@ Ultimately, several routines allow one to explore the generated dictionary furthe

```{r}
compareDictionaries(dict,
loadDictionaryGI())
loadDictionaryQDAP())
sentiment <- predict(dict, documents)
compareToResponse(sentiment, response)
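# Editorial aside (hedged sketch, not from the original vignette): the `dict`
# compared and used for prediction above is produced earlier by
# generateDictionary(), along the lines of
#   dict <- generateDictionary(documents, response)  # LASSO term selection
#   summary(dict)                                    # inspect selected words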
@@ -391,7 +373,7 @@ summary(sentiment$SentimentLM)
hist(scale(sentiment$SentimentLM))
# Compute cross-correlation
cor(sentiment[, c("SentimentLM", "SentimentHE", "SentimentGI")])
cor(sentiment[, c("SentimentLM", "SentimentHE", "SentimentQDAP")])
# crude oil news between 1987-02-26 until 1987-03-02
datetime <- do.call(c, lapply(crude, function(x) x$meta$datetimestamp))
