
Commit
Updated README and vignette
sfeuerriegel committed Jun 5, 2017
1 parent 3220b91 commit cd5e4a8
Showing 3 changed files with 39 additions and 94 deletions.
36 changes: 9 additions & 27 deletions README.Rmd
@@ -17,7 +17,7 @@ knitr::opts_chunk$set(
[![Build Status](https://travis-ci.org/sfeuerriegel/SentimentAnalysis.svg?branch=master)](https://travis-ci.org/sfeuerriegel/SentimentAnalysis)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/SentimentAnalysis)](https://cran.r-project.org/package=SentimentAnalysis)

**SentimentAnalysis** performs a **sentiment analysis** of textual contents in R. This implementation utilizes various existing dictionaries, such as General Inquirer, Harvard IV or Loughran-McDonald. Furthermore, it can also create customized dictionaries. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.
**SentimentAnalysis** performs a **sentiment analysis** of textual contents in R. This implementation utilizes various existing dictionaries, such as QDAP, Harvard IV or Loughran-McDonald. Furthermore, it can also create customized dictionaries. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.

## Overview

@@ -34,31 +34,13 @@ The most important functions in **SentimentAnalysis** are:
To see examples of these functions in use, check out the help pages, the demos and the vignette.


## Installation

Using the **devtools** package, you can easily install the latest development version of **SentimentAnalysis** with

```{r,eval=FALSE}
install.packages("devtools")
# Option 1: download and install latest version from ‘GitHub’
devtools::install_github("sfeuerriegel/SentimentAnalysis")
# Option 2: install directly from bundled archive
# devtools::install_local("SentimentAnalysis_1.1.0.tar.gz")
```

Notes:

* In the case of option 2, you have to specify the path either to the directory of **SentimentAnalysis** or to the bundled archive **SentimentAnalysis_1.0.0.tar.gz**

* A CRAN version has not yet been released.

## Usage

This section shows the basic functionality of how to perform a sentiment analysis. First, load the corresponding package **SentimentAnalysis**.
This section shows the basic functionality of how to perform a sentiment analysis. First, install the package from CRAN. Then load the corresponding package **SentimentAnalysis**.

```{r, message=FALSE}
# install.packages("SentimentAnalysis")
library(SentimentAnalysis)
```

@@ -89,17 +71,17 @@ documents <- c("Wow, I really like the new light sabers!",
# Analyze sentiment
sentiment <- analyzeSentiment(documents)
# Extract dictionary-based sentiment according to the Harvard-IV dictionary
sentiment$SentimentGI
# Extract dictionary-based sentiment according to the QDAP dictionary
sentiment$SentimentQDAP
# View sentiment direction (i.e. positive, neutral and negative)
convertToDirection(sentiment$SentimentGI)
convertToDirection(sentiment$SentimentQDAP)
response <- c(+1, +1, +1, -1, 0, -1)
compareToResponse(sentiment, response)
# Optional visualization: plotSentimentResponse(sentiment$SentimentGI, response)
# Optional visualization: plotSentimentResponse(sentiment$SentimentQDAP, response)
```
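
An editorial aside on the chunk above: `convertToDirection()` behaves as if it bins the continuous scores by sign. A rough sketch of that logic (a simplified illustration under that assumption, not the package's actual implementation):

```r
# Simplified sign-based binning, mimicking what convertToDirection() appears
# to do; the real routine ships with SentimentAnalysis.
toDirection <- function(scores) {
  labels <- ifelse(scores > 0, "positive",
                   ifelse(scores < 0, "negative", "neutral"))
  factor(labels, levels = c("negative", "neutral", "positive"))
}

toDirection(c(0.33, -0.67, 0))  # positive negative neutral
```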

## Dictionary generation
@@ -114,4 +96,4 @@ The approach utilizes LASSO regularization to extract words from documents that

**SentimentAnalysis** is released under the [MIT License](https://opensource.org/licenses/MIT)

Copyright (c) 2016 Stefan Feuerriegel & Nicolas Pröllochs
Copyright (c) 2017 Stefan Feuerriegel & Nicolas Pröllochs
39 changes: 10 additions & 29 deletions README.md
@@ -5,7 +5,7 @@ Sentiment Analysis

[![Build Status](https://travis-ci.org/sfeuerriegel/SentimentAnalysis.svg?branch=master)](https://travis-ci.org/sfeuerriegel/SentimentAnalysis) [![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/SentimentAnalysis)](https://cran.r-project.org/package=SentimentAnalysis)

**SentimentAnalysis** performs a **sentiment analysis** of textual contents in R. This implementation utilizes various existing dictionaries, such as General Inquirer, Harvard IV or Loughran-McDonald. Furthermore, it can also create customized dictionaries. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.
**SentimentAnalysis** performs a **sentiment analysis** of textual contents in R. This implementation utilizes various existing dictionaries, such as QDAP, Harvard IV or Loughran-McDonald. Furthermore, it can also create customized dictionaries. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.

Overview
--------
@@ -22,33 +22,14 @@ The most important functions in **SentimentAnalysis** are:

To see examples of these functions in use, check out the help pages, the demos and the vignette.

Installation
------------

Using the **devtools** package, you can easily install the latest development version of **SentimentAnalysis** with

``` r
install.packages("devtools")

# Option 1: download and install latest version from ‘GitHub’
devtools::install_github("sfeuerriegel/SentimentAnalysis")

# Option 2: install directly from bundled archive
# devtools::install_local("SentimentAnalysis_1.1.0.tar.gz")
```

Notes:

- In the case of option 2, you have to specify the path either to the directory of **SentimentAnalysis** or to the bundled archive **SentimentAnalysis\_1.0.0.tar.gz**

- A CRAN version has not yet been released.

Usage
-----

This section shows the basic functionality of how to perform a sentiment analysis. First, load the corresponding package **SentimentAnalysis**.
This section shows the basic functionality of how to perform a sentiment analysis. First, install the package from CRAN. Then load the corresponding package **SentimentAnalysis**.

``` r
# install.packages("SentimentAnalysis")

library(SentimentAnalysis)
```

@@ -81,12 +62,12 @@ documents <- c("Wow, I really like the new light sabers!",
# Analyze sentiment
sentiment <- analyzeSentiment(documents)

# Extract dictionary-based sentiment according to the Harvard-IV dictionary
sentiment$SentimentGI
#> [1] 0.3333333 0.5000000 0.5000000 -0.6666667 0.0000000 -0.6000000
# Extract dictionary-based sentiment according to the QDAP dictionary
sentiment$SentimentQDAP
#> [1] 0.3333333 0.5000000 0.5000000 -0.3333333 0.0000000 -0.4000000

# View sentiment direction (i.e. positive, neutral and negative)
convertToDirection(sentiment$SentimentGI)
convertToDirection(sentiment$SentimentQDAP)
#> [1] positive positive positive negative neutral negative
#> Levels: negative neutral positive

@@ -179,7 +160,7 @@ compareToResponse(sentiment, response)
#> avg.sentiment.pos.response 0.08333333 0.4166667
#> avg.sentiment.neg.response 0.36666667 0.0000000

# Optional visualization: plotSentimentResponse(sentiment$SentimentGI, response)
# Optional visualization: plotSentimentResponse(sentiment$SentimentQDAP, response)
```

Dictionary generation
@@ -196,4 +177,4 @@ License

**SentimentAnalysis** is released under the [MIT License](https://opensource.org/licenses/MIT)

Copyright (c) 2016 Stefan Feuerriegel & Nicolas Pröllochs
Copyright (c) 2017 Stefan Feuerriegel & Nicolas Pröllochs
58 changes: 20 additions & 38 deletions vignettes/SentimentAnalysis.Rmd
@@ -14,7 +14,7 @@ vignette: >
%\VignetteEncoding{UTF-8}
---

The `SentimentAnalysis` package introduces a powerful toolchain facilitating the sentiment analysis of textual contents in R. This implementation utilizes various existing dictionaries, such as General Inquirer, Harvard IV and Loughran-McDonald. Furthermore, it can also create customized dictionaries. The latter function uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable. Finally, all methods can be easily compared using built-in evaluation routines.
The `SentimentAnalysis` package introduces a powerful toolchain facilitating the sentiment analysis of textual contents in R. This implementation utilizes various existing dictionaries, such as QDAP, Harvard IV and Loughran-McDonald. Furthermore, it can also create customized dictionaries. The latter function uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable. Finally, all methods can be easily compared using built-in evaluation routines.

# Introduction

@@ -48,27 +48,10 @@ In the process of performing sentiment analysis, one must convert the running te

Even though sentiment analysis has received great traction lately, the available tools are not yet living up to the needs of researchers. The `SentimentAnalysis` package is intended to partially close this gap and offer capabilities that most research demands.

## Installation

Using the `devtools` package, you can easily install the latest development version of `SentimentAnalysis` with

```{r,eval=FALSE}
library(devtools)
# Option 1: download and install latest version from GitHub
install_github("sfeuerriegel/SentimentAnalysis")
# Option 2: install directly from bundled archive
install_local("SentimentAnalysis_1.1.0.tar.gz")
```

Note: You have to specify the path either to the directory of `SentimentAnalysis` or to the bundled archive `SentimentAnalysis_1.1.0.tar.gz`.

## Package loading

Afterwards, one merely needs to load the `SentimentAnalysis` package as follows. This section shows the basic functionality to crawl for ad hoc filings. The following lines extract the ad hoc disclosure that was published most recently.
First, simply install the package `SentimentAnalysis` from CRAN. Afterwards, one merely needs to load the `SentimentAnalysis` package as follows.

```{r}
# install.packages("SentimentAnalysis")
library(SentimentAnalysis)
```

@@ -77,7 +60,7 @@ library(SentimentAnalysis)
```{r}
# Analyze a single string to obtain a binary response (positive / negative)
sentiment <- analyzeSentiment("Yeah, this was a great soccer game for the German team!")
convertToBinaryResponse(sentiment)$SentimentGI
convertToBinaryResponse(sentiment)$SentimentQDAP
```

```{r}
@@ -92,19 +75,19 @@ documents <- c("Wow, I really like the new light sabers!",
# Analyze sentiment
sentiment <- analyzeSentiment(documents)
# Extract dictionary-based sentiment according to the Harvard-IV dictionary
sentiment$SentimentGI
# Extract dictionary-based sentiment according to the QDAP dictionary
sentiment$SentimentQDAP
# View sentiment direction (i.e. positive, neutral and negative)
convertToDirection(sentiment$SentimentGI)
convertToDirection(sentiment$SentimentQDAP)
response <- c(+1, +1, +1, -1, 0, -1)
compareToResponse(sentiment, response)
compareToResponse(sentiment, convertToBinaryResponse(response))
plotSentimentResponse(sentiment$SentimentGI, response)
plotSentimentResponse(sentiment$SentimentQDAP, response)
```
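
A side note on the evaluation step above: conceptually, `compareToResponse()` checks how well the scores track the response. A hand-rolled correlation conveys the idea (a sketch with assumed toy scores, not the function's full output):

```r
# Toy illustration: correlation between sentiment scores and a response —
# the kind of agreement statistic compareToResponse() reports (among others)
scores <- c(0.33, 0.5, 0.5, -0.33, 0, -0.4)  # assumed example scores
response <- c(+1, +1, +1, -1, 0, -1)
cor(scores, response)
```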

The `SentimentAnalysis` package works very cleverly and neatly here in order to remove the effort
@@ -137,22 +120,22 @@ We provide examples in the following.
documents <- c("This is good",
"This is bad",
"This is inbetween")
convertToDirection(analyzeSentiment(documents)$SentimentGI)
convertToDirection(analyzeSentiment(documents)$SentimentQDAP)
```

### Document-term matrix

```{r}
library(tm)
corpus <- VCorpus(VectorSource(documents))
convertToDirection(analyzeSentiment(corpus)$SentimentGI)
convertToDirection(analyzeSentiment(corpus)$SentimentQDAP)
```

### Corpus object

```{r}
dtm <- preprocessCorpus(corpus)
convertToDirection(analyzeSentiment(dtm)$SentimentGI)
convertToDirection(analyzeSentiment(dtm)$SentimentQDAP)
```

Since the package can work directly with a document-term matrix, this allows one to use customized preprocessing operations in the first place. Afterwards, one can utilize the `SentimentAnalysis` package for the computation of sentiment scores. For instance, one can replace the stopwords with those from a different list, or even perform tailored synonym merging, among other options. By default, the package uses the built-in routines `transformIntoCorpus()` to convert the input into a `Corpus` object and `preprocessCorpus()` to convert it into a `DocumentTermMatrix`.
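
To make the remark about customized preprocessing concrete, a minimal sketch using the standard `tm` API might look as follows (an illustration; the specific transformations chosen here are only an example):

```r
library(tm)
library(SentimentAnalysis)

docs <- c("This is good", "This is bad")
corpus <- VCorpus(VectorSource(docs))

# Tailored preprocessing in place of the built-in preprocessCorpus()
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
dtm <- DocumentTermMatrix(corpus)

# The customized document-term matrix feeds directly into analyzeSentiment()
sentiment <- analyzeSentiment(dtm)
```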
@@ -161,26 +144,25 @@ Since the package can work directly with a document-term matrix, this allows one

The `SentimentAnalysis` package entails three different dictionaries:

* Harvard-IV dictionary as used in the General Inquirer program
* Harvard-IV dictionary

* Henry's Financial dictionary [@Henry.2008]

* Loughran-McDonald Financial dictionary [@Loughran.2011]

* QDAP dictionary from the package [`qdapDictionaries`](https://cran.r-project.org/package=qdapDictionaries)

All of them can be manually inspected and even accessed as follows:

```{r}
# Make dictionary available in the current R environment
data(DictionaryGI)
data(DictionaryHE)
# Display the internal structure
str(DictionaryGI)
str(DictionaryHE)
# Access dictionary as an object of type SentimentDictionary
dict.GI <- loadDictionaryGI()
dict.HE <- loadDictionaryHE()
# Print summary statistics of dictionary
summary(dict.GI)
data(DictionaryHE)
str(DictionaryHE)
summary(dict.HE)
data(DictionaryLM)
str(DictionaryLM)
@@ -272,7 +254,7 @@ Ultimately, several routines allow one to explore the generated dictionary furthe

```{r}
compareDictionaries(dict,
loadDictionaryGI())
loadDictionaryQDAP())
sentiment <- predict(dict, documents)
compareToResponse(sentiment, response)
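# Editorial aside (hedged sketch, not from the original vignette): the `dict`
# compared and used for prediction above is produced earlier by
# generateDictionary(), along the lines of
#   dict <- generateDictionary(documents, response)  # LASSO term selection
#   summary(dict)                                    # inspect selected words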
@@ -391,7 +373,7 @@ summary(sentiment$SentimentLM)
hist(scale(sentiment$SentimentLM))
# Compute cross-correlation
cor(sentiment[, c("SentimentLM", "SentimentHE", "SentimentGI")])
cor(sentiment[, c("SentimentLM", "SentimentHE", "SentimentQDAP")])
# crude oil news between 1987-02-26 until 1987-03-02
datetime <- do.call(c, lapply(crude, function(x) x$meta$datetimestamp))
