Word-Based Dictionaries for Natural Language

dict

The goal of dict is to make it easy to create, modify, and use word-based dictionaries for analyzing natural language texts. In other words, the package is designed for building sentiment-analysis-like tools that measure the extent to which natural language reflects (positively and/or negatively) any user-defined dimension or topic (e.g., politics, sports, emotions, active, passive, medical, philosophical).

Installation

You can install the released version of dict from CRAN with:

install.packages("dict")

You can install the development version of dict from GitHub with:

remotes::install_github("mkearney/dict")

Example

Create vectors of positively and negatively identifying words for a simple “valence” dimension.

## load package
library(dict)

## vector of positive words
pos <- c("like", "love", "amazing", "excellent", "great", "fantastic",
  "incredible", "awesome", "best", "favorite", "fan", "fun", "enjoyed",
  "good", "solid", "better", "soooo", "happy")

## vector of negative words
neg <- c("hate", "loathe", "dislike", "awful", "horrible", "worst",
  "miserable", "ruin", "ruining", "destroy", "destroyed", "destroying",
  "pathetic", "hated", "unhappy", "terrible")

Create dictionaries

Create a dictionary from positively coded words only, or from both positively and negatively coded words.

## create dictionary using only positive words
only_pos <- dict(pos)

## create dictionary using both positive and negative words
d <- dict(list(pos = pos, neg = neg))

## view up to n entries of dictionary
print(d, n = 15)
#> # A dict[ionary]
#> # A tibble: 34 x 2
#>    word       weight
#>    <chr>       <dbl>
#>  1 like            1
#>  2 love            1
#>  3 amazing         1
#>  4 excellent       1
#>  5 great           1
#>  6 fantastic       1
#>  7 incredible      1
#>  8 awesome         1
#>  9 best            1
#> 10 favorite        1
#> 11 fan             1
#> 12 fun             1
#> 13 enjoyed         1
#> 14 good            1
#> 15 solid           1
#> # … with 19 more rows
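
The printed dictionary is a two-column table of words and weights. As a rough illustration of that structure (not the package's own code), a similar table can be sketched in base R. The name `by_hand` is hypothetical, and the weight of -1 for negative words is an assumption, since the rows printed above only show positive words with a weight of 1.

```r
## word lists from the example above
pos <- c("like", "love", "amazing", "excellent", "great", "fantastic",
  "incredible", "awesome", "best", "favorite", "fan", "fun", "enjoyed",
  "good", "solid", "better", "soooo", "happy")
neg <- c("hate", "loathe", "dislike", "awful", "horrible", "worst",
  "miserable", "ruin", "ruining", "destroy", "destroyed", "destroying",
  "pathetic", "hated", "unhappy", "terrible")

## hypothetical word/weight table
## assumption: negative words carry a weight of -1
by_hand <- data.frame(
  word = c(pos, neg),
  weight = c(rep(1, length(pos)), rep(-1, length(neg))),
  stringsAsFactors = FALSE
)

## 18 positive + 16 negative words = 34 rows
nrow(by_hand)
#> [1] 34
```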

Use word dictionary

Apply a dictionary to some example text:

## example text
txt <- c("love amazing excellent good",
  "hate loathe horrifies unhappy terrible",
  "awesome best hateful hated worst")

## get estimates for each element of txt using pos/neg dictionary
d$score(txt)
#>   positive negative score wc
#> 1        4        0     4  4
#> 2        0        4    -4  5
#> 3        2        2     0  5

## store only the overall score in a tibble
tibble::tibble(
  text  = txt,
  score = d$score_score(txt)
)
#> # A tibble: 3 x 2
#>   text                                   score
#>   <chr>                                  <dbl>
#> 1 love amazing excellent good                4
#> 2 hate loathe horrifies unhappy terrible    -4
#> 3 awesome best hateful hated worst           0
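
To make the columns in the score output concrete, here is a minimal base R sketch that reproduces the same counts for this example by whitespace tokenization and exact word matching. The helper `score_by_hand()` is hypothetical and is not how dict is implemented; the package may tokenize or match words differently on other input.

```r
## word lists and example text from above
pos <- c("like", "love", "amazing", "excellent", "great", "fantastic",
  "incredible", "awesome", "best", "favorite", "fan", "fun", "enjoyed",
  "good", "solid", "better", "soooo", "happy")
neg <- c("hate", "loathe", "dislike", "awful", "horrible", "worst",
  "miserable", "ruin", "ruining", "destroy", "destroyed", "destroying",
  "pathetic", "hated", "unhappy", "terrible")
txt <- c("love amazing excellent good",
  "hate loathe horrifies unhappy terrible",
  "awesome best hateful hated worst")

## hypothetical helper (not part of dict): count exact word matches per text
score_by_hand <- function(txt, pos, neg) {
  ## split each text into lowercase tokens on whitespace
  tokens <- strsplit(tolower(txt), "[[:space:]]+")
  ## count the tokens found in each word list
  positive <- vapply(tokens, function(x) sum(x %in% pos), integer(1))
  negative <- vapply(tokens, function(x) sum(x %in% neg), integer(1))
  data.frame(
    positive = positive,
    negative = negative,
    score = positive - negative,
    wc = lengths(tokens)
  )
}

score_by_hand(txt, pos, neg)
#>   positive negative score wc
#> 1        4        0     4  4
#> 2        0        4    -4  5
#> 3        2        2     0  5
```

Words that are not in either list, such as "horrifies" and "hateful", add to the word count (wc) but not to the positive or negative counts.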

Export dictionary via R package

Export a word dictionary as a fast standalone package using create_dict_pkg(), a wrapper around usethis::create_package():

## create package path via temp directory
path_pkg <- file.path(tempdir(), "simpleexample")

## create R package featuring d
create_dict_pkg(d, path_pkg)
#> ✔ Creating '/tmp/RtmppISNUk/simpleexample/'
#> ✔ Setting active project to '/tmp/RtmppISNUk/simpleexample'
#> ✔ Creating 'R/'
#> ✔ Writing 'DESCRIPTION'
#> Package: simpleexample
#> Title: Word Dictionary Analysis Scorer
#> Version: 0.0.1
#> Authors@R (parsed):
#>     * Michael W. Kearney <kearneymw@missouri.edu> [aut, cre] (<https://orcid.org/0000-0002-0730-4694>)
#> Description: Data and functions for a natural language word dictionary
#> License: What license it uses
#> Depends:
#>     R (>= 3.0.0)
#> ByteCompile: yes
#> Encoding: UTF-8
#> LazyData: yes
#> LazyLoad: yes
#> NeedsCompilation: yes
#> ✔ Writing 'NAMESPACE'
#> ✔ Writing 'simpleexample.Rproj'
#> ✔ Adding '.Rproj.user' to '.gitignore'
#> ✔ Adding '^simpleexample\\.Rproj$', '^\\.Rproj\\.user$' to '.Rbuildignore'
#> ✔ Setting active project to '<no active project>'
#> ✔ Save positive word list
#> ✔ Save negative word list

## test new package's score function on txt vector
simpleexample::score(txt)
#>   positive negative score wc
#> 1        4        0     4  4
#> 2        0        4    -4  5
#> 3        2        2     0  5

Compare the speed of the function returned by default (written in R) with the optimized version in the standalone package (written in C), alongside the SentimentAnalysis and syuzhet packages:

## analyzeSentiment() won't work unless you load the whole library
library(SentimentAnalysis)
#> 
#> Attaching package: 'SentimentAnalysis'
#> The following object is masked from 'package:base':
#> 
#>     write

## compare speed
bm <- bench::mark(
  SentimentAnalysis = analyzeSentiment(txt),
  syuzhet = syuzhet::get_sentiment(txt),
  dict_fun = d$score(txt),
  dict_pkg = simpleexample::score(txt),
  relative = TRUE,
  check = FALSE,
  iterations = 30
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.

## view results
bm
#> # A tibble: 4 x 6
#>   expression            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <dbl>   <dbl>     <dbl>     <dbl>    <dbl>
#> 1 SentimentAnalysis 66826.  67647.         1        Inf      Inf
#> 2 syuzhet              49.9    55.8     1211.       Inf      NaN
#> 3 dict_fun             21.8    21.0     3135.       Inf      NaN
#> 4 dict_pkg              1       1      57194.       NaN      NaN

## view plot
ggplot2::autoplot(bm)
#> Loading required namespace: tidyr

TO DO

See issues labelled enhancement.
