
kgjerde/corporaexplorer


corporaexplorer: An R package for dynamic exploration of text collections

License: GPL v3

“I really like the application and its simplicity. It looks great and is very functional. … a nice addition to text analysis tools.”
Kenneth Benoit, creator of quanteda, professor of computational social science at LSE

“I really enjoyed interacting with corporaexplorer. This is exciting work that opens up doors for non-technical users.”
Tyler Rinker, creator of sentimentr and qdap

– Featured in RStudio’s “R Views” blog’s “Top 40 New R Packages”

– Included in CRAN Task View: Natural Language Processing




What is corporaexplorer?

corporaexplorer is an R package that uses the Shiny graphical user interface framework for dynamic exploration of text collections.

corporaexplorer is designed for a wide range of text collections: for example, tens of thousands of documents scraped from a government website, the collected works of a novelist, or the chapters of a single book.

corporaexplorer’s intended primary audience is qualitatively oriented researchers who rely on close reading of textual documents in their academic work. The package should also be a useful supplement for those doing quantitative text research who want to return to the texts under study. Finally, since it offers a convenient way to explore any character vector, it can also be useful for a wide range of other R users.
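A minimal sketch of exploring a plain character vector follows. It assumes that `prepare_data()` accepts a character vector directly (one element per document) and that `explore()` launches the Shiny app for the prepared object; consult the package's function reference for the exact arguments. The `texts` vector is a hypothetical placeholder corpus.

```r
library(corporaexplorer)

# Hypothetical toy corpus: one character vector element per document.
texts <- c(
  "The first document in a small example collection.",
  "A second document, which briefly mentions the first one.",
  "A third and final document."
)

# Assumption: prepare_data() builds a corpus object directly from a character vector.
corpus <- prepare_data(texts)

# explore() starts a Shiny app and blocks the R session,
# so only launch it in an interactive session.
if (interactive()) {
  explore(corpus)
}
```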

While collecting and preparing the text collections requires some familiarity with R programming, the Shiny apps for exploring and extracting documents from the corpus should be fairly intuitive even for users with no programming knowledge, once a collaborator has set them up. The aim is thus for the package to be useful to anyone with a rudimentary knowledge of R, or with collaborators who have such knowledge.

Installation

To install the released version from CRAN, simply run the following from an R console:

install.packages("corporaexplorer")

Alternatively, to install the development version from GitHub, run the following from an R console:

install.packages("devtools")
devtools::install_github("kgjerde/corporaexplorer")

corporaexplorer works on macOS, Windows and Linux. (The Shiny apps look considerably clunkier on Windows than on the other platforms, but they are fully functional.)

Note to developers: The package’s internal test suite uses the shinytest package, which requires PhantomJS to be installed. PhantomJS can be installed with the shinytest::installDependencies() function.

How to cite

Please cite the following paper if you use corporaexplorer in your research.

Gjerde, Kristian Lundby. 2019. “corporaexplorer: An R package for dynamic exploration of text collections.” Journal of Open Source Software 4 (38): 1342. https://doi.org/10.21105/joss.01342.

For a BibTeX entry, use the output from citation(package = "corporaexplorer").

Usage

For usage instructions and example corpora, see the package web page.

Demo apps

The package includes two demo apps.

To explore Jane Austen’s novels (data accessed through the janeaustenr package):

library(corporaexplorer)
run_janeausten_app()

To explore the US presidents’ State of the Union addresses (data accessed through the sotu package):

library(corporaexplorer)
run_sotu_app()

For more information, see https://kgjerde.github.io/corporaexplorer/articles/jane_austen.html and https://kgjerde.github.io/corporaexplorer/articles/sotu.html, as well as the function reference.

Contributing

Contributions in the form of feedback, bug reports and code are most welcome. Ways to contribute: