preText

An R package to assess the consequences of text preprocessing decisions.

For an introduction, see the getting started with preText vignette.

The procedure is detailed in the following paper:

  • Matthew J. Denny and Arthur Spirling (2017). "Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It".


Installation

The easiest way to install preText is from CRAN via the standard install.packages command:
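A minimal form of that call, assuming the package is published on CRAN under the name preText:

```r
# Install the released version of preText from CRAN.
# Assumption: the CRAN package name matches the package name used in this README.
install.packages("preText")
```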


To get the latest version from GitHub, first read the "Requirements for using C++ code with R" section of the tutorial "Using C++ and R code Together with Rcpp". Because preText makes use of C++ code, you will likely need to install Xcode (on a Mac) or Rtools (on Windows) before you can install the package from GitHub.
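Installing from GitHub also requires a helper package that can fetch and build a repository. The sketch below assumes devtools is that helper (this README does not name one here) and installs it only when it is missing:

```r
# Assumption: devtools is used for the GitHub install.
# Install it from CRAN only if it is not already available.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
```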


Now we can install from GitHub using the following line:
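A sketch of that line, assuming devtools is installed and that the repository lives under the first author's GitHub account as matthewjdenny/preText (adjust the path if the repository is hosted elsewhere):

```r
# Build and install the development version from GitHub.
# Assumption: "matthewjdenny/preText" is the repository path.
devtools::install_github("matthewjdenny/preText")
```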


Once the preText package is installed, you may access its functionality as you would any other package by calling:
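For example, attaching the package in a fresh R session:

```r
# Attach preText so its functions are available in the current session.
library(preText)
```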


If all went well, you should be able to replicate the steps in the vignette("getting_started").
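If the vignette does not open under that name, it may be registered under a different one in your installed version. The calls below use only base R utilities to list what is available and to open a vignette by name:

```r
# List every vignette that ships with the installed preText package.
browseVignettes("preText")

# Open a specific vignette; "getting_started" is the name given above and
# may differ in the installed version.
vignette("getting_started", package = "preText")
```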

Basic Usage

The basic functionality of this package is detailed in the getting started vignette. Beyond this basic functionality, the package includes a number of additional utility and analysis functions for exploring and comparing multiple document-term matrices.
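As a rough sketch of the basic workflow, the example below follows the pattern of the package vignette as best recalled. The function names and arguments (factorial_preprocessing, preText, preText_score_plot) and the use of the quanteda inaugural corpus are assumptions to check against the documentation of your installed version:

```r
library(preText)
library(quanteda)

# Small example corpus: the first ten US inaugural addresses shipped with quanteda.
documents <- as.character(quanteda::data_corpus_inaugural)[1:10]

# Build one document-term matrix per combination of preprocessing steps.
preprocessed_documents <- factorial_preprocessing(
  documents,
  use_ngrams = TRUE,
  infrequent_term_threshold = 0.2,
  verbose = FALSE
)

# Compare the matrices and score each preprocessing specification.
preText_results <- preText(
  preprocessed_documents,
  dataset_name = "Inaugural Speeches",
  distance_method = "cosine",
  num_comparisons = 20,
  verbose = FALSE
)

# Plot the preText score for each specification.
preText_score_plot(preText_results)
```

Ten documents keep the run time short. On a real corpus the factorial preprocessing step can take considerably longer, since it builds a separate matrix for every combination of preprocessing choices.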

Bug Reporting