An R package to assess the consequences of text preprocessing decisions.
The paper detailing the procedure can be found at the link below:
- Matthew J. Denny and Arthur Spirling (2017). "Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It". ssrn.com/abstract=2849145
The easiest way to install preText is from CRAN, using the standard `install.packages()` function.
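A minimal sketch of the CRAN route, assuming the package is listed on CRAN under the name `preText` (the name used throughout this README):

```r
# Install the released version of preText from CRAN.
# Assumes the package is available on CRAN under the name "preText".
install.packages("preText")
```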
If you want to get the latest version from GitHub, start by checking out the
"Requirements for using C++ code with R" section of the following
tutorial: "Using C++ and R code Together with Rcpp".
You will likely need to install either Xcode or Rtools, depending on whether
you are using a Mac or a Windows machine, before you can install the preText package
from GitHub, since it makes use of C++ code.
Now we can install from GitHub using the following line:
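A sketch of the GitHub install, assuming the devtools package is used and that the repository lives at `matthewjdenny/preText`. The repository path is not stated in this README, so adjust it if the package is hosted elsewhere:

```r
# install.packages("devtools")  # uncomment if devtools is not yet installed

# Build and install the development version from GitHub.
# The repository path is an assumption; replace it if it differs.
devtools::install_github("matthewjdenny/preText")
```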
Once the preText package is installed, you may access its functionality as you
would any other package by calling:
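As with any other R package, this is a call to `library()`:

```r
# Attach preText so its exported functions are available in the session.
library(preText)
```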
If all went well, you should be able to replicate the steps in the package vignette.
The basic functionality of this package is detailed in a vignette, which is [available here]. Beyond this basic functionality, the package includes a number of additional utility and analysis functions for exploring and comparing multiple document-term matrices.
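To give a feel for the basic workflow, here is a hedged sketch rather than a definitive recipe. It assumes the quanteda package is installed and uses its bundled `data_corpus_inaugural` corpus. The function and argument names (`factorial_preprocessing()`, `preText()`, `preText_score_plot()`) come from the package documentation rather than from this README, and the argument values are purely illustrative, so check them against your installed version:

```r
library(preText)
library(quanteda)

# Example data: the first 10 US presidential inaugural addresses from quanteda.
documents <- as.character(data_corpus_inaugural)[1:10]

# Preprocess the documents under every combination of preprocessing steps,
# producing one document-term matrix per specification.
preprocessed_documents <- factorial_preprocessing(
  documents,
  use_ngrams = TRUE,
  infrequent_term_threshold = 0.2,
  verbose = FALSE
)

# Compare the resulting document-term matrices and compute preText scores.
preText_results <- preText(
  preprocessed_documents,
  dataset_name = "Inaugural Speeches",
  distance_method = "cosine",
  num_comparisons = 20,
  verbose = FALSE
)

# Plot the preText score for each preprocessing specification.
preText_score_plot(preText_results)
```

Ten documents keep the run time short for a first try. On a real corpus, the factorial preprocessing step can take considerably longer.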
PLEASE REPORT ANY BUGS OR ERRORS TO firstname.lastname@example.org.