stmprinter: Print multiple stm model dashboards to a pdf file for inspection
Estimate multiple stm models and print a dashboard for each run on separate PDF pages for inspection. These functions are designed for working with 15 or fewer topics (such as with survey data) and can be particularly useful when it is difficult to find a qualitatively good model on the first run.
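The package includes two main functions:
many_models(): Runs stm::selectModel() for several K topics (in parallel).
print_models(): Prints all runs produced by either many_models() or stm::manyTopics() to a PDF file.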
You can install stmprinter from GitHub with:
# install.packages("devtools")
devtools::install_github("mikajoh/stmprinter")
Here is an example with the gadarian data that is included with the stm package.
First let’s prep the data as usual with stm::textProcessor() and stm::prepDocuments().
library(stm)
#> stm v1.3.3 (2018-1-26) successfully loaded. See ?stm for help.
#> Papers, resources, and other materials at structuraltopicmodel.com
library(stmprinter)

processed <- textProcessor(
  documents = gadarian$open.ended.response,
  metadata = gadarian
)
#> Building corpus...
#> Converting to Lower Case...
#> Removing punctuation...
#> Removing stopwords...
#> Removing numbers...
#> Stemming...
#> Creating Output...

out <- prepDocuments(
  documents = processed$documents,
  vocab = processed$vocab,
  meta = processed$meta
)
#> Removing 640 of 1102 terms (640 of 3789 tokens) due to frequency
#> Your corpus now has 341 documents, 462 terms and 3149 tokens.
We can then run the many_models() function included in this package. It runs stm::selectModel() for several values of K (in parallel) and returns a list with the output. This is convenient if you wish to keep several runs per number of topics, unlike stm::manyTopics(), which only keeps one model per K. Note, though, that the print_models() function is also compatible with output from stm::manyTopics().
many_models() takes the same arguments as stm::selectModel(), except that K should be a vector of all the desired numbers of topics to run for. An additional argument lets you choose how many cores to use (defaults to the number of cores available on the machine).
For the gadarian example, we could run the following to estimate stm models for 3 to 12 topics.
set.seed(2018)
stm_models <- many_models(
  K = 3:12,
  documents = out$documents,
  vocab = out$vocab,
  prevalence = ~ treatment + s(pid_rep),
  data = out$meta,
  N = 4,
  runs = 100
)
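For intuition, the idea of making one model-selection call per value of K and collecting the results in a list can be sketched in a few lines of base R. This is only an illustrative sketch under stated assumptions, not how many_models() is actually implemented: fit_per_k and its arguments fit_fun and cores are hypothetical names, and parallel::mclapply() only runs in parallel on Unix-like systems (on Windows it requires mc.cores = 1).

```r
# Hypothetical helper for illustration only; not part of stmprinter.
# Calls `fit_fun` once per value in `K` and returns a named list of results.
fit_per_k <- function(K, fit_fun, ..., cores = 1L) {
  stopifnot(is.numeric(K), length(K) >= 1L)
  fits <- parallel::mclapply(
    K,
    function(k) fit_fun(K = k, ...),  # extra arguments are passed through
    mc.cores = cores
  )
  names(fits) <- paste0("K", K)
  fits
}

# Toy stand-in for a model-fitting function, so the sketch runs without stm.
toy_fit <- function(K, runs) list(K = K, runs = runs)
res <- fit_per_k(3:5, toy_fit, runs = 10)
names(res)
```

With stm installed, fit_fun could be stm::selectModel, with documents, vocab, N, and runs passed through the dots.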
You can then print all N runs for each of the provided K topics using print_models() with the following code. stm_models must be the output from either many_models() or stm::manyTopics(). The second argument is the texts to use for printing the most representative text (see ?stm::findThoughts). You can also provide the file name (file) and a title for the top of the first page (title).
print_models(
  stm_models,
  gadarian$open.ended.response,
  file = "gadarian_stm_runs.pdf",
  title = "gadarian project"
)
An example of the output is shown below
Note that the text argument takes the full text responses, but they must correspond to the documents used to fit the models (see ?stm::findThoughts). If documents are removed during stm::textProcessor() or stm::prepDocuments(), you will need to remove the same texts from the original. You can typically do that with the following code:
text <- gadarian$open.ended.response[-c(as.integer(processed$docs.removed))][-c(as.integer(out$docs.removed))]
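One edge case is worth checking: in R, negative indexing with an empty vector (x[-integer(0)]) returns an empty vector rather than x unchanged, so the one-liner above would drop every text if no documents were removed at one of the two steps. A minimal defensive sketch, where drop_removed is a hypothetical helper and not part of stm or stmprinter, could look like this:

```r
# Hypothetical helper for illustration; not part of stm or stmprinter.
# Drops the elements of `texts` at positions `removed`, and returns `texts`
# unchanged when `removed` is NULL or empty.
drop_removed <- function(texts, removed) {
  removed <- as.integer(removed)
  if (length(removed) == 0L) {
    return(texts)
  }
  texts[-removed]
}

# Toy data: document 2 is dropped in the first step, nothing in the second.
texts <- c("first", "second", "third")
kept <- drop_removed(drop_removed(texts, 2L), integer(0))
kept
```

Applied to the example, this would read text <- drop_removed(drop_removed(gadarian$open.ended.response, processed$docs.removed), out$docs.removed). The order matters, because the indices from the second step refer to the documents that remain after the first.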
Pull requests, questions, suggestions, etc., are welcome!