Topic modeling for the programming languages literature.
You can run our tool!
The analysis directory holds the R scripts we used to generate figures for the paper.
The lda directory holds the Python and bash scripts we used to run David Blei's LDA-C. Outputs are written to the out directory.
The sessions directory holds the (not quite finished) analysis of session data for POPL.
The www directory holds the website frontend and backend.
You'll need David Blei's LDA-C, compiled, with the lda binary on your path. You'll also need the Python library nltk, with the stopwords and wordnet corpora installed.
To do the R analysis, you'll need R with ggplot2 installed.
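
A quick way to confirm the LDA-C and nltk prerequisites before running anything is a small check script. The sketch below is not part of this repository. The function name missing_prerequisites is hypothetical, and it assumes the nltk data lives in one of nltk's default search locations. It only reports what is missing and does not download anything.

```python
import importlib.util
import shutil


def missing_prerequisites():
    """Return a list of human-readable descriptions of missing prerequisites.

    Hypothetical helper for illustration; not part of this repository.
    """
    missing = []

    # The compiled LDA-C binary is expected to be named `lda` and to be on PATH.
    if shutil.which("lda") is None:
        missing.append("lda binary not found on PATH")

    # Make sure nltk is importable before asking it about its data.
    if importlib.util.find_spec("nltk") is None:
        missing.append("Python package nltk is not installed")
        return missing

    import nltk

    # nltk.data.find raises LookupError when a resource cannot be located.
    for corpus in ("stopwords", "wordnet"):
        try:
            nltk.data.find(f"corpora/{corpus}")
        except LookupError:
            missing.append(
                f"nltk corpus {corpus!r} is missing; try nltk.download({corpus!r})"
            )
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    for problem in problems:
        print(problem)
    if not problems:
        print("All prerequisites found.")
```

If the script reports a missing corpus, running nltk.download("stopwords") and nltk.download("wordnet") from a Python prompt fetches them. The ggplot2 requirement has to be checked from R and is not covered here.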