phively/uchicago-thesis

University of Chicago Master's Thesis.

Crowdsourced Statistics Advice: Topic Modeling stats.stackexchange.com

Topic models are hierarchical Bayesian models used to discover latent semantic structure within collections of documents, reducing a corpus of millions of words to a few dozen interpretable topics. This paper presents three closely related methods: latent Dirichlet allocation, correlated topic models, and structural topic models. I discuss the estimation challenges associated with topic modeling and compare the three methods by analyzing a collection of 182,308 posts contributed by the general public to stats.stackexchange.com, the statistics and machine learning community website.
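To make the "hierarchical Bayesian" description concrete, here is a minimal sketch of the standard generative process behind latent Dirichlet allocation, the first of the three methods. It is not taken from the thesis notebooks: the function names, the toy vocabulary, and the hyperparameter values `alpha` and `eta` are all assumptions chosen for illustration, and the sketch only simulates documents from the model rather than estimating topics from data.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_docs, doc_length, vocab, n_topics,
                    alpha=0.5, eta=0.5, seed=0):
    """Simulate documents from the LDA generative process.

    alpha and eta are illustrative symmetric Dirichlet hyperparameters
    (assumed values, not the settings used in the thesis).
    """
    rng = random.Random(seed)

    # Each topic is a probability distribution over the vocabulary.
    topics = [sample_dirichlet([eta] * len(vocab), rng)
              for _ in range(n_topics)]

    proportions = []
    docs = []
    for _ in range(n_docs):
        # Each document has its own mixture over topics.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_length):
            # Pick a topic for this word position, then a word from that topic.
            z = rng.choices(range(n_topics), weights=theta)[0]
            w = rng.choices(vocab, weights=topics[z])[0]
            words.append(w)
        proportions.append(theta)
        docs.append(words)
    return topics, proportions, docs
```

Calling, for example, `generate_corpus(3, 10, ["prior", "model", "test", "data"], n_topics=2)` returns the topic-word distributions, the per-document topic proportions, and the simulated documents. Fitting a topic model runs this process in reverse: given only the documents, it infers the topics and proportions.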

Hively_2017_Topic_Models.pdf

Data are taken from the Dec 15, 2016 Stack Exchange Data Dump and are licensed under Creative Commons Attribution-ShareAlike 3.0 (CC BY-SA 3.0).

01-exploration-and-parsing.nb.html

02-vocabulary-experimentation.nb.html

03-datafile-construction.nb.html

04-first-lda.nb.html

05-lda-with-tex.nb.html

06-dataset-comparison.nb.html

Parallel.nb.html

07a-lda-kfold-xval.nb.html

07b-lda-kfold-parallel.nb.html

08e-ctm-sigma-holdout.nb.html

09a-stm-metadata-creation.nb.html

09b-stm-continuous.nb.html

10a-figures.nb.html

10b-model-comparisons.nb.html
