Code for running robust and repeatable unsupervised topic modeling experiments

gwdonlab/latent-dirichlet-allocation

Latent Dirichlet Allocation

Code for running robust and repeatable LDA experiments. Use the -h flag to view CLI parameters for any script.

Files and Folders

  • lda: Files related to the training and analysis of LDA topic models
  • dlda: Files related to the training and analysis of dynamic topic models (using gensim's ldaseq implementation)
  • list_common_words.py: Takes an experiment config file as a command-line argument, runs all of the preprocessing it specifies, and lists the 50 most common words in the dataset that the experiment will use
    • See lda or dlda READMEs for the structure of an experiment JSON file
  • plot_data_quants.py: Driver script that uses a TextParser to plot how much data falls in each time frame (especially useful for choosing time intervals for a dynamic topic model)
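
As a rough illustration of what the two helper scripts compute, here is a minimal sketch using only the Python standard library. It is not the code in this repository and does not use the ogm package or its TextParser: the toy corpus and the `top_words` and `docs_per_month` functions are hypothetical names chosen for this example, the input is assumed to be already tokenized, and time frames are assumed to be calendar months.

```python
from collections import Counter
from datetime import date

# Hypothetical toy corpus: (date, already-tokenized text) pairs.
# A real experiment would load and preprocess data as its config specifies.
docs = [
    (date(2020, 1, 5), ["topic", "model", "topic", "topic"]),
    (date(2020, 1, 20), ["model", "data"]),
    (date(2020, 2, 3), ["topic", "data", "data"]),
]


def top_words(token_lists, n=50):
    """Return the n most frequent tokens across all documents."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(n)


def docs_per_month(dates):
    """Count how many documents fall in each (year, month) time frame."""
    return Counter((d.year, d.month) for d in dates)


if __name__ == "__main__":
    # Most common words after (assumed) preprocessing.
    print(top_words((tokens for _, tokens in docs), n=2))
    # Document counts per month, sorted chronologically.
    print(sorted(docs_per_month(d for d, _ in docs).items()))
```

Counts like the per-month totals are the kind of quantity one might inspect before fixing the time slices of a dynamic topic model, since very sparse slices can be merged into wider intervals.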

Dependencies

Install our ogm package and its dependencies.
