kejunxiao/sparse_hlda

sparse_hlda

A fast C++ implementation of a Hierarchy Latent Dirichlet Allocation (Hierarchy LDA) algorithm. It aggregates stop words and meaningless high-frequency words into a "common-topic" (a bucket for rubbish words) and generates K (the number of topics you set) purer "special-topics".

features:

  • supports loading the last trained model and continuing training;
  • uses a sparse Gibbs sampler, which is faster than a standard collapsed Gibbs sampler;
  • uses a Hierarchy LDA structure that aggregates stop words and meaningless high-frequency words into a "common-topic" (a bucket for rubbish words);
  • (in development) support for a mixed data structure (the most frequent words stored in contiguous memory, the rest in linked lists) to save memory.
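
The README does not describe how the sparse Gibbs sampler works internally. One widely used approach for LDA, the SparseLDA bucket decomposition, splits the unnormalized topic weight of a token into three parts so that most of the work touches only topics with non-zero counts. The sketch below illustrates that decomposition under the assumption that something similar is used here; it is not taken from the sparse_hlda source. All names (`Buckets`, `dense_mass`, `sparse_mass`) are hypothetical, and the common-topic/special-topic split is ignored for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only; not the sparse_hlda API.
struct Buckets {
    double s = 0.0;  // smoothing-only mass: alpha * beta / (n_t + V * beta)
    double r = 0.0;  // document-topic mass: n_td * beta / (n_t + V * beta)
    double q = 0.0;  // topic-word mass: (alpha + n_td) * n_wt / (n_t + V * beta)
    double total() const { return s + r + q; }
};

// Dense form: total unnormalized weight of the standard collapsed Gibbs
// conditional, (n_td + alpha) * (n_wt + beta) / (n_t + V * beta), over topics.
double dense_mass(const std::vector<int>& n_td, const std::vector<int>& n_wt,
                  const std::vector<int>& n_t, double alpha, double beta, int V) {
    assert(n_td.size() == n_t.size() && n_wt.size() == n_t.size());
    double sum = 0.0;
    for (std::size_t t = 0; t < n_t.size(); ++t) {
        sum += (n_td[t] + alpha) * (n_wt[t] + beta) / (n_t[t] + V * beta);
    }
    return sum;
}

// Sparse form: the same total, split into three buckets. For clarity this
// sketch still loops over every topic; a real sampler would cache s, and
// visit only topics with non-zero n_td (for r) or non-zero n_wt (for q).
Buckets sparse_mass(const std::vector<int>& n_td, const std::vector<int>& n_wt,
                    const std::vector<int>& n_t, double alpha, double beta, int V) {
    assert(n_td.size() == n_t.size() && n_wt.size() == n_t.size());
    Buckets b;
    for (std::size_t t = 0; t < n_t.size(); ++t) {
        const double denom = n_t[t] + V * beta;
        b.s += alpha * beta / denom;
        if (n_td[t] != 0) b.r += n_td[t] * beta / denom;
        if (n_wt[t] != 0) b.q += (alpha + n_td[t]) * n_wt[t] / denom;
    }
    return b;
}
```

Here `n_td` holds the per-topic counts for the current document, `n_wt` the per-topic counts for the current word, `n_t` the total tokens per topic, and `V` the vocabulary size. Because a document and a word each usually appear in only a few topics, the `r` and `q` buckets are sparse, which is where the speed-up over the dense loop comes from.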

usage:

./sparse_hlda -input docs.txt -output model_out/ -num_topics 100 -num_iters 30 -save_step 10
