sparse_hlda

A fast C++ implementation of hierarchical Latent Dirichlet Allocation (LDA). It collects stop words and meaningless high-frequency words into a "common topic" (a bucket for junk words), so that the K "special topics" (K is the number of topics you set) come out purer.

features:

supports loading the last trained model and continuing training;
uses a sparse Gibbs sampler, which is faster than a standard collapsed Gibbs sampler;
uses a hierarchical LDA structure that collects stop words and meaningless high-frequency words into the "common topic" (a bucket for junk words);
(in development) supports a mixed data structure (the most frequent words stored in contiguous memory, the rest in linked lists) to save memory.
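
To illustrate why a sparse Gibbs sampler can be faster, here is a minimal sketch of the well-known SparseLDA-style bucket decomposition. It is an assumption that this project's sampler works along these lines; the code in src may differ. All names (BucketMass, SparseCounts, bucket_masses, dense_total) and the hyperparameters alpha and beta are hypothetical and exist only for this example. The idea: the per-topic sampling weight (alpha + n_td) * (beta + n_wt) / (beta * V + n_t) splits into three parts, and two of them are nonzero only for topics that actually occur in the document or for the word.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Sparse (topic, count) pairs: only nonzero counts are stored.
using SparseCounts = std::vector<std::pair<int, int>>;

// Total sampling mass of each bucket for one token.
struct BucketMass {
    double smoothing;  // sum over all topics of alpha * beta / (beta * V + n_t)
    double document;   // sum over topics in the document of n_td * beta / (beta * V + n_t)
    double word;       // sum over topics of the word of (alpha + n_td) * n_wt / (beta * V + n_t)
    double total() const { return smoothing + document + word; }
};

// Expand sparse counts into a dense per-topic array.
std::vector<int> to_dense(const SparseCounts& counts, std::size_t num_topics) {
    std::vector<int> dense(num_topics, 0);
    for (std::size_t i = 0; i < counts.size(); ++i) {
        dense[counts[i].first] = counts[i].second;
    }
    return dense;
}

// Compute the three bucket masses. The document and word loops touch only
// nonzero entries. The smoothing loop is O(K) here for clarity; a real
// sampler would cache it and update it incrementally. The dense document
// lookup is also rebuilt on every call only to keep the sketch short.
BucketMass bucket_masses(const std::vector<int>& topic_totals,  // n_t
                         const SparseCounts& doc_counts,        // nonzero n_td
                         const SparseCounts& word_counts,       // nonzero n_wt
                         double alpha, double beta, int vocab_size) {
    BucketMass m = {0.0, 0.0, 0.0};
    const double beta_v = beta * vocab_size;
    const std::vector<int> doc_dense = to_dense(doc_counts, topic_totals.size());

    for (std::size_t t = 0; t < topic_totals.size(); ++t) {
        m.smoothing += alpha * beta / (beta_v + topic_totals[t]);
    }
    for (std::size_t i = 0; i < doc_counts.size(); ++i) {
        const int t = doc_counts[i].first;
        m.document += doc_counts[i].second * beta / (beta_v + topic_totals[t]);
    }
    for (std::size_t i = 0; i < word_counts.size(); ++i) {
        const int t = word_counts[i].first;
        m.word += (alpha + doc_dense[t]) * word_counts[i].second /
                  (beta_v + topic_totals[t]);
    }
    return m;
}

// Reference: the same total computed the dense way, visiting every topic.
double dense_total(const std::vector<int>& topic_totals,
                   const SparseCounts& doc_counts,
                   const SparseCounts& word_counts,
                   double alpha, double beta, int vocab_size) {
    const std::vector<int> n_td = to_dense(doc_counts, topic_totals.size());
    const std::vector<int> n_wt = to_dense(word_counts, topic_totals.size());
    double sum = 0.0;
    for (std::size_t t = 0; t < topic_totals.size(); ++t) {
        sum += (alpha + n_td[t]) * (beta + n_wt[t]) /
               (beta * vocab_size + topic_totals[t]);
    }
    return sum;
}
```

For any consistent set of counts, bucket_masses(...).total() should match dense_total(...) up to floating-point rounding. A sampler built on this would draw a uniform value in [0, total) and walk only the bucket it falls into, which is usually the short word or document bucket.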

usage:

./spare_hlda -input docs.txt -output model_out/ -num_topics 100 -num_iters 30 -save_step 10
