HubSKY/latent-dirichlet-allocation

Latent Dirichlet Allocation

Introduction

Latent Dirichlet Allocation (LDA) [1] is a probabilistic generative model of text documents. Each document is modeled as a mixture over a set of latent "topics," and each topic is a probability distribution over the words in the vocabulary. Using variational Bayesian (VB) inference, the topics can be learned from the documents in a corpus. The resulting per-document topic features can then be used for tasks such as text categorization.
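
For orientation, the generative process can be written out as follows. This is the standard smoothed formulation found in the LDA literature (see [1] and [2]), not something taken from this repository's code. The symbols are notation chosen here: K topics, Dirichlet hyperparameters \alpha and \eta, topic-word distributions \beta_k, document mixtures \theta_d, and per-word topic assignments z_{dn}.

```latex
\begin{align*}
\beta_k  &\sim \mathrm{Dirichlet}(\eta),          & k &= 1, \dots, K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha),        & d &= 1, \dots, D \\
z_{dn} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d), & n &= 1, \dots, N_d \\
w_{dn} \mid z_{dn}, \beta &\sim \mathrm{Multinomial}(\beta_{z_{dn}})
\end{align*}
```

Under this notation, the "topic features" of document d correspond to an estimate of \theta_d.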

Included Files

batchLDA.m - Implements LDA in MATLAB with batch processing of documents. Takes in a set of word count vectors for the documents in the corpus and outputs the set of topic features.
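
As a rough illustration of what batch variational Bayes for LDA involves, here is a minimal sketch that follows the batch VB updates described in [2]. It is not the code in batchLDA.m, and it does not reflect that function's actual interface. The toy count matrix, the D-by-V input layout, the number of topics, the hyperparameter values, and all variable names are assumptions made for this example.

```matlab
% Hypothetical sketch of batch variational Bayes for LDA (not batchLDA.m).
rng(0);                                   % reproducible toy run
counts = [5 3 0 0; 4 4 0 1; 0 0 6 2; 0 1 3 5];   % assumed layout: documents x vocabulary
[D, V] = size(counts);
K = 2;                                    % number of topics (assumed)
alpha = 1 / K;                            % Dirichlet prior on document-topic mixtures
eta = 1 / K;                              % Dirichlet prior on topic-word distributions

lambda = 1 + rand(K, V);                  % variational parameters of the topics
gamma = ones(D, K);                       % variational parameters of the document mixtures

for iter = 1:50
    % exp(E[log beta]) under the current variational distribution, K-by-V
    expElogBeta = exp(bsxfun(@minus, psi(lambda), psi(sum(lambda, 2))));
    sstats = zeros(K, V);                 % expected topic-word counts, before scaling
    for d = 1:D
        n = counts(d, :);
        gammaD = ones(1, K);
        for inner = 1:100
            expElogTheta = exp(psi(gammaD) - psi(sum(gammaD)));    % 1-by-K
            phinorm = expElogTheta * expElogBeta + 1e-100;         % 1-by-V normalizer
            gammaNew = alpha + expElogTheta .* ((n ./ phinorm) * expElogBeta');
            converged = mean(abs(gammaNew - gammaD)) < 1e-4;
            gammaD = gammaNew;
            if converged, break; end
        end
        gamma(d, :) = gammaD;
        % Recompute the responsibilities' factors with the converged gammaD
        expElogTheta = exp(psi(gammaD) - psi(sum(gammaD)));
        phinorm = expElogTheta * expElogBeta + 1e-100;
        sstats = sstats + expElogTheta' * (n ./ phinorm);
    end
    lambda = eta + sstats .* expElogBeta;  % batch update of the topic parameters
end

% Per-document topic features: normalized gamma, one row per document
theta = bsxfun(@rdivide, gamma, sum(gamma, 2));
```

In this sketch each row of theta sums to one and can serve as a K-dimensional feature vector for its document. The sketch uses only base MATLAB functions (psi is the digamma function), and bsxfun is used so it does not depend on implicit expansion.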

classify.m - A simple text categorization example using the LDA topic features. Requires the Pattern Recognition Toolbox.
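
To show how topic features might feed a classifier, here is a minimal nearest-centroid sketch in base MATLAB. It is not the approach used in classify.m and it deliberately avoids the Pattern Recognition Toolbox, whose API is not shown in this document. The toy feature matrices, labels, and variable names are all hypothetical.

```matlab
% Hypothetical toy data: rows are documents, columns are topic proportions.
trainTheta = [0.9 0.1; 0.8 0.2; 0.2 0.8; 0.1 0.9];
trainLabels = [1; 1; 2; 2];
testTheta = [0.7 0.3; 0.25 0.75];

% One centroid per class: the mean topic vector of that class's documents.
classes = unique(trainLabels);
centroids = zeros(numel(classes), size(trainTheta, 2));
for c = 1:numel(classes)
    centroids(c, :) = mean(trainTheta(trainLabels == classes(c), :), 1);
end

% Assign each test document to the class with the closest centroid
% (squared Euclidean distance in topic space).
predicted = zeros(size(testTheta, 1), 1);
for i = 1:size(testTheta, 1)
    diffs = bsxfun(@minus, centroids, testTheta(i, :));
    [~, nearest] = min(sum(diffs .^ 2, 2));
    predicted(i) = classes(nearest);
end
```

With this toy data, predicted is [1; 2]. Any classifier that accepts fixed-length feature vectors could take the place of the nearest-centroid rule.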

License

This code is made available under the MIT License. Please consult the included LICENSE file for complete information.

References

[1] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.

[2] M. D. Hoffman, D. M. Blei, and F. Bach, "Online Learning for Latent Dirichlet Allocation," in Advances in Neural Information Processing Systems (NIPS), Vancouver, 2010.
