The main data structures involved in the implementation of this EM algorithm are three matrices:
- T (topics by words): this is the set of parameters characterizing topic content, which we denoted by the θi's. Each element is the probability of a particular word in a particular topic.
- D (documents by topics): this is the set of parameters modeling the coverage of topics in each document, which we denoted by the pij's. Each element is the probability that a particular topic is covered in a particular document.
- Z (hidden variables): for every document we need one Z, which gives the probability that each word in the document was generated by a particular topic; for any given document, Z is thus a "word-by-topic" matrix encoding p(Z|w). Z is the matrix we compute in the E-step, based on matrices T and D (which represent our parameters). Since a different Z is needed for each document, we must allocate one Z matrix per document. The M-step is then simply to use all these Z matrices, together with the word counts in each document, to re-estimate all the parameters, i.e., to update matrices T and D based on the Z's. Thus, at a high level, this is what happens in the algorithm:
- T and D are initialized.
- E-step computes all Z's based on T and D.
- M-step uses all Z's to update T and D.
- We iterate until the likelihood doesn't change much, at which point we take T and D as our output (a minimal code sketch of this loop follows below). Note that the Z's are also very useful (can you imagine some applications of the Z's?).
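
To make the bookkeeping concrete, here is a minimal NumPy sketch of the three matrices and of the E-step/M-step loop described above. It is only an illustration under the stated assumptions: the function name `plsa_em`, the `counts` input (a documents-by-words matrix of word counts), and the convergence tolerance are illustrative choices, not part of the original note.

```python
import numpy as np

def plsa_em(counts, n_topics, n_iters=100, tol=1e-4, seed=0):
    """Illustrative PLSA EM sketch. `counts` is a (documents x words) array
    of word counts; names and defaults here are assumptions for illustration."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # T (topics x words): p(word | topic), each row sums to 1.
    T = rng.random((n_topics, n_words))
    T /= T.sum(axis=1, keepdims=True)

    # D (documents x topics): p(topic | document), each row sums to 1.
    D = rng.random((n_docs, n_topics))
    D /= D.sum(axis=1, keepdims=True)

    prev_ll = -np.inf
    for _ in range(n_iters):
        # E-step: one word-by-topic matrix Z per document, stored as a 3-D
        # array so that Z[d, w, k] = p(z = k | d, w), computed from T and D.
        joint = D[:, None, :] * T.T[None, :, :]        # (docs, words, topics)
        p_w_given_d = joint.sum(axis=2, keepdims=True) # p(w | d)
        Z = joint / np.maximum(p_w_given_d, 1e-12)

        # M-step: use all the Z's and the word counts to re-estimate D and T.
        weighted = counts[:, :, None] * Z              # c(w, d) * p(z = k | d, w)
        D = weighted.sum(axis=1)                       # (docs, topics)
        D /= np.maximum(D.sum(axis=1, keepdims=True), 1e-12)
        T = weighted.sum(axis=0).T                     # (topics, words)
        T /= np.maximum(T.sum(axis=1, keepdims=True), 1e-12)

        # Iterate until the log-likelihood doesn't change much.
        ll = np.sum(counts * np.log(np.maximum(p_w_given_d[:, :, 0], 1e-12)))
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    return T, D, Z
```

For example, `plsa_em(np.array([[3, 0, 2, 1], [0, 4, 1, 2]]), n_topics=2)` would return a 2x4 matrix T, a 2x2 matrix D, and a stack of Z's in which `Z[d]` is the word-by-topic matrix for document d.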