# <font color='red'>Latent Dirichlet Allocation

## <font color='blue'>Introduction

In this notebook we are going to apply a topic modeling algorithm called *__"Latent Dirichlet Allocation"__* (LDA), which can be used to find the topics in a collection of documents. In LDA, each topic is represented as a set of keywords, and each document can belong to several topics with different probabilities.

## <font color='blue'>How does it work?

LDA assumes that the corpus contains a number of topics, each of which consists of a set of keywords (each keyword with a certain probability), and that each document in the corpus belongs to some of those topics with different probabilities (more precisely, to all of them, but to most of them with negligible probability). The key idea is that a document is a mixture of topics: LDA assumes that a document has been created by picking words from its topics, according to the word distributions of those topics.

\begin{align}
P(word \: \mid \: doc) & = \sum_{k=1}^{K} P(topic_k \: \mid \: doc) \: P(word \: \mid \: topic_k)
\end{align}
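As a small worked example of the mixture formula above, the sketch below computes the probability of a single word in one document. All numbers are made up for illustration (three topics and a hypothetical word), and the variable names are assumptions, not part of any library.

```python
# Hypothetical values for one document with K = 3 topics.
topic_weights = [0.7, 0.2, 0.1]            # probability of each topic in the document (sums to 1)
word_given_topic = [0.05, 0.001, 0.01]     # probability of the word under each topic

# Mixture: sum over topics of (topic probability) * (word probability in that topic).
p_word = sum(t * w for t, w in zip(topic_weights, word_given_topic))
print(p_word)  # roughly 0.0362
```

The first topic contributes almost all of the probability (0.7 × 0.05 = 0.035), because it is both the dominant topic of the document and the topic in which the word is most likely.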

The topics are estimated in an iterative manner:
<ol>
<li>Assign documents to topics with different probabilities</li>
<li>Assign words to the topics with different probabilities</li>
<li>Check each word in each document and assign it to one of the topics, according to the topic and word distributions</li>
</ol>

In step 3, the probability of assigning a word to topic $k$ is proportional to:

\begin{align}
P(topic_k \: \mid \: doc) \: P(word \: \mid \: topic_k)
\end{align}

Note that the initial assignments are made randomly. The procedure above is then repeated until convergence or until some other stopping criterion is met.
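To make the loop concrete, here is a minimal sketch of one well-known way to implement it, collapsed Gibbs sampling. This is an illustration under stated assumptions, not necessarily the method used later in this notebook: the function name `gibbs_lda`, the smoothing values `alpha` and `eta`, the toy corpus, and the fixed number of sweeps (used instead of a convergence check) are all hypothetical choices.

```python
import random

def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, eta=0.01, sweeps=50, seed=0):
    """Collapsed Gibbs sampling sketch; docs is a list of lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # words of doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # times word w is assigned to topic k
    topic_total = [0] * num_topics                               # all words assigned to topic k

    # Random initialization of every word's topic.
    assignments = []
    for d, doc in enumerate(docs):
        assignments.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, assignments[d]):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(sweeps):  # assumed stopping criterion: a fixed number of sweeps
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Take the word out of its current topic.
                k = assignments[d][n]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Weight of topic j: (share of topic j in the doc) * (share of word w in topic j).
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + eta) / (topic_total[j] + vocab_size * eta)
                    for j in range(num_topics)
                ]
                # Draw a new topic for the word and put it back into the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return assignments, doc_topic, topic_word

# Toy corpus: word ids 0-2 and 3-5 occur in different documents.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 3, 4, 5]]
assignments, doc_topic, topic_word = gibbs_lda(docs, num_topics=2, vocab_size=6)
print(doc_topic)   # per-document topic counts
```

The two factors in `weights` play the roles of the document's topic probability and the topic's word probability; the small constants `alpha` and `eta` only keep the weights from being exactly zero.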

## <font color='blue'>The model

The picture below shows the model described above:

<img src="first.jpg">

$Z$ = Cluster (topic) of the word

$W$ = The word

$\alpha$ = Parameter of the prior on the distribution of topics in a document

$\eta$ = Parameter of the prior on the distribution of keywords in a topic

$\theta$ = The distribution of topics in the document

$\beta$ = The distribution of keywords in a topic

__θ__ and __β__ are still unknown. We assume that they come from two different __*Dirichlet distributions*__, with parameters __α__ and __η__ respectively. With small parameter values, the Dirichlet distribution lets us assume that both the topic and the keyword distributions are sparse. This means that each document has only a few topics, and each topic only a few keywords.
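The sparsity effect of the Dirichlet distribution can be checked numerically. The sketch below draws samples from a symmetric Dirichlet using only the standard library (normalized Gamma draws) and compares the average size of the largest component for a small and a large parameter value. The values 0.1 and 10, the number of topics, and the helper names are assumptions chosen for illustration.

```python
import random

def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]

def mean_largest_share(concentration, num_topics, rng, trials=2000):
    """Average, over many samples, of the largest component of the sample."""
    return sum(
        max(sample_dirichlet(concentration, num_topics, rng)) for _ in range(trials)
    ) / trials

rng = random.Random(0)
num_topics = 10  # hypothetical number of topics

sparse = mean_largest_share(0.1, num_topics, rng)   # small parameter
dense = mean_largest_share(10.0, num_topics, rng)   # large parameter
print(f"largest topic share, parameter 0.1: {sparse:.2f}")
print(f"largest topic share, parameter 10:  {dense:.2f}")
```

With the small parameter a single topic typically takes most of the probability mass, whereas with the large parameter the mass is spread almost evenly over all ten topics.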

The model would be:

$Z_{d,n} \mid \theta_d \sim Multinomial(\theta_d) $

$W_{d,n} \mid Z_{d,n},\beta \sim Multinomial(\beta_{Z_{d,n}}) $

$\beta_k \mid  \eta \sim Dirichlet(\eta) $

$\theta_d \mid  \alpha \sim Dirichlet(\alpha) $
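Read from bottom to top, the four lines above describe how a corpus would be generated. The following sketch simulates that process with the standard library only. The vocabulary, the corpus sizes, and the values of `alpha` and `eta` are hypothetical, and the Dirichlet sampler is a simple helper written for this example.

```python
import random

rng = random.Random(0)

# Hypothetical setup for illustration.
vocab = ["ball", "goal", "team", "vote", "party", "law"]
num_topics, num_docs, doc_len = 2, 3, 8
alpha, eta = 0.5, 0.5

def dirichlet(concentration, size):
    """Symmetric Dirichlet sample via normalized Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]

# beta_k ~ Dirichlet(eta): one word distribution per topic.
beta = [dirichlet(eta, len(vocab)) for _ in range(num_topics)]

docs = []
for d in range(num_docs):
    # theta_d ~ Dirichlet(alpha): the topic distribution of document d.
    theta_d = dirichlet(alpha, num_topics)
    words = []
    for n in range(doc_len):
        # Z_{d,n} ~ Multinomial(theta_d): pick a topic for the n-th word.
        z = rng.choices(range(num_topics), weights=theta_d)[0]
        # W_{d,n} ~ Multinomial(beta_z): pick the word from that topic.
        words.append(rng.choices(vocab, weights=beta[z])[0])
    docs.append(words)

for words in docs:
    print(" ".join(words))
```

Fitting LDA is the reverse problem: only the generated words are seen, and `beta`, `theta_d` and the topic of each word have to be recovered from them.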

In the diagram above, the only thing we observe is the set of words (observable variable), based on which we need to find the topics, the keywords and the word clusters (latent variables). Note that if a word belongs to a cluster (topic), it doesn't mean that the word is among that topic's keywords. What we need is therefore the posterior distribution of the latent variables given the words:

\begin{align}
P(\theta_{1,...,D},\beta_{1,...,K},Z_{1,...,D;1,...,N_d} \mid W_{1,...,D;1,...,N_d})
\end{align}

Where:

$K$ = Number of topics

$N_d$ = Number of words in document $d$

$D$ = Number of documents

There are several methods to approximate the posterior above, such as sampling methods; a detailed treatment of them is outside the scope of this tutorial.