# Generating/initializing random topics and words

- Collection of basic blogs at bottom https://devo-evo.lab.asu.edu/methods/?q=node/42
- The original Edwin Chen github repo on Sarah Palin: https://github.com/echen/sarah-palin-lda 

<img src="lda-template.png" width="350">

### Examples of how to use the above functions

### Make some random assignments

The topic distribution is $\theta \sim Dir( \alpha )$, so we sample from $P(\theta|\alpha)$. One $\theta$ is sampled per document; with $M$ documents, there are $M$ of them.

In [14]:
import numpy as np

alphas = [5, 5]  # Dirichlet concentration parameters, one per topic
theta = np.random.dirichlet(alphas)  # one draw of the topic distribution
print("The topic probability distribution is: " + str(theta))

The topic probability distribution is: [ 0.74118048  0.25881952]
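
Since one $\theta$ is drawn per document, all $M$ of them can be drawn in a single call. The following is a minimal sketch, assuming a hypothetical corpus of `M = 4` documents and a seeded `default_rng` generator (neither is part of the original notebook):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

M = 4            # hypothetical number of documents
alphas = [5, 5]  # same concentration parameters as in the cell above

# One row per document; each row is a topic distribution over the 2 topics.
thetas = rng.dirichlet(alphas, size=M)

print(thetas.shape)        # (4, 2)
print(thetas.sum(axis=1))  # every row sums to 1, up to floating-point error
```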


Given a topic distribution $\theta$ drawn from $P(\theta|\alpha)$, we'd like to take $N$ samples from the distribution specified by $\theta$, one per word; the $n^{th}$ sample is called $z_n$. So,

$z_n \sim Multi( \theta )$

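Formally, $P(z_n | \theta) = \theta_1^{z_n^{(1)}} \theta_2^{z_n^{(2)}} \cdots \theta_k^{z_n^{(k)}}$ is a multinomial distribution with a single trial and parameter $\theta$, where $z_n^{(i)} = 1$ if word $n$ is assigned to topic $i$ and $0$ otherwise.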

In [21]:
N = 10
z_n = np.random.multinomial( 1, theta, size=N )
z_n.argmax(axis=1)

array([0, 1, 0, 1, 1, 1, 0, 1, 1, 1])
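
As a sanity check on the sampling step, the fraction of draws that land on each topic should approach $\theta$ as $N$ grows. The sketch below assumes a rounded version of the $\theta$ sampled earlier and a much larger, hypothetical $N$; it draws topic indices directly with `choice` rather than one-hot vectors:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

theta = np.array([0.74, 0.26])  # rounded from the sample shown earlier
N = 10_000                      # hypothetical, much larger than the N = 10 above

# Draw one topic index per word; equivalent to taking argmax of a one-hot draw.
z = rng.choice(len(theta), size=N, p=theta)

# Empirical topic frequencies should be close to theta.
freq = np.bincount(z, minlength=len(theta)) / N
print(freq)
```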

So, we have two random variables:

1. $\theta \sim Dir(\alpha)$
2. $z_n \sim Multi(\theta)$. (Strictly, this is a categorical, or discrete, distribution, i.e. a multinomial with a single trial, but don't worry about it)

We now know $P( z_n | \theta )$, but what is $P( \theta | z_{1:N} )$? That is, if I observe $N$ topic assignments, what is the conditional distribution of $\theta$ given those assignments? This is what we'd actually like to solve. As it turns out, because of something called *conjugacy* between the Dirichlet distribution and the multinomial, the posterior distribution of $\theta$ is also a Dirichlet distribution, and it can be specified as:

$\theta | z_{1:N} \sim Dir( \alpha + n( z_{1:N} ) )$

where $n(z_{1:N})$ is the vector of counts whose $i^{th}$ entry is the number of words in that document assigned to topic $i$. So, the more assignments we observe, the peakier the posterior gets around the observed topic proportions.

In [None]:
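theta_given_z = np.random.dirichlet(np.asarray(alphas) + z_n.sum(axis=0))  # posterior draw: Dir(alpha + per-topic counts)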
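
To illustrate how the conjugate update sharpens the posterior, here is a minimal sketch. The helper names `posterior_params` and `dirichlet_mean_var`, the "true" topic mix, and the values of $N$ are all assumptions made for illustration. It uses the standard Dirichlet moments: mean $a_i / a_0$ and variance $a_i (a_0 - a_i) / (a_0^2 (a_0 + 1))$, with $a_0 = \sum_i a_i$.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

alpha = np.array([5.0, 5.0])         # prior concentration, as above
theta_true = np.array([0.74, 0.26])  # hypothetical "true" topic mix

def posterior_params(z_onehot, alpha):
    """Dirichlet posterior parameters: prior alpha plus per-topic counts."""
    return alpha + z_onehot.sum(axis=0)

def dirichlet_mean_var(a):
    """Per-component mean and variance of a Dirichlet with parameters a."""
    a0 = a.sum()
    mean = a / a0
    var = a * (a0 - a) / (a0 ** 2 * (a0 + 1))
    return mean, var

# More observed assignments -> larger parameters -> smaller variance.
for N in (10, 100, 1000):
    z = rng.multinomial(1, theta_true, size=N)  # N one-hot topic draws
    post = posterior_params(z, alpha)
    mean, var = dirichlet_mean_var(post)
    print(N, post, mean.round(3), var.round(5))
```

The posterior mean moves from the prior mean toward the observed topic proportions, and the variance shrinks roughly in proportion to $1/N$.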

Given the topic $z_n$ and the per-topic word distributions specified by $\beta$, the word $w_n$ is drawn from a multinomial distribution whose parameter is row $z_n$ of $\beta$. Here, the parameter $\beta$ is a matrix of size $k \times V$, since $z_n \in \{1, \ldots, k\}$, i.e. there are $k$ topics, and there are $V$ words in the vocabulary.

For each topic, the probability $P( w | z_n, \beta_{1:k} )$ is a proper distribution over the $V$ words, so each row of $\beta$ sums to one. The columns need not sum to one.

In [None]:
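# betas is assumed to be a k x V matrix whose rows sum to one; draw one word id per topic assignment
w_t_n = np.array([np.random.multinomial(1, betas[t]).argmax() for t in z_n.argmax(axis=1)])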
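
The word-generation step can be sketched end to end as follows. This is only an illustration under stated assumptions: the sizes `k`, `V`, `N` are hypothetical, and `betas` is filled with rows drawn from a symmetric Dirichlet (parameter `eta`, also hypothetical) purely to obtain a valid row-stochastic matrix, since the text above treats $\beta$ as given.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

k, V, N = 2, 5, 10  # hypothetical: topics, vocabulary size, words per document

# Assumption: build a valid k x V matrix by drawing each row from a Dirichlet.
eta = np.ones(V)
betas = rng.dirichlet(eta, size=k)  # each row is one topic's word distribution

theta = rng.dirichlet([5, 5])       # topic distribution for one document
z = rng.choice(k, size=N, p=theta)  # topic index for each word position

# For each position, draw a word id from the row of betas picked by its topic.
w = np.array([rng.choice(V, p=betas[t]) for t in z])

print(z)
print(w)
```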

With the given topic distribution $\theta = [\theta_1, \theta_2, \theta_3, \cdots, \theta_k]$, the probability that the $n^{th}$ word is from topic $i$ is $P( z_n = i | \theta )$, which is simply $\theta_i$. That is to say, the probability that $z_n^{(i)} = 1$, or $P(z_n^{(i)}=1 | \theta, \alpha)$, is simply the parameter $\theta_i$.

Then, we can write $P( z_n, \theta | \alpha ) = P( z_n | \theta ) P(\theta | \alpha)$, since $z_n$ depends on $\alpha$ only through $\theta$. If we are looking over $N$ words and the draws are conditionally independent given $\theta$, then we have $P( \theta, z_{1:N} | \alpha ) = P(\theta | \alpha) \prod_n P( z_n | \theta )$.
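
The factorization above can be evaluated numerically in log space. The sketch below is one way to do it, with hypothetical helper names `log_dirichlet_pdf` and `log_joint`; it takes topic assignments as integer indices (0-based) rather than one-hot vectors, and writes out the Dirichlet density by hand using `math.lgamma`.

```python
import math
import numpy as np

def log_dirichlet_pdf(theta, alpha):
    """log P(theta | alpha) for a Dirichlet with parameters alpha."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1) * math.log(t) for a, t in zip(alpha, theta))

def log_joint(theta, z, alpha):
    """log P(theta, z_{1:N} | alpha) = log P(theta | alpha) + sum_n log theta_{z_n}."""
    counts = np.bincount(z, minlength=len(theta))  # words assigned to each topic
    return log_dirichlet_pdf(theta, alpha) + float(counts @ np.log(theta))

theta = np.array([0.74, 0.26])                # rounded from the earlier sample
z = np.array([0, 1, 0, 1, 1, 1, 0, 1, 1, 1])  # the assignments printed earlier
print(log_joint(theta, z, [5, 5]))
```

Because the product over $n$ only depends on how many times each topic appears, the sum collapses to the counts times $\log \theta$, which is the same count vector $n(z_{1:N})$ that appears in the posterior.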