<span style="color:green;font-size:xx-large">How LDA works</span>

In [2]:
#Example documents
d1 = ("dog", "cat", "rat", "ate", "the", "cat")
d2 = ("dog", "chair", "table", "and", "the", "chair")
d3 = ("dog", "cat", "chair", "chased", "the", "cat")


In [3]:
#Get the vocabulary
vocab = list(set(d1+d2+d3))
list(enumerate(vocab))

[(0, 'chased'),
 (1, 'table'),
 (2, 'cat'),
 (3, 'chair'),
 (4, 'dog'),
 (5, 'ate'),
 (6, 'rat'),
 (7, 'the'),
 (8, 'and')]
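String hashing is randomized per Python process, so the order of `set(d1+d2+d3)` (and therefore the word-to-row mapping) can change from run to run. Below is a minimal sketch that makes the order reproducible by sorting. `word_index` is a hypothetical helper name; the rest of the notebook does not use it.

```python
# Example documents, as above
d1 = ("dog", "cat", "rat", "ate", "the", "cat")
d2 = ("dog", "chair", "table", "and", "the", "chair")
d3 = ("dog", "cat", "chair", "chased", "the", "cat")

# Sorting the set gives the same vocabulary order on every run
vocab = sorted(set(d1 + d2 + d3))

# Hypothetical lookup table from word to row index
word_index = {word: i for i, word in enumerate(vocab)}

print(len(vocab))  # 9
print(vocab)       # ['and', 'ate', 'cat', 'chair', 'chased', 'dog', 'rat', 'table', 'the']
```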

In [None]:
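#The actual words don't matter here; all we need is the vocabulary size, 9.
#(set() ordering can change between runs, so the word order above may differ on your machine.)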

<span style="color:green;font-size:large">First, randomly assign a topic to each word in each document</span>

In [4]:
#Randomly assign topic 0 or 1 to every (word, document) pair
#Rows are the 9 vocabulary words, columns are the 3 documents
import numpy as np
np.random.seed(42)
topics = np.random.randint(2,size=(9, 3))
topics


array([[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0],
       [1, 0, 0],
       [0, 0, 1],
       [0, 1, 1],
       [1, 0, 1],
       [0, 1, 1],
       [1, 1, 1]])

<table>
    <tr><th>word</th><th>doc 1</th><th>doc 2</th><th>doc 3</th></tr>
    <tr><td>chased</td><td>T0</td><td>T1</td><td>T0</td></tr>
    <tr><td>table</td><td>T0</td><td>T0</td><td>T1</td></tr>
    <tr><td>cat</td><td>T0</td><td>T0</td><td>T0</td></tr>
    <tr><td>chair</td><td>T1</td><td>T0</td><td>T0</td></tr>
    <tr><td>dog</td><td>T0</td><td>T0</td><td>T1</td></tr>
    <tr><td>ate</td><td>T0</td><td>T1</td><td>T1</td></tr>
    <tr><td>rat</td><td>T1</td><td>T0</td><td>T1</td></tr>
    <tr><td>the</td><td>T0</td><td>T1</td><td>T1</td></tr>
    <tr><td>and</td><td>T1</td><td>T1</td><td>T1</td></tr>
</table>
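The table can be generated from `topics` instead of being typed by hand. A minimal sketch, assuming row i of `topics` belongs to `vocab[i]`; the word order and the topic values are copied from the outputs above and may differ on another run.

```python
import numpy as np

# Word order copied from the enumerate(vocab) output above
vocab = ["chased", "table", "cat", "chair", "dog", "ate", "rat", "the", "and"]

# Topic assignments copied from the output above (rows: words, columns: documents)
topics = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0],
                   [1, 0, 0], [0, 0, 1], [0, 1, 1],
                   [1, 0, 1], [0, 1, 1], [1, 1, 1]])

# One row per word: the word followed by its topic label in each document
rows = [[word] + [f"T{t}" for t in topics[i]] for i, word in enumerate(vocab)]

print("word", "doc 1", "doc 2", "doc 3", sep="\t")
for row in rows:
    print(*row, sep="\t")
```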
        

<span style="color:green;font-size:large">Compute word-topic probabilities</span>
<p></p>
<li><span style="color:red">p_wj_ti</span> is the probability that word j belongs to topic i. It is computed by counting the number of documents in which word j is assigned topic i and dividing by the total number of documents</li>
<li>Example: chair is assigned topic 1 in one document, so the probability is 1/3, or 0.33</li>
<li>NumPy's sum along axis 1 (the row sums of topics) computes this for every word at once</li>
<li>Since there are two topics, compute the probabilities for topic 1 and use that to get the probabilities for topic 0</li>
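To make the counting explicit, the sketch below computes p(w|t1) with a plain loop and checks it against the vectorized row sum used in the next cell. The `topics` values are copied from the output above.

```python
import numpy as np

# Topic assignments copied from the output above (rows: words, columns: documents)
topics = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0],
                   [1, 0, 0], [0, 0, 1], [0, 1, 1],
                   [1, 0, 1], [0, 1, 1], [1, 1, 1]])
n_words, n_docs = topics.shape

# For each word, count the documents in which it is assigned topic 1
p_w_t1_loop = np.zeros(n_words)
for w in range(n_words):
    count = 0
    for d in range(n_docs):
        if topics[w, d] == 1:
            count += 1
    p_w_t1_loop[w] = count / n_docs

# The vectorized row sum gives the same numbers
p_w_t1 = topics.sum(axis=1) / n_docs
assert np.allclose(p_w_t1_loop, p_w_t1)

# With two topics, each word's topic-0 probability is the complement
p_w_t0 = 1 - p_w_t1
print(p_w_t1_loop)
```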

In [5]:
#Compute probabilities
#Prob that word belongs to topic
#there are 3 documents, 2 topics
#get the row sum and divide by 3 (p(w|t1))
p_w_t1 = topics.sum(axis=1)/3
p_w_t1

array([0.33333333, 0.33333333, 0.        , 0.33333333, 0.33333333,
       0.66666667, 0.66666667, 0.66666667, 1.        ])

In [6]:
#1-p(w|t1) will be p(w|t0)
p_w_t0 = 1-p_w_t1
p_w_t0

array([0.66666667, 0.66666667, 1.        , 0.66666667, 0.66666667,
       0.33333333, 0.33333333, 0.33333333, 0.        ])

In [None]:
p_w_t = np.vstack((p_w_t0,p_w_t1))
p_w_t

<span style="color:green;font-size:large">Compute topic-document probabilities</span>
<p></p>
<li><span style="color:red">p_tj_di</span> is the probability that topic j belongs to document i. It is computed by counting the words in document i that are assigned topic j and dividing by the number of words (9)</li>
<li>Example: Topic 1 is associated with three words in document 1, therefore the probability that topic 1 belongs to document 1 is 3/9 or 0.33</li>
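The same count done one document at a time is shown below, checked against the column sums used in the next cell. This is a sketch with `topics` copied from the output above.

```python
import numpy as np

# Topic assignments copied from the output above (rows: words, columns: documents)
topics = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0],
                   [1, 0, 0], [0, 0, 1], [0, 1, 1],
                   [1, 0, 1], [0, 1, 1], [1, 1, 1]])
n_words, n_docs = topics.shape

# For each document, the fraction of its words assigned to topic 1
p_t1_d_loop = np.array([
    sum(topics[w, d] for w in range(n_words)) / n_words
    for d in range(n_docs)
])

# Column sums divided by 9, as in the notebook
p_t1_d = topics.sum(axis=0) / n_words
assert np.allclose(p_t1_d_loop, p_t1_d)

p_t0_d = 1 - p_t1_d
print(p_t1_d_loop)  # 3/9, 4/9 and 6/9
```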


In [None]:
#Get p(t|d) the probability that a topic is associated with a document
#get the column sum and divide by 9
p_t1_d = topics.sum(axis=0)/9
p_t1_d

In [None]:
#For topic 0
p_t0_d = 1-p_t1_d
p_t0_d

In [None]:
p_t_d = np.vstack((p_t0_d,p_t1_d))
p_t_d

<span style="color:green;font-size:x-large">update probabilities</span>
<li>By changing the word-topic-document assignments</li>
<li>For each (word, document) combination (w,d), compute:</li>
<ul>
    <li>a = p(t_i|d)*p(w|t_i), where t_i is the topic currently assigned to (w,d)</li>
    <li>b = p(t_j|d)*p(w|t_j), where t_j is the other topic</li>
</ul>
<li>If $a < b$, change t_i to t_j</li>

<li>Example: </li>
<ul><li>p(t0|d2) = 0.5556; p(w6|t0) = 0.3333; a = 0.5556*0.3333 = 0.185</li>
<li>p(t1|d2) = 0.4444; p(w6|t1) = 0.6667; b = 0.4444*0.6667 = 0.296</li>
<li>w6 (row 6 of topics, counting from 0) is currently assigned topic 0 in d2. Since $a<b$, change the assignment of (w6,d2) from topic 0 to topic 1</li>
    </ul>
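The rule for a single (word, document) pair can be written as a small function. This is a sketch of the notebook's simplified two-topic rule, not a full LDA sampler, and `update_topic` is a hypothetical helper name. It reproduces the worked example for w6 in d2.

```python
def update_topic(current, p_t_given_d, p_w_given_t):
    """Return the new topic (0 or 1) for one (word, document) pair.

    current      -- topic currently assigned to the pair
    p_t_given_d  -- (p(t0|d), p(t1|d)) for the document
    p_w_given_t  -- (p(w|t0), p(w|t1)) for the word
    """
    other = 1 - current
    a = p_t_given_d[current] * p_w_given_t[current]  # score of the current topic
    b = p_t_given_d[other] * p_w_given_t[other]      # score of the other topic
    return other if a < b else current

# Worked example: w6 in d2, currently topic 0
print(update_topic(0, (5/9, 4/9), (1/3, 2/3)))  # 1
```

If the pair were already assigned topic 1, the same probabilities would leave it at topic 1, because the current topic is then the more likely one.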

<span style="color:green;font-size:x-large">compute all a's and b's</span>
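<li>Note that we need to add an axis to p_t_d so that NumPy broadcasting pairs every (document, topic) entry with every word; the result has shape (3, 2, 9), indexed as [document, topic, word]</li>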

In [None]:
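#Compute all a's and b's
#p_t_d.transpose() has shape (3, 2); reshaping to (3, 2, 1) adds a word axis,
#so multiplying by p_w_t (shape (2, 9)) broadcasts to shape (3, 2, 9): [document, topic, word]
a_and_b = np.multiply(p_t_d.transpose().reshape(3,2,1),p_w_t)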
a_and_b
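To see what the reshape does, the sketch below rebuilds the probabilities from the `topics` values above, shows that indexing with `np.newaxis` adds the same axis as `reshape(3, 2, 1)`, and reads off the a and b of the worked example.

```python
import numpy as np

# Topic assignments copied from the output above (rows: words, columns: documents)
topics = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0],
                   [1, 0, 0], [0, 0, 1], [0, 1, 1],
                   [1, 0, 1], [0, 1, 1], [1, 1, 1]])

p_w_t1 = topics.sum(axis=1) / 3
p_w_t = np.vstack((1 - p_w_t1, p_w_t1))   # shape (2, 9): [topic, word]
p_t1_d = topics.sum(axis=0) / 9
p_t_d = np.vstack((1 - p_t1_d, p_t1_d))   # shape (2, 3): [topic, document]

# Reshape as in the notebook, and the equivalent np.newaxis form
via_reshape = p_t_d.transpose().reshape(3, 2, 1)
via_newaxis = p_t_d.T[:, :, np.newaxis]
assert np.array_equal(via_reshape, via_newaxis)

# (3, 2, 1) * (2, 9) broadcasts to (3, 2, 9): [document, topic, word]
a_and_b = via_newaxis * p_w_t
print(a_and_b.shape)  # (3, 2, 9)

# Document 2 (index 1), word 6: a and b from the worked example
a, b = a_and_b[1, 0, 6], a_and_b[1, 1, 6]
print(f"a = {a:.4f}, b = {b:.4f}")  # a = 0.1852, b = 0.2963
```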

<span style="color:green;font-size:x-large">Extract t0 and t1 products</span>
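<li>a_and_b is a 3-d array indexed as [document, topic, word]. Slicing the topic axis gives two (3, 9) matrices: t_0_prods holds the topic 0 products and t_1_prods the topic 1 products</li>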

In [None]:
t_0_prods = a_and_b[:,0,:]
t_1_prods = a_and_b[:,1,:]

In [None]:
t_0_prods
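Another way to get the same two matrices is to move the topic axis to the front and unpack along it. The sketch uses a stand-in array. Any array shaped [document, topic, word] behaves the same way.

```python
import numpy as np

# Stand-in for a_and_b with shape (3 documents, 2 topics, 9 words)
a_and_b = np.arange(3 * 2 * 9, dtype=float).reshape(3, 2, 9)

# Slicing the topic axis, as in the notebook
t_0_prods = a_and_b[:, 0, :]
t_1_prods = a_and_b[:, 1, :]

# Same result: put the topic axis first, then unpack along it
t0_alt, t1_alt = np.moveaxis(a_and_b, 1, 0)
assert np.array_equal(t0_alt, t_0_prods) and np.array_equal(t1_alt, t_1_prods)
print(t_0_prods.shape)  # (3, 9)
```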

<span style="color:green;font-size:x-large">Figure out which topics to switch</span>
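<li>Wherever t_0_prods is less than t_1_prods, the word should be assigned topic 1; wherever it is greater, topic 0; on an exact tie the assignment stays as it is. Only pairs whose current topic is the less likely one actually change</li>
<li>Note that the actual algorithm works differently: here every (word, document) pair is compared once, using the same set of probabilities</li>
<li>If we went through the pairs one at a time instead, the results would be different, because each reassignment changes the topics and therefore the probabilities used for the next pair</li>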
<li>Also, there won't conveniently be two topics!</li>
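Below is a sketch that shows the difference. It visits one (word, document) pair at a time and recomputes both probability tables after every reassignment, then compares the result with a single vectorized pass. It still uses the notebook's simplified model (two topics, every word in every document), not the sampling procedure of real LDA. `probabilities`, `serial_pass` and `vectorized_pass` are hypothetical names.

```python
import numpy as np

def probabilities(topics):
    """Return p(w|t) with shape (2, n_words) and p(t|d) with shape (2, n_docs)."""
    n_words, n_docs = topics.shape
    p_w_t1 = topics.sum(axis=1) / n_docs
    p_t1_d = topics.sum(axis=0) / n_words
    return np.vstack((1 - p_w_t1, p_w_t1)), np.vstack((1 - p_t1_d, p_t1_d))

def serial_pass(topics):
    """Update one pair at a time, refreshing the probabilities before each comparison."""
    topics = topics.copy()
    n_words, n_docs = topics.shape
    for d in range(n_docs):
        for w in range(n_words):
            p_w_t, p_t_d = probabilities(topics)
            cur = topics[w, d]
            other = 1 - cur
            a = p_t_d[cur, d] * p_w_t[cur, w]      # current topic
            b = p_t_d[other, d] * p_w_t[other, w]  # other topic
            if a < b:
                topics[w, d] = other
    return topics

def vectorized_pass(topics):
    """Compare every pair once using the same probabilities (the notebook's approach)."""
    p_w_t, p_t_d = probabilities(topics)
    prods = p_t_d.T[:, :, np.newaxis] * p_w_t      # [document, topic, word]
    t0, t1 = prods[:, 0, :].T, prods[:, 1, :].T    # back to [word, document]
    return np.where(t1 > t0, 1, np.where(t0 > t1, 0, topics))

# Topic assignments copied from the output above
topics = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0],
                   [1, 0, 0], [0, 0, 1], [0, 1, 1],
                   [1, 0, 1], [0, 1, 1], [1, 1, 1]])

serial = serial_pass(topics)
batch = vectorized_pass(topics)
print(serial)
print("pairs that differ:", int((serial != batch).sum()))
```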

In [None]:
#True wherever topic 1's product is larger than topic 0's
switch = t_0_prods < t_1_prods

<span style="color:green;font-size:x-large">switch the topics</span>
<li>We can use np.where for this</li>
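`np.where(condition, x, y)` takes values from `x` where the condition is true and from `y` elsewhere. Nesting two calls gives a three-way choice. The toy arrays below are made up for illustration. They show the pattern: take topic 1 where its score is larger, take topic 0 where topic 0's score is larger, and keep the current topic on a tie.

```python
import numpy as np

current = np.array([0, 1, 1, 0])
score_t0 = np.array([0.2, 0.5, 0.1, 0.3])
score_t1 = np.array([0.4, 0.2, 0.1, 0.3])

# Topic 1 wins, else topic 0 wins, else keep the current assignment
new = np.where(score_t1 > score_t0, 1, np.where(score_t0 > score_t1, 0, current))
print(new)  # [1 0 1 0]
```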

In [None]:
switch.transpose()

In [None]:
topics

In [None]:
st = switch.transpose()
btopics = topics.astype(bool)
btopics

In [None]:
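#Topic 1 where its product is larger, topic 0 where topic 0's product is larger;
#on an exact tie the current assignment (btopics) is kept
t0_larger = (t_0_prods > t_1_prods).transpose()
new_topics = np.where(st, True, np.where(t0_larger, False, btopics)).astype(int)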
new_topics

<span style="color:green;font-size:x-large">Calculate new probabilities and repeat</span>


In [None]:
p_w_t1_new = new_topics.sum(axis=1)/3
p_w_t0_new = 1-p_w_t1_new
p_t1_d_new = new_topics.sum(axis=0)/9
p_t0_d_new = 1- p_t1_d_new

In [None]:
p_w_t1_new,p_w_t0_new

In [None]:
p_t0_d_new, p_t1_d_new
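The "repeat" step can be wrapped in a loop. Each round recomputes the probabilities and reassigns topics. The loop stops once no assignment changes, or after a fixed number of rounds. This is a sketch of the notebook's simplified two-topic procedure. `update`, `lda_sort_of` and `max_iter` are hypothetical names, and the stopping rule is an assumption that the notebook does not specify.

```python
import numpy as np

def update(topics):
    """One vectorized round of the notebook's two-topic update."""
    n_words, n_docs = topics.shape
    p_w_t1 = topics.sum(axis=1) / n_docs
    p_w_t = np.vstack((1 - p_w_t1, p_w_t1))          # [topic, word]
    p_t1_d = topics.sum(axis=0) / n_words
    p_t_d = np.vstack((1 - p_t1_d, p_t1_d))          # [topic, document]
    prods = p_t_d.T[:, :, np.newaxis] * p_w_t        # [document, topic, word]
    t0, t1 = prods[:, 0, :].T, prods[:, 1, :].T      # [word, document]
    return np.where(t1 > t0, 1, np.where(t0 > t1, 0, topics))

def lda_sort_of(topics, max_iter=20):
    """Repeat the update until assignments stop changing or max_iter is reached.

    Returns the final assignments and the number of updates that changed something.
    """
    for i in range(max_iter):
        new_topics = update(topics)
        if np.array_equal(new_topics, topics):
            return topics, i
        topics = new_topics
    return topics, max_iter

np.random.seed(42)
start = np.random.randint(2, size=(9, 3))
final, rounds = lda_sort_of(start)
print(final)
print("updates that changed something:", rounds)
```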

<span style="color:green;font-size:xx-large">Why "sort of"</span>
<p></p>
<li>The eagle-eyed amongst you will have noticed that <b>we are assuming all words are in all documents</b>. No document-specific information is being used</li>
<li>In practice, the first step, i.e., the random topic assignment to word-document combinations, is only done on words that are in the document</li>
<li>This makes the code a lot more complicated, since each document then needs its own set of topic assignments, one per word it contains, and documents contain different numbers of words</li>
<li>Also, a lot of the nifty matrix manipulation we've been doing will no longer be possible</li>
<li>Finally, LDA also has Dirichlet prior parameters (usually written α and β) that need to be taken into consideration</li>
<li>A more complete example is in the second notebook but you'll need to read the paper to understand it</li>
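The sketch below does that first step per document. Each document gets its own array of topic assignments, one per word occurrence, held in a list because documents generally differ in length. p(t|d) is then computed from each array. The three example documents happen to have six words each, but the structure does not rely on that. The variable names are hypothetical, and the Dirichlet priors are left out.

```python
import numpy as np

docs = [
    ("dog", "cat", "rat", "ate", "the", "cat"),
    ("dog", "chair", "table", "and", "the", "chair"),
    ("dog", "cat", "chair", "chased", "the", "cat"),
]
n_topics = 2
rng = np.random.default_rng(42)

# One random topic per word occurrence; a list of arrays since lengths can differ
assignments = [rng.integers(n_topics, size=len(doc)) for doc in docs]

# p(t|d): fraction of each document's words assigned to each topic
p_t_d = np.array([np.bincount(z, minlength=n_topics) / len(z) for z in assignments])

for doc, z, p in zip(docs, assignments, p_t_d):
    print(list(zip(doc, z.tolist())), p)
```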