# LDA Series, Part 2: Gibbs Sampling

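In this part we use Gibbs sampling to draw samples from the posterior distribution and use them to estimate the parameters of the LDA model.  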

Recall from the previous part that the generative process of LDA is


>for $k$ in $1,2,...,K$:  
&emsp;$\beta_k\sim DIR(\eta,...,\eta)$  
for $d$ in $1,2,...,D$:  
&emsp;$\theta_d \sim DIR(\alpha,...,\alpha)$  
&emsp;for $n$ in $1,2,...,N$:  
&emsp;&emsp;$Z_{d,n}\sim Multi(\theta_d)$  
&emsp;&emsp;$W_{d,n}\sim Multi(\beta_{Z_{d,n}})$
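
The generative process above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`sample_dirichlet`, `sample_categorical`, `generate_corpus`) are hypothetical, every document has the same length $N$, the hyperparameters are symmetric scalars, and words and topics are 0-based integer indices.

```python
import random

def sample_dirichlet(concentration, dim, rng):
    """Draw from a symmetric Dirichlet by normalising Gamma(concentration, 1) draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [x / total for x in draws]

def sample_categorical(probs, rng):
    """Draw an index in range(len(probs)) with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(K, D, N, V, alpha, eta, seed=0):
    """Hypothetical helper: sample (beta, theta, z, w) from the LDA generative process."""
    rng = random.Random(seed)
    # beta_k ~ Dir(eta, ..., eta), one V-dimensional vector per topic
    beta = [sample_dirichlet(eta, V, rng) for _ in range(K)]
    theta, z, w = [], [], []
    for _ in range(D):
        # theta_d ~ Dir(alpha, ..., alpha)
        theta_d = sample_dirichlet(alpha, K, rng)
        # Z_{d,n} ~ Multi(theta_d)
        z_d = [sample_categorical(theta_d, rng) for _ in range(N)]
        # W_{d,n} ~ Multi(beta_{Z_{d,n}})
        w_d = [sample_categorical(beta[k], rng) for k in z_d]
        theta.append(theta_d)
        z.append(z_d)
        w.append(w_d)
    return beta, theta, z, w
```

In a real application only `w` is observed; `beta`, `theta` and `z` are what the rest of this part tries to recover.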

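The LDA model has three groups of unknown quantities:

* Topic-word distributions: $\{\beta_k\}_{k=1}^K$, where each $\beta_k$ is a $V$-dimensional vector and $V$ is the vocabulary size
* Document-topic distributions: $\{\theta_d\}_{d=1}^D$, where each $\theta_d$ is a $K$-dimensional vector
* Latent topic assignments: $\{z_{d,n}\},(d\in\{1,...,D\},n\in\{1,...,N\})$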

## Estimating $z_{m,n}$

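We first look at the probability of generating the corpus, that is, the joint distribution $p(\vec{w},\vec{z}|\alpha,\eta)$ of $\vec{w}$ and $\vec{z}$, with $B$ and $\Theta$ integrated out: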

$$\begin{aligned}
p(\vec{w},\vec{z}|\alpha,\eta)&=p(\vec{w}|\vec{z},\eta)p(\vec{z}|\alpha)\\&=
\int p(\vec{w}|\vec{z},B)p(B|\eta)dB \cdot \int p(\vec{z}|\Theta)p(\Theta|\alpha)d\Theta\\&=
\int \prod_{d=1}^D\prod_{n=1}^N p(w_{d,n}|\vec{\beta}_{z_{d,n}})\cdot \prod_{k=1}^K p(\vec{\beta}_k|\eta)dB\\&\cdot\int\prod_{d=1}^D\big(\prod_{n=1}^N p(z_{d,n}|\vec{\theta}_d)p(\vec{\theta}_d|\alpha)\big)d\Theta\\&=
\int\prod_{k=1}^K\prod_{v=1}^V \beta_{k,v}^{N_v^{(k)}}\prod_{k=1}^K \big( \frac{\Gamma(\sum_{v=1}^V \eta_v)}{\prod_{v=1}^V \Gamma(\eta_v)}\prod_{v=1}^V \beta_{k,v}^{\eta_v-1} \big)dB\\&
\cdot \int \prod_{d=1}^D\prod_{k=1}^K \theta_{d,k}^{N_k^{(d)}}\prod_{d=1}^D\big(\frac{\Gamma(\sum_{k=1}^K \alpha_k)}{\prod_{k=1}^K \Gamma(\alpha_k)}\prod_{k=1}^K\theta_{d,k}^{\alpha_k-1}\big) d\Theta \\&=
\big(\frac{\Gamma(\sum_{v=1}^V \eta_v)}{\prod_{v=1}^V \Gamma(\eta_v)}\big)^K\big(\frac{\Gamma(\sum_{k=1}^K \alpha_k)}{\prod_{k=1}^K \Gamma(\alpha_k)}\big)^D\\&\cdot
\int \prod_{k=1}^K\prod_{v=1}^V \beta_{k,v}^{N_v^{(k)}+\eta_v-1}dB\int\prod_{d=1}^D\prod_{k=1}^K\theta_{d,k}^{N_k^{(d)}+\alpha_k-1}d\Theta\\&=\big(\frac{\Gamma(\sum_{v=1}^V \eta_v)}{\prod_{v=1}^V \Gamma(\eta_v)}\big)^K\big(\frac{\Gamma(\sum_{k=1}^K \alpha_k)}{\prod_{k=1}^K \Gamma(\alpha_k)}\big)^D\\&\cdot 
\prod_{k=1}^K\int \prod_{v=1}^V \beta_{k,v}^{N_v^{(k)}+\eta_v-1}d\vec{\beta}_k\cdot \prod_{d=1}^D\int\prod_{k=1}^K\theta_{d,k}^{N_k^{(d)}+\alpha_k-1}d\vec{\theta}_d \\&=
\big(\frac{\Gamma(\sum_{v=1}^V \eta_v)}{\prod_{v=1}^V \Gamma(\eta_v)}\big)^K\big(\frac{\Gamma(\sum_{k=1}^K \alpha_k)}{\prod_{k=1}^K \Gamma(\alpha_k)}\big)^D\\&\cdot
\prod_{k=1}^K \frac{\prod_{v=1}^V \Gamma(N_v^{(k)}+\eta_v)}{\Gamma(\sum_{v=1}^V (N_v^{(k)}+\eta_v))}\cdot \prod_{d=1}^D\frac{\prod_{k=1}^K \Gamma(N_k^{(d)}+\alpha_k)}{\Gamma(\sum_{k=1}^K (N_k^{(d)}+\alpha_k))} 
\end{aligned}$$

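where $B=\{\beta_1,...,\beta_K\}$ and $\Theta=\{\theta_1,...,\theta_D\}$. $N_k^{(d)}$ is the number of words in document $d$ assigned to topic $k$, $N_k^{(d)}=\sum_{n=1}^N \mathbb{1}(z_{d,n}=k)$, and $N_v^{(k)}$ is the number of times the $v$-th vocabulary word is assigned to topic $k$ across the corpus, $N_v^{(k)}=\sum_{d=1}^D\sum_{n=1}^N \mathbb{1}(w_{d,n}=v,\,z_{d,n}=k)$. The last step evaluates each integral with the normalization constant of the Dirichlet distribution.  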


The result above is rather long; it can be written more compactly as
$$p(\vec{w},\vec{z}|\alpha,\eta)= \prod_{k=1}^K \frac{\Delta(\vec{N}^{(k)}+\vec{\eta})}{\Delta(\vec{\eta})}\cdot
\prod_{d=1}^D\frac{\Delta(\vec{N}^{(d)}+\vec{\alpha})}{\Delta(\vec{\alpha})}$$

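where
* $\vec{N}^{(k)}=\{N_1^{(k)},...,N_V^{(k)}\}$, and $N_v^{(k)}$ is the number of times the $v$-th word is assigned to topic $k$
* $\vec{N}^{(d)}=\{N_1^{(d)},...,N_K^{(d)}\}$, and $N_k^{(d)}$ is the number of words in document $d$ assigned to topic $k$
* $\Delta(\vec{\alpha})=\frac{\prod_{k=1}^K\Gamma(\alpha_k)}{\Gamma(\sum_{k=1}^K\alpha_k)}$, and $\Delta(\vec{\eta})$ is defined in the same way over the $V$ components of $\vec{\eta}$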
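
As a rough illustration, the compact formula can be evaluated in log space with `math.lgamma`, which avoids overflow in the Gamma functions. The sketch below assumes symmetric scalar hyperparameters and 0-based integer word and topic indices; `log_delta` and `log_joint` are hypothetical helper names, not part of any library.

```python
import math

def log_delta(vec):
    """log Delta(vec) = sum_i log Gamma(vec_i) - log Gamma(sum_i vec_i)."""
    return sum(math.lgamma(x) for x in vec) - math.lgamma(sum(vec))

def log_joint(w, z, K, V, alpha, eta):
    """log p(w, z | alpha, eta) for symmetric scalar alpha and eta.

    w[d][n] is the word index and z[d][n] the topic index of token (d, n).
    """
    D = len(w)
    topic_word = [[0] * V for _ in range(K)]   # N_v^{(k)}
    doc_topic = [[0] * K for _ in range(D)]    # N_k^{(d)}
    for d in range(D):
        for word, topic in zip(w[d], z[d]):
            topic_word[topic][word] += 1
            doc_topic[d][topic] += 1
    total = 0.0
    # prod_k Delta(N^{(k)} + eta) / Delta(eta)
    for k in range(K):
        total += log_delta([n + eta for n in topic_word[k]]) - log_delta([eta] * V)
    # prod_d Delta(N^{(d)} + alpha) / Delta(alpha)
    for d in range(D):
        total += log_delta([n + alpha for n in doc_topic[d]]) - log_delta([alpha] * K)
    return total
```

For a one-token corpus with $K=2$, $V=3$ and $\alpha=\eta=1$, the joint is $\frac{1}{2}\cdot\frac{1}{3}=\frac{1}{6}$, which `math.exp(log_joint([[2]], [[0]], 2, 3, 1.0, 1.0))` reproduces up to floating-point error.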


## Collapsed Gibbs Sampling

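Next we use collapsed Gibbs sampling to derive the sampling distribution of $z_{m,n}$. Let $\vec{z}_{\neg(m,n)}$ denote the set of latent variables that remain after removing $z_{m,n}$ from $\vec{z}$ (and likewise $\vec{w}_{\neg(m,n)}$ for the words), and suppose $z_{m,n}=i$.  
First we hold $\vec{z}_{\neg(m,n)}$ fixed and compute the full conditional. Since $p(\vec{w},\vec{z}_{\neg(m,n)})=p(w_{m,n}|\vec{w}_{\neg(m,n)},\vec{z}_{\neg(m,n)})\,p(\vec{w}_{\neg(m,n)},\vec{z}_{\neg(m,n)})$ and the first factor does not depend on $i$, we have  
$$\begin{aligned}p(z_{m,n}=i|\vec{w},\vec{z}_{\neg(m,n)})&=
\frac{p(\vec{w},\vec{z})}{p(\vec{w},\vec{z}_{\neg(m,n)})}\propto\frac{p(\vec{w},\vec{z})}{p(\vec{w}_{\neg(m,n)},\vec{z}_{\neg(m,n)})}\\&=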
\bigg(\prod_{k=1}^K \frac{\Delta(\vec{N}^{(k)}+\vec{\eta})}{\Delta(\vec{\eta})}\cdot
\prod_{d=1}^D\frac{\Delta(\vec{N}^{(d)}+\vec{\alpha})}{\Delta(\vec{\alpha})}\bigg) \bigg/ \bigg(\prod_{k=1}^K \frac{\Delta(\vec{N}_{\neg(m,n)}^{(k)}+\vec{\eta})}{\Delta(\vec{\eta})}\cdot
\prod_{d=1}^D\frac{\Delta(\vec{N}_{\neg(m,n)}^{(d)}+\vec{\alpha})}{\Delta(\vec{\alpha})}\bigg)\\&=
\frac{\Delta(\vec{N}^{(i)}+\vec{\eta})}{\Delta(\vec{N}_{\neg(m,n)}^{(i)}+\vec{\eta})}\cdot \frac{\Delta(\vec{N}^{(m)}+\vec{\alpha})}{\Delta(\vec{N}_{\neg(m,n)}^{(m)}+\vec{\alpha})}
\end{aligned}$$

A few remarks on the derivation above:

* When $k\neq i$, removing $z_{m,n}$ does not affect the counts of any topic other than $i$, so
$$\Delta(\vec{N}^{(k)}+\vec{\eta})=\Delta(\vec{N}_{\neg(m,n)}^{(k)}+\vec{\eta}), \quad   (k\neq i)$$

* Likewise, when $d\neq m$, removing $z_{m,n}$ does not affect any document other than document $m$, so
$$\Delta(\vec{N}^{(d)}+\vec{\alpha})=\Delta(\vec{N}_{\neg(m,n)}^{(d)}+\vec{\alpha}), \quad (d\neq m)$$

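Hence every factor with $k\neq i$ in the first product and every factor with $d\neq m$ in the second product is identical in the numerator and the denominator, and the constants $\Delta(\vec{\eta})$ and $\Delta(\vec{\alpha})$ cancel as well. Only the $k=i$ and $d=m$ factors survive.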


Now consider the numerator of the first factor:
$$\Delta(\vec{N}^{(i)}+\vec{\eta})=\frac{\prod_{v=1}^V \Gamma(N_v^{(i)}+\eta_v)}{\Gamma(\sum_{v=1}^V (N_v^{(i)}+\eta_v))}=\frac{\Gamma(N_{1}^{(i)}+\eta_1)\Gamma(N_{2}^{(i)}+\eta_2)...\Gamma(N_{V}^{(i)}+\eta_{V})}{\Gamma(\sum_{v=1}^V (N_v^{(i)}+\eta_v))}$$

and its denominator:
$$\Delta(\vec{N}_{\neg(m,n)}^{(i)}+\vec{\eta})=\frac{\prod_{v=1}^V \Gamma(N_{v,\neg (m,n)}^{(i)}+\eta_v)}{\Gamma(\sum_{v=1}^V (N_{v,\neg (m,n)}^{(i)}+\eta_v))}=\frac{\Gamma(N_{1}^{(i)}+\eta_1)\Gamma(N_{2}^{(i)}+\eta_2)...\Gamma(N_{j-1}^{(i)}+\eta_{j-1})\Gamma(N_{j}^{(i)}-1+\eta_{j})\Gamma(N_{j+1}^{(i)}+\eta_{j+1})...\Gamma(N_{V}^{(i)}+\eta_{V})}{\Gamma(\sum_{v=1}^V (N_{v}^{(i)}+\eta_v)-1)}$$

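Here $j$ is the vocabulary index of $w_{m,n}$, $i$ is the topic that $w_{m,n}$ is assigned to, and $N^{(i)}_{v,\neg (m,n)}$ is the number of times word $v$ is assigned to topic $i$ once the token $w_{m,n}=j$ has been removed.

Looking at $N^{(i)}_{v,\neg (m,n)}$ case by case:

* When $v\neq j$, $N^{(i)}_{v,\neg (m,n)}=N^{(i)}_v$
* When $v=j$, $N^{(i)}_{v,\neg (m,n)}=N^{(i)}_j-1$

Therefore, using $\Gamma(x)=(x-1)\Gamma(x-1)$,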
$$\begin{aligned}\frac{\Delta(\vec{N}^{(i)}+\vec{\eta})}{\Delta(\vec{N}_{\neg(m,n)}^{(i)}+\vec{\eta})}&=
\frac{\Gamma(N_{1}^{(i)}+\eta_1)\Gamma(N_{2}^{(i)}+\eta_2)...\Gamma(N_{V}^{(i)}+\eta_{V})}{\Gamma(N_{1}^{(i)}+\eta_1)\Gamma(N_{2}^{(i)}+\eta_2)...\Gamma(N_{j-1}^{(i)}+\eta_{j-1})\Gamma(N_{j}^{(i)}-1+\eta_{j})\Gamma(N_{j+1}^{(i)}+\eta_{j+1})...\Gamma(N_{V}^{(i)}+\eta_{V})}\\&\cdot
\frac{\Gamma(\sum_{v=1}^V (N_{v}^{(i)}+\eta_v)-1)}{\Gamma(\sum_{v=1}^V (N_v^{(i)}+\eta_v))}\\&=\frac{\Gamma(N_{j}^{(i)}+\eta_j)}{\Gamma(N_{j}^{(i)}-1+\eta_{j})}\cdot \frac{\Gamma(\sum_{v=1}^V (N_{v}^{(i)}+\eta_v)-1)}{\Gamma(\sum_{v=1}^V (N_v^{(i)}+\eta_v))}\\&=\frac{N_{j}^{(i)}+\eta_{j}-1}{\sum_{v=1}^V (N_v^{(i)}+\eta_v)-1}\end{aligned}$$
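
The simplification above is easy to check numerically. The sketch below compares the ratio of the two $\Delta$ terms, computed directly with `math.lgamma`, against the closed form; the function names and the example counts are made up for illustration.

```python
import math

def log_delta(vec):
    """log Delta(vec) = sum_i log Gamma(vec_i) - log Gamma(sum_i vec_i)."""
    return sum(math.lgamma(x) for x in vec) - math.lgamma(sum(vec))

def delta_ratio(counts, prior, j):
    """Delta(N + eta) / Delta(N_without + eta), with one occurrence of entry j removed."""
    with_token = [n + p for n, p in zip(counts, prior)]
    without_token = list(with_token)
    without_token[j] -= 1          # the removed token lowers only entry j by one
    return math.exp(log_delta(with_token) - log_delta(without_token))

def closed_form(counts, prior, j):
    """(N_j + eta_j - 1) / (sum_v (N_v + eta_v) - 1)."""
    total = sum(n + p for n, p in zip(counts, prior))
    return (counts[j] + prior[j] - 1) / (total - 1)

# Illustrative counts N^{(i)} and prior eta (made-up numbers, each count >= 1)
counts, prior = [3, 1, 4], [0.1, 0.2, 0.3]
for j in range(len(counts)):
    assert abs(delta_ratio(counts, prior, j) - closed_form(counts, prior, j)) < 1e-9
```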


Next consider the numerator of the second factor:
$$\Delta(\vec{N}^{(m)}+\vec{\alpha})=\frac{\prod_{k=1}^K \Gamma(N^{(m)}_k+\alpha_k)}{\Gamma(\sum_{k=1}^K(N^{(m)}_k+\alpha_k))}$$
and its denominator:
$$\Delta(\vec{N}_{\neg(m,n)}^{(m)}+\vec{\alpha})=\frac{\prod_{k=1}^K \Gamma(N^{(m)}_{k,\neg(m,n)}+\alpha_k)}{\Gamma(\sum_{k=1}^K(N^{(m)}_{k,\neg(m,n)}+\alpha_k))}$$

Note how $N^{(m)}_{k,\neg(m,n)}$ in the expression above changes with $k$:

* When $k\neq i$, $N^{(m)}_{k,\neg(m,n)}=N^{(m)}_{k}$
* When $k=i$, $N^{(m)}_{k,\neg(m,n)}=N^{(m)}_{i}-1$

Similarly,

$$\frac{\Delta(\vec{N}^{(m)}+\vec{\alpha})}{\Delta(\vec{N}_{\neg(m,n)}^{(m)}+\vec{\alpha})}=\prod_{k=1}^K \frac{\Gamma(N^{(m)}_k+\alpha_k)}{\Gamma(N^{(m)}_{k,\neg(m,n)}+\alpha_k)}\cdot \frac{\Gamma(\sum_{k=1}^K(N^{(m)}_{k,\neg(m,n)}+\alpha_k))}{\Gamma(\sum_{k=1}^K (N^{(m)}_k+\alpha_k))}=\frac{N^{(m)}_i+\alpha_i-1}{\sum_{k=1}^K(N^{(m)}_k+\alpha_k)-1}$$

Combining the two factors gives
$$p(z_{m,n}=i|\vec{w},\vec{z}_{\neg(m,n)})\propto\frac{N_{j}^{(i)}+\eta_{j}-1}{\sum_{v=1}^V (N_v^{(i)}+\eta_v)-1}\cdot \frac{N^{(m)}_i+\alpha_i-1}{\sum_{k=1}^K(N^{(m)}_k+\alpha_k)-1}$$

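Here the counts include the token $(m,n)$ itself, counted as assigned to topic $i$, so each $-1$ simply removes that token from the counts. With symmetric hyperparameters, that is $\alpha_1=\alpha_2=...=\alpha_K=\alpha$ and $\eta_1=\eta_2=...=\eta_V=\eta$, the expression becomes

$$p(z_{m,n}=i|\vec{w},\vec{z}_{\neg(m,n)})\propto\frac{N_{j}^{(i)}+\eta-1}{\sum_{v=1}^V N_v^{(i)}+ V\eta-1}\cdot \frac{N^{(m)}_i+\alpha -1}{\sum_{k=1}^K N^{(m)}_k+K\alpha-1}$$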
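
A minimal sketch of a collapsed Gibbs sampler built on this formula is shown below. It is an illustration under assumptions, not a tuned implementation: the function names (`init_state`, `gibbs_sweep`, `run_gibbs`) are hypothetical, the hyperparameters are symmetric scalars, words and topics are 0-based indices, and topics are initialised uniformly at random. The code decrements the counts before computing the weights, which plays the role of the $-1$ terms, and it drops the document-length denominator because that term is the same for every candidate topic.

```python
import random

def init_state(w, K, V, rng):
    """Assign every token a uniformly random topic and build the count tables."""
    z = [[rng.randrange(K) for _ in doc] for doc in w]
    doc_topic = [[0] * K for _ in w]            # N_k^{(d)}
    topic_word = [[0] * V for _ in range(K)]    # N_v^{(k)}
    topic_total = [0] * K                       # sum_v N_v^{(k)}
    for d, doc in enumerate(w):
        for n, word in enumerate(doc):
            k = z[d][n]
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
    return z, doc_topic, topic_word, topic_total

def gibbs_sweep(w, z, doc_topic, topic_word, topic_total, alpha, eta, rng):
    """One in-place pass over all tokens, resampling each z[d][n] from its full conditional."""
    K = len(topic_word)
    V = len(topic_word[0])
    for d, doc in enumerate(w):
        for n, word in enumerate(doc):
            old = z[d][n]
            # Remove the current token so the tables hold the "not (m,n)" counts.
            doc_topic[d][old] -= 1
            topic_word[old][word] -= 1
            topic_total[old] -= 1
            # Unnormalised conditional probability of each candidate topic k.
            weights = [
                (topic_word[k][word] + eta) / (topic_total[k] + V * eta)
                * (doc_topic[d][k] + alpha)
                for k in range(K)
            ]
            new = rng.choices(range(K), weights=weights, k=1)[0]
            # Put the token back under its newly sampled topic.
            z[d][n] = new
            doc_topic[d][new] += 1
            topic_word[new][word] += 1
            topic_total[new] += 1

def run_gibbs(w, K, V, alpha, eta, n_sweeps, seed=0):
    """Run n_sweeps full sweeps and return (z, doc_topic, topic_word, topic_total)."""
    rng = random.Random(seed)
    state = init_state(w, K, V, rng)
    for _ in range(n_sweeps):
        gibbs_sweep(w, *state, alpha, eta, rng)
    return state
```

Keeping `topic_total` alongside `topic_word` is a design choice that avoids re-summing a row of length $V$ for every token.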



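## Estimating $\theta_d$ and $\beta_k$

The previous section gave the sampling formula for $z_{m,n}$. After enough sampling sweeps we obtain a topic assignment for every word in the corpus, from which estimates of $\theta_d$ and $\beta_k$ follow.

As shown in the previous article, the posteriors of $\theta_d$ and $\beta_k$ are Dirichlet distributions, so their estimates can be computed from the mean of a Dirichlet distribution: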


$$\begin{aligned}\theta^*_{d,k}&=\mathbb{E}[\theta_{d,k}]\\&=\frac{N_k^{(d)}+\alpha_k}{\sum_{k'=1}^K (N_{k'}^{(d)}+\alpha_{k'})}\\&\text{(with symmetric hyperparameters)}\\&=\frac{N_k^{(d)}+\alpha}{\sum_{k'=1}^K N_{k'}^{(d)}+K\alpha}\end{aligned}$$

where $N_k^{(d)}$ is the number of words in document $d$ assigned to topic $k$, that is, $N_k^{(d)}=\sum_{n=1}^N \mathbb{1}(z_{d,n}=k)$  


$$\begin{aligned}\beta^*_{k,v}&=\mathbb{E}[\beta_{k,v}]\\&=\frac{N_v^{(k)}+\eta_v}{\sum_{v'=1}^V( N_{v'}^{(k)}+\eta_{v'})}\\&\text{(with symmetric hyperparameters)}\\&=\frac{N_v^{(k)}+\eta}{\sum_{v'=1}^V N_{v'}^{(k)}+V\eta}\end{aligned}$$

where $N_v^{(k)}$ is the number of times the $v$-th word is assigned to topic $k$, that is, $N_v^{(k)}=\sum_{d=1}^D \sum_{n=1}^N \mathbb{1}(w_{d,n}=v,\,z_{d,n}=k)$
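
Given count tables built from the sampled assignments, both estimates are one-liners. The sketch below assumes symmetric scalar hyperparameters; `estimate_theta` and `estimate_beta` are hypothetical names, with `doc_topic[d][k]` holding $N_k^{(d)}$ and `topic_word[k][v]` holding $N_v^{(k)}$.

```python
def estimate_theta(doc_topic, alpha):
    """theta*_{d,k} = (N_k^{(d)} + alpha) / (sum_k' N_k'^{(d)} + K * alpha)."""
    K = len(doc_topic[0])
    return [[(n + alpha) / (sum(row) + K * alpha) for n in row] for row in doc_topic]

def estimate_beta(topic_word, eta):
    """beta*_{k,v} = (N_v^{(k)} + eta) / (sum_v' N_v'^{(k)} + V * eta)."""
    V = len(topic_word[0])
    return [[(n + eta) / (sum(row) + V * eta) for n in row] for row in topic_word]
```

For example, a document with topic counts `[3, 1]` and $\alpha=0.5$ gives $\theta^*=(3.5/5,\ 1.5/5)=(0.7,\ 0.3)$. Each returned row sums to 1 by construction.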

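Of course, $\theta_d$ and $\beta_k$ could also be sampled inside the Gibbs sampler instead of being integrated out, but since conjugacy already gives the estimates in closed form, there is clearly no point in doing so.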
