<h1>HMM & CRF</h1>

# 0. HMM

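Linear regression and logistic regression treat the variables (features) of an input sample as mutually independent, and naive Bayes assumes the features are conditionally independent given the class. In practice, the extracted features are often closely interrelated, and models built on these naive assumptions lose a lot of information. For example, adjacent pixels in an image tend to be similar, and adjacent words, sentences, and paragraphs in an article are correlated along the sequence.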

Probabilistic graphical models were developed to model these dependencies between variables.

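Depending on whether the edges between vertices are directed, probabilistic graphical models are divided into directed models (e.g. Bayesian networks) and undirected models (e.g. conditional random fields). When the vertices are densely connected (e.g. a fully connected graph), computation becomes very expensive. To reduce the computational complexity, one uses the physical meaning of the problem and reasonable assumptions to impose conditional independence relations, maximal-clique structure, and the like, which removes unnecessary edges.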

## 0.0 HMM Definition

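A Hidden Markov Model (HMM) models a time series. It assumes that hidden variables generate the observed (feature) variables with certain probabilities, and that the hidden variables transition from one to the next with certain probabilities.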

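An HMM with $N$ observation nodes and $N$ hidden nodes has $K$ hidden states, $z_n \in \{s_1, ..., s_K\}$, and $M$ observation symbols, $x_n \in \{o_1, ..., o_M\}$. Its parameters are $\lambda = [\pi, A, B]$:

> the initial distribution of the hidden state is $\pi$, with $\pi_k = p(z_1 = s_k)$;

> the transition probability matrix between hidden states is $A$, with $A_{ij} = p(z_{n}=s_j|z_{n-1} = s_i)$;

> the emission (observation) probability matrix from hidden states to outputs is $B$, with $B_{ij} = p(x_{n} = o_j|z_{n} = s_i)$.

Individual entries are also written in lowercase below: $a_{i,j} = A_{ij}$ and $b_{i,x_n} = p(x_n|z_n = s_i)$.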
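To make the generative story concrete, here is a minimal sampling sketch. It assumes hidden states $s_k$ and symbols $o_m$ are encoded as 0-based integer indices, stores $\pi$, $A$, $B$ as nested Python lists, and uses toy parameter values made up for illustration; the names `sample_hmm`, `PI`, `A`, `B` are hypothetical.

```python
import random

def sample_hmm(pi, A, B, n_steps, rng):
    """Draw observations X and hidden states Z from lambda = [pi, A, B].

    pi[k]   = p(z_1 = s_k)
    A[i][j] = p(z_n = s_j | z_{n-1} = s_i)
    B[i][m] = p(x_n = o_m | z_n = s_i)
    """
    def draw(probs):
        # Pick one index with probability proportional to probs.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

    z = [draw(pi)]                  # initial hidden state from pi
    x = [draw(B[z[0]])]             # first observation emitted by z_1
    for _ in range(1, n_steps):
        z.append(draw(A[z[-1]]))    # hidden-state transition: a row of A
        x.append(draw(B[z[-1]]))    # emission from the new state: a row of B
    return x, z

# Toy parameters (illustrative only): K = 2 hidden states, M = 3 symbols.
PI = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],
     [0.1, 0.3, 0.6]]

x, z = sample_hmm(PI, A, B, n_steps=5, rng=random.Random(0))
```

Every row of $A$ and $B$ (and $\pi$ itself) should sum to 1; passing a seeded `random.Random` keeps the draws reproducible.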


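The joint distribution of the complete data $(X, Z)$ is

> $p(X, Z|\lambda) = p(z_1)p(x_1|z_1) \prod_{n=2}^N p(x_n|z_n) \prod_{n=2}^N p(z_n|z_{n-1})$

> $= \pi_{z_1} b_{z_1,x_1} \prod_{n=2}^N b_{z_n,x_n} \prod_{n=2}^N a_{z_{n-1},z_n}$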
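The product above translates directly into a loop. This sketch assumes 0-based integer indices for states and symbols and reuses made-up toy parameters; `joint_prob` is a hypothetical name.

```python
def joint_prob(pi, A, B, x, z):
    """p(X, Z | lambda) for 0-based index sequences x (observations) and z (hidden states)."""
    p = pi[z[0]] * B[z[0]][x[0]]                  # p(z_1) p(x_1 | z_1)
    for n in range(1, len(x)):
        p *= A[z[n - 1]][z[n]] * B[z[n]][x[n]]    # p(z_n | z_{n-1}) p(x_n | z_n)
    return p

# Toy parameters (illustrative only): K = 2 hidden states, M = 3 symbols.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

# 0.6*0.5 * (0.7*0.4) * (0.3*0.6) = 0.01512, up to floating-point rounding
p = joint_prob(PI, A, B, x=[0, 1, 2], z=[0, 0, 1])
```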

## 0.1 HMM Problems

+ Likelihood

> Given the HMM $\lambda=[\pi, A, B]$ and an observation sequence $X = \{x_1, x_2, ..., x_N\}$, compute the probability of the sequence, $p(X|\lambda)$;

+ Decode

> Given the HMM $\lambda=[\pi, A, B]$ and an observation sequence $X = \{x_1, x_2, ..., x_N\}$, find the most likely hidden state sequence $Z = \{z_1, z_2, ..., z_N\}$;

+ Learning

> Given an observation sequence $X = \{x_1, x_2, ..., x_N\}$, how should the model parameters $\lambda = [\pi, A, B]$ be adjusted to maximize the probability of the sequence, $p(X|\lambda)$?


## 0.3 Solving the Decoding Problem (Viterbi)
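This section has no derivation yet. As a hedged placeholder, the sketch below implements the standard Viterbi recursion in the notation of section 0.0: $\delta_{1,k} = \pi_k b_{k,x_1}$ and $\delta_{t+1,k} = \left(\max_i \delta_{t,i} a_{i,k}\right) b_{k,x_{t+1}}$, with back-pointers recording the maximizing $i$ so the best path can be traced back from the final step. The function and variable names are hypothetical, states and symbols are 0-based indices, the toy parameters are made up, and plain probabilities are used (long sequences would need log-space to avoid underflow).

```python
def viterbi(pi, A, B, x):
    """Most likely hidden path argmax_Z p(X, Z | lambda) and its joint probability."""
    K = len(pi)
    delta = [pi[k] * B[k][x[0]] for k in range(K)]    # delta_{1,k}
    backptrs = []   # backptrs[t - 1][k] = best predecessor of state k at 0-based step t
    for t in range(1, len(x)):
        ptr, new_delta = [], []
        for k in range(K):
            best_i = max(range(K), key=lambda i: delta[i] * A[i][k])
            ptr.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][k] * B[k][x[t]])
        backptrs.append(ptr)
        delta = new_delta
    # Pick the best final state, then follow back-pointers to recover the path.
    last = max(range(K), key=lambda k: delta[k])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

# Toy parameters (illustrative only): K = 2 hidden states, M = 3 symbols.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

path, prob = viterbi(PI, A, B, [0, 1, 2])   # path == [0, 0, 1], prob ≈ 0.01512
```

The recursion mirrors the forward algorithm in section 0.2 with the sum over $i$ replaced by a max, so it also runs in $O(NK^2)$.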
## 0.4 Solving the Learning Problem (EM)
Estimating the parameters of a model with hidden variables is a natural fit for the EM algorithm.

> Observed data: $X = \{x_1, x_2, ..., x_N\}$

> Hidden variables: $Z = \{z_1, z_2, ..., z_N\}$

> Parameters: $\lambda = [\pi, A, B]$

+ E-step

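Form the Q function: the expectation of the complete-data log-likelihood with respect to the posterior of the hidden variables under the current parameters, $\mathbb{E}_{Z \sim p(Z|X, \lambda^{old})}[\log p(X, Z|\lambda)]$:
> $Q(\lambda, \lambda^{old}) = \sum_{Z} p(Z|X, \lambda^{old}) \log p(X, Z|\lambda)$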
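As a sanity check on the formula, $Q$ can be evaluated by brute force on a tiny model by enumerating all $K^N$ hidden sequences. This is an illustration only (a practical E-step obtains the needed posteriors from the forward and backward variables of section 0.2 instead). The function names and toy parameters are hypothetical, and the sketch assumes every $Z$ with nonzero posterior under $\lambda^{old}$ also has nonzero probability under $\lambda$, since otherwise the log is undefined.

```python
import itertools
import math

def joint_prob(pi, A, B, x, z):
    """p(X, Z | lambda) with 0-based state/symbol indices."""
    p = pi[z[0]] * B[z[0]][x[0]]
    for n in range(1, len(x)):
        p *= A[z[n - 1]][z[n]] * B[z[n]][x[n]]
    return p

def q_function(params, old_params, x):
    """Q(lambda, lambda_old) = sum_Z p(Z | X, lambda_old) * log p(X, Z | lambda), by enumeration."""
    K = len(old_params[0])                                   # number of hidden states
    seqs = list(itertools.product(range(K), repeat=len(x)))  # all K^N hidden sequences
    old_joint = [joint_prob(*old_params, x, z) for z in seqs]
    evidence = sum(old_joint)                                # p(X | lambda_old)
    q = 0.0
    for z, pj in zip(seqs, old_joint):
        if pj > 0.0:                                         # zero-posterior terms contribute nothing
            q += (pj / evidence) * math.log(joint_prob(*params, x, z))
    return q

# Toy parameter sets (illustrative only), each as (pi, A, B).
OLD = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
NEW = ([0.5, 0.5], [[0.6, 0.4], [0.5, 0.5]], [[0.4, 0.4, 0.2], [0.2, 0.3, 0.5]])
X = [0, 1, 2]
print(q_function(NEW, OLD, X), q_function(OLD, OLD, X))
```

EM relies on the fact that any increase of $Q(\lambda, \lambda^{old})$ over $Q(\lambda^{old}, \lambda^{old})$ is a lower bound on the increase of $\log p(X|\lambda)$.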

## 0.2 Solving the Likelihood Problem (Forward, Backward)

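> $p(X|\lambda) = \sum_{Z} p(X, Z|\lambda)$

Summing directly over all $K^N$ hidden sequences costs $O(N K^N)$; the forward and backward recursions below cache partial results over prefixes and suffixes of the sequence and compute the same quantity in $O(N K^2)$.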
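For reference, the direct sum over hidden sequences can be written by brute force; it enumerates all $K^N$ sequences, so it is only usable on tiny inputs. The names and toy parameters here are hypothetical, with 0-based state and symbol indices.

```python
import itertools

def joint_prob(pi, A, B, x, z):
    """p(X, Z | lambda) with 0-based state/symbol indices."""
    p = pi[z[0]] * B[z[0]][x[0]]
    for n in range(1, len(x)):
        p *= A[z[n - 1]][z[n]] * B[z[n]][x[n]]
    return p

def likelihood_brute_force(pi, A, B, x):
    """p(X | lambda) = sum over all K^N hidden sequences Z of p(X, Z | lambda)."""
    K = len(pi)
    return sum(joint_prob(pi, A, B, x, z)
               for z in itertools.product(range(K), repeat=len(x)))

# Toy parameters (illustrative only): K = 2 hidden states, M = 3 symbols.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

p = likelihood_brute_force(PI, A, B, [0, 1, 2])   # ≈ 0.03628, a sum of 2**3 = 8 terms
```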





+ forward

Let $\alpha_{t,k} = p(x_1, x_2, ..., x_t, z_t = s_k|\lambda)$ be the joint probability of the observations $x_1, x_2, ..., x_t$ and of the hidden state at time $t$ being $s_k$ ($z_t = s_k$).

Initialization:

> $\alpha_{1,k} = p(x_1, z_1 = s_k|\lambda) = \pi_k b_{k,x_1}$

Recursion, for $t = 1, 2, ..., N-1$:

> $\alpha_{t+1,k} = p(x_1, x_2, ..., x_t, x_{t+1}, z_{t+1} = s_k|\lambda)$

> $= \sum_{i=1}^K p(x_1, x_2, ..., x_t, z_t = s_i, x_{t+1}, z_{t+1} = s_k|\lambda)$

> $= \sum_{i=1}^K p(x_1, x_2, ..., x_t|z_t = s_i, x_{t+1}, z_{t+1} = s_k;\lambda) \cdot p(z_t=s_i, x_{t+1}, z_{t+1}=s_k|\lambda)$

> $= \sum_{i=1}^K p(x_1, x_2, ..., x_t|z_t = s_i; \lambda) \cdot p(x_{t+1}, z_{t+1} = s_k | z_t = s_i; \lambda)\, p(z_t = s_i|\lambda)$

> $= \sum_{i=1}^K p(x_1, x_2, ..., x_t|z_t = s_i; \lambda)\, p(z_t = s_i|\lambda) \cdot p(x_{t+1}, z_{t+1} = s_k | z_t = s_i; \lambda)$

> $= \sum_{i=1}^K p(x_1, x_2, ..., x_t, z_t = s_i|\lambda) \cdot p(x_{t+1}|z_{t+1} = s_k, z_t = s_i; \lambda)\, p(z_{t+1} = s_k | z_t = s_i; \lambda)$

> $= \sum_{i=1}^K \alpha_{t,i} \cdot p(x_{t+1}|z_{t+1} = s_k; \lambda)\, a_{i,k}$

> $= \left(\sum_{i=1}^K \alpha_{t,i} a_{i, k}\right) b_{k,x_{t+1}}$

Termination:

> $p(x_1, x_2, ..., x_N|\lambda) = \sum_{i=1}^K \alpha_{N, i}$
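The initialization, recursion, and termination above can be sketched in a few lines. This assumes 0-based indices (so `alpha[t]` holds $\alpha_{t+1,\cdot}$) and reuses made-up toy parameters; `forward` is a hypothetical name.

```python
def forward(pi, A, B, x):
    """Forward algorithm: returns the table alpha[t][k] (0-based t) and p(X | lambda)."""
    K = len(pi)
    alpha = [[pi[k] * B[k][x[0]] for k in range(K)]]      # alpha_{1,k} = pi_k b_{k,x_1}
    for t in range(1, len(x)):
        prev = alpha[-1]
        # alpha_{t+1,k} = (sum_i alpha_{t,i} a_{i,k}) * b_{k,x_{t+1}}
        alpha.append([sum(prev[i] * A[i][k] for i in range(K)) * B[k][x[t]]
                      for k in range(K)])
    return alpha, sum(alpha[-1])                            # p(X) = sum_i alpha_{N,i}

# Toy parameters (illustrative only): K = 2 hidden states, M = 3 symbols.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

alpha, p = forward(PI, A, B, [0, 1, 2])   # p ≈ 0.03628
```

Each step does $K$ sums of $K$ terms, giving $O(NK^2)$ overall, and the result matches the exponential direct sum over all hidden sequences.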

+ backward

Let $\beta_{t,k} = p(x_{t+1}, x_{t+2}, ..., x_{N} | z_{t} = s_k; \lambda)$ be the probability of the remaining observations $x_{t+1}, x_{t+2}, ..., x_N$ given that the hidden state at time $t$ is $s_k$ ($z_t = s_k$).

Initialization:
> $\beta_{N,k} = 1$

Recursion, for $t = N-1, N-2, ..., 1$:
> $\beta_{t, k} = p(x_{t+1},x_{t+2},...,x_N|z_t = s_k; \lambda)$

> $= \sum_{i=1}^K p(x_{t+1}, x_{t+2}, ..., x_N, z_{t+1} = s_i | z_t = s_k; \lambda)$

> $= \sum_{i=1}^K p(x_{t+2}, x_{t+3}, ..., x_N|x_{t+1}, z_t = s_k, z_{t+1} = s_i; \lambda) \cdot p(x_{t+1}, z_{t+1} = s_i|z_t = s_k; \lambda)$

> $= \sum_{i=1}^K p(x_{t+2}, x_{t+3}, ..., x_N|z_{t+1} = s_i; \lambda) \cdot p(x_{t+1}|z_{t+1} = s_i, z_t = s_k; \lambda)\, p(z_{t+1} = s_i|z_t = s_k; \lambda)$

> $= \sum_{i=1}^K \beta_{t+1, i} \cdot p(x_{t+1}|z_{t+1} = s_i; \lambda)\, a_{k, i}$

> $= \sum_{i=1}^K \beta_{t+1, i} b_{i, x_{t+1}} a_{k, i}$

Termination:

> $p(x_1, x_2, ..., x_N|\lambda) = \sum_{i=1}^K p(x_1, x_2, x_3, ..., x_N, z_1 = s_i|\lambda)$

> $= \sum_{i=1}^K p(x_2, x_3, ..., x_N|x_1, z_1 = s_i; \lambda)\, p(x_1, z_1 = s_i|\lambda)$

> $= \sum_{i=1}^K p(x_2, x_3, ..., x_N | z_1 = s_i; \lambda)\, p(x_1 | z_1 = s_i; \lambda)\, p(z_1 = s_i | \lambda)$

> $= \sum_{i=1}^K \beta_{1,i} b_{i,x_1} \pi_{i}$
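A matching sketch of the backward pass, under the same assumptions as the other examples here (0-based indices, so `beta[t]` holds $\beta_{t+1,\cdot}$; made-up toy parameters; `backward` is a hypothetical name):

```python
def backward(pi, A, B, x):
    """Backward algorithm: returns the table beta[t][k] (0-based t) and p(X | lambda)."""
    K, N = len(pi), len(x)
    beta = [[1.0] * K for _ in range(N)]                    # beta_{N,k} = 1
    for t in range(N - 2, -1, -1):                          # 1-based t = N-1, ..., 1
        # beta_{t,k} = sum_i beta_{t+1,i} b_{i,x_{t+1}} a_{k,i}
        beta[t] = [sum(beta[t + 1][i] * B[i][x[t + 1]] * A[k][i] for i in range(K))
                   for k in range(K)]
    # Termination: p(X) = sum_i beta_{1,i} b_{i,x_1} pi_i
    likelihood = sum(beta[0][i] * B[i][x[0]] * pi[i] for i in range(K))
    return beta, likelihood

# Toy parameters (illustrative only): K = 2 hidden states, M = 3 symbols.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

beta, p = backward(PI, A, B, [0, 1, 2])   # p ≈ 0.03628, the same value the forward pass gives
```

The forward and backward variables also combine at any intermediate step: $\sum_{k} \alpha_{t,k}\beta_{t,k} = p(X|\lambda)$ for every $t$.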