# Naive Bayes

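### Naive Bayes learns the joint probability distribution $P(X, Y)$ from the training set. Specifically, it learns the prior probability $P(Y=c_k)$ and the conditional probability distribution $P(X=x|Y = c_k)$.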
$$
P(X, Y) = P(Y) \cdot P(X|Y)
$$
Prior probability distribution:
$$
P(Y = c_k), \quad k = 1, 2, \cdots, K
$$
Conditional probability distribution:
$$
P(X=x|Y=c_k) = P(X^{(1)} = x^{(1)}, X^{(2)} = x^{(2)}, \cdots, X^{(n)} = x^{(n)} | Y = c_k)
$$
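Together these give the joint probability distribution $P(X,Y)$. \
The **naive** in naive Bayes refers to the **strong assumption** of **conditional independence** that it places on the conditional probability distribution: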
$$
\begin{aligned}
P(X=x|Y=c_k) &= P(X^{(1)} = x^{(1)}, X^{(2)} = x^{(2)}, \cdots, X^{(n)} = x^{(n)} | Y = c_k)  \\ &= \prod _{j=1}^n P(X^{(j)} = x^{(j)}| Y = c_k)
\end{aligned}
$$
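
As a rough illustration of the factorization above, the sketch below multiplies per-feature conditional probabilities for a single class. The table `cond`, its numbers, and the helper `class_conditional` are made up for illustration and are not part of the original notes.

```python
from math import prod

# Hypothetical per-feature conditional tables for one class c_k:
# cond[j][value] = P(X^(j) = value | Y = c_k). The numbers are invented.
cond = [
    {1: 0.5, 2: 0.3, 3: 0.2},
    {"S": 0.6, "M": 0.3, "L": 0.1},
]

def class_conditional(x, cond):
    """P(X = x | Y = c_k) under the conditional independence assumption."""
    return prod(table[value] for table, value in zip(cond, x))

p = class_conditional((2, "S"), cond)  # product of the two per-feature entries
```

Each feature contributes one table lookup, so only the per-feature tables need to be stored rather than one entry per full feature combination.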

#### To classify a given input $x$, naive Bayes uses the learned model to compute the **posterior probability** $P(Y=c_k|X=x)$, which by Bayes' theorem is:
$$
\begin{aligned}
P(Y=c_k|X = x) &= \frac {P(X = x | Y = c_k)P(Y=c_k)} {\sum_k P(X=x|Y=c_k)P(Y=c_k)} \\
&= \frac {P(Y=c_k) \prod_j P(X^{(j)} = x^{(j)}| Y = c_k)} {\sum_k P(Y=c_k) \prod_{j} P(X^{(j)} = x^{(j)}| Y = c_k)}
\end{aligned}
$$
The **naive Bayes classifier** can be written as
$$
y = f(x) = \arg\max_{c_k} \frac {P(Y=c_k) \prod_j P(X^{(j)} = x^{(j)}| Y = c_k)} {\sum_k P(Y=c_k) \prod_{j} P(X^{(j)} = x^{(j)}| Y = c_k)}
$$
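
The following is a minimal sketch of this posterior computation for one fixed input $x$. The priors and likelihoods are invented numbers, and the function name `posterior` is an assumption for illustration. It also shows that the denominator is the same for every class, so the arg max over the unnormalized products picks the same class.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem for a fixed x.

    priors[c] = P(Y = c), likelihoods[c] = P(X = x | Y = c).
    """
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())  # denominator, identical for every class
    return {c: joint[c] / evidence for c in joint}

# Invented numbers for a two-class problem
priors = {1: 0.6, -1: 0.4}
likelihoods = {1: 0.05, -1: 0.15}

post = posterior(priors, likelihoods)
pred_normalized = max(post, key=post.get)
pred_unnormalized = max(priors, key=lambda c: priors[c] * likelihoods[c])
```

Because normalization does not change the ordering of the classes, an implementation can skip the denominator when only the predicted label is needed.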

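### Bayesian estimation
The conditional probability distribution $P(X=x|Y=c_k)$ has an exponential number of parameters, so estimating it directly is infeasible. Indeed, suppose $x^{(j)}$ can take $S_j$ values, $j=1,2,\cdots,n$, and $Y$ can take $K$ values; then the number of parameters is $K \prod_{j=1}^n S_j$. \
Bayesian estimate of the conditional probability:
$$
P_{\lambda}(X^{(j)}=a_{jl}| Y = c_k)= \frac {\sum_{i=1}^N I(x_i^{(j)}=a_{jl}, y_i = c_k) + \lambda} {\sum_{i=1}^N I(y_i = c_k) + S_j \lambda}
$$
Usually $\lambda=1$ is used, which is called **Laplace smoothing**. With $\lambda > 0$, a feature value that never occurs together with a class in the training data still receives a nonzero probability.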
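
A small sketch of this estimate, applied to the $X^{(2)}$ column and class $Y=-1$ of the Example 4.1 data that appears later in these notes. The function name `bayes_estimate` and its signature are assumptions made for illustration.

```python
from collections import Counter

def bayes_estimate(feature_values, labels, value_set, target_class, lam=1.0):
    """Return {a: P_lambda(X^(j) = a | Y = target_class)} for every a in value_set."""
    in_class = [v for v, y in zip(feature_values, labels) if y == target_class]
    counts = Counter(in_class)
    denom = len(in_class) + len(value_set) * lam
    return {a: (counts[a] + lam) / denom for a in value_set}

# X^(2) column and labels of Example 4.1
x2 = list("SMMSSSMMLLLMMLL")
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

smoothed = bayes_estimate(x2, y, ["S", "M", "L"], target_class=-1, lam=1.0)
unsmoothed = bayes_estimate(x2, y, ["S", "M", "L"], target_class=-1, lam=0.0)
```

Setting `lam=0.0` reduces the formula to the plain relative-frequency (maximum likelihood) estimate, while `lam=1.0` gives Laplace smoothing; in both cases the estimates over the value set sum to one.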

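### Posterior maximization
**Assume the naive Bayes classifier uses the 0-1 loss function**
$$
L(Y, f(X))=
\begin{cases}
1, & Y \neq f(X) \\
0, & Y = f(X)
\end{cases}
$$
The expected risk function is given first; minimizing the conditional expectation separately for each $X=x$ then yields the decision rule: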
$$
\begin{aligned}
R_{\exp}(f) &= E[L(Y, f(X))] = E_X \sum_{k=1}^K L(c_k, f(X)) P(c_k|X) \\
f(x) &= \arg\min_{y \in \mathcal{Y}} \sum_{k=1}^K L(c_k, y) P(c_k|X=x) \\
&= \arg\min_{y \in \mathcal{Y}} \sum_{k=1}^K I(y \neq c_k) P(c_k|X=x) \\
&= \arg\min_{y \in \mathcal{Y}} \left( 1 - P(Y = y|X=x) \right) \\
&= \arg\max_{y \in \mathcal{Y}} P(Y = y|X=x) \\
f(x) &= \arg\max_{c_k} P(c_k|X=x)
\end{aligned}
$$
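
To check the derivation numerically, the sketch below computes the expected 0-1 loss of every candidate label under a made-up posterior and confirms that the minimizer coincides with the label of highest posterior probability. The posterior values and the helper name are illustrative assumptions.

```python
def expected_zero_one_loss(y, posterior):
    """Sum of P(c_k | X = x) over all classes c_k different from the guess y."""
    return sum(p for c, p in posterior.items() if c != y)

# Invented posterior P(c_k | X = x) over three classes
posterior = {"a": 0.2, "b": 0.5, "c": 0.3}

risks = {y: expected_zero_one_loss(y, posterior) for y in posterior}
best_by_risk = min(risks, key=risks.get)
best_by_posterior = max(posterior, key=posterior.get)
```

Each risk equals one minus the posterior of the guessed label, which is why minimizing the risk and maximizing the posterior select the same class.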

### Algorithm 4.1 Naive Bayes algorithm
Input: training data $T=\{ (x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N) \}$, where $x_i = (x_i^{(1)}, x_i^{(2)},\cdots, x_i^{(n)})^T$, $x_i^{(j)}$ is the $j$-th feature of the $i$-th sample, $x_i^{(j)} \in \{a_{j1}, a_{j2}, \cdots, a_{jS_j}\}$, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, $j=1,2,\cdots,n$, $l=1,2,\cdots, S_j$, $y_i \in \{ c_1, c_2, \cdots, c_K\}$; an instance $x$  \
Output: the class of instance $x$ \
(1) Compute the prior probabilities and the conditional probabilities \
$$
P(Y=c_k) = \frac {\sum_{i=1}^N I(y_i=c_k)} {N}, k= 1,2, \cdots, K \\
P(X^{(j)}=a_{jl} | Y = c_k) = \frac {\sum_{i=1}^N I(x_i^{(j)} = a_{jl}, y_i=c_k)} {\sum_{i=1}^N I(y_i=c_k)} \\
j=1,2,\cdots,n, l=1,2,\cdots, S_j
$$
(2) For the given instance $x= (x^{(1)}, x^{(2)},\cdots, x^{(n)})^T$, compute
$$
P(Y=c_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} | Y = c_k), k = 1, 2, \cdots, K
$$
(3) Determine the class of instance $x$
$$
y = \arg\max_{c_k} P(Y=c_k) \prod_{j=1}^n P(X^{(j)} = x^{(j)}  | Y = c_k)
$$
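In code the computation is done in $\log$ space. Multiplying many probabilities between 0 and 1 (for example, one per feature with 784 features) would underflow, and taking logs also turns the product into a sum, which simplifies the computation:
$$
\begin{aligned}
\log \Big[ P(Y=c_k) \prod_{j=1}^n P(X^{(j)} = x^{(j)}  | Y = c_k) \Big] &= \log  P(Y=c_k) + \sum_{j=1}^n \log P(X^{(j)} = x^{(j)}  | Y = c_k)
\end{aligned}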
$$
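
The underflow is easy to reproduce. In this sketch, each of the 784 factors is set to the invented value 0.01; the direct product collapses to zero in double precision, while the sum of logs remains an ordinary finite number.

```python
import math

# 784 hypothetical conditional probabilities, all set to 0.01 for illustration
probs = [0.01] * 784

# Direct product: the true value is far below the smallest positive float64
product = 1.0
for p in probs:
    product *= p

# Log-space version: a sum of moderately sized negative numbers
log_sum = sum(math.log(p) for p in probs)
```

Once the product has underflowed to zero for every class, the classes can no longer be ranked, whereas the log scores can still be compared directly.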

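## Example 4.1 Train a naive Bayes classifier and determine the class label $y$ of $x = (2, S)^T$. In the table, $X^{(1)}$ and $X^{(2)}$ are the features, with value sets $A_1 = \{1, 2, 3\}$ and $A_2 = \{S, M, L\}$, and $Y$ is the class label, $Y \in C = \{1, -1\}$.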

|var|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
|$x^{(1)}$|1|1|1|1|1|2|2|2|2|2|3|3|3|3|3|
|$x^{(2)}$|S|M|M|S|S|S|M|M|L|L|L|M|M|L|L|
|$Y$|-1|-1|1|1|-1|-1|-1|1|1|1|1|1|1|1|-1|
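
Before the NumPy implementation, here is a hedged sketch that applies steps (1) to (3) of Algorithm 4.1 to this table directly, using exact fractions and no smoothing. The helper `score` and the variable names are assumptions introduced for illustration.

```python
from fractions import Fraction

# Columns of the table above
x1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
x2 = list("SMMSSSMMLLLMMLL")
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

def score(c, x):
    """P(Y=c) * prod_j P(X^(j)=x^(j) | Y=c), estimated by relative frequencies."""
    n_c = sum(1 for yi in y if yi == c)
    s = Fraction(n_c, len(y))
    for column, value in zip((x1, x2), x):
        match = sum(1 for v, yi in zip(column, y) if v == value and yi == c)
        s *= Fraction(match, n_c)
    return s

scores = {c: score(c, (2, "S")) for c in (1, -1)}
prediction = max(scores, key=scores.get)
```

For $x = (2, S)^T$ the two scores are $\frac{9}{15} \cdot \frac{3}{9} \cdot \frac{1}{9} = \frac{1}{45}$ for $Y=1$ and $\frac{6}{15} \cdot \frac{2}{6} \cdot \frac{3}{6} = \frac{1}{15}$ for $Y=-1$, so the predicted label is $-1$.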

In [3]:
import numpy as np


In [13]:
def NaiveBayes_train(X_train, y_train):
    # Laplace smoothing parameter (lambda = 1)
    lambda_ = 1

    # Distinct class labels, sorted ascending: [-1, 1]
    classType = np.unique(y_train)
    # Number of classes K
    classNumber = len(classType)

    # Array holding the smoothed prior probabilities
    Py = np.zeros((classNumber, 1))
    # Smoothed prior of each class: (count + lambda) / (N + K * lambda)
    for i, labeli in enumerate(classType):
        Py[i] = (np.sum(y_train == labeli) + lambda_) / (len(y_train) + classNumber * lambda_)

    # Work in log space from here on
    Py = np.log(Py)

    # Number of features (hard-coded for Example 4.1)
    featureNumber = 2
    # Number of possible values per feature (hard-coded for Example 4.1)
    valueNumber = 3

    # Px_y[class][feature][value]: first holds counts, later log conditional probabilities
    Px_y = np.zeros((classNumber, featureNumber, valueNumber))
    # Count the feature values seen for each class
    for i in range(len(y_train)):
        label = y_train[i]
        # Class index: label -1 -> 0, label 1 -> 1
        labeli = 0 if (label == -1) else label
        # Current sample
        x = X_train[i]
        # Loop over the features of the sample
        for j in range(featureNumber):
            # Map the feature value to 1..3 (S/M/L -> 1/2/3, numeric strings as-is)
            if x[j] == 'S':
                temp = 1
            elif x[j] == 'M':
                temp = 2
            elif x[j] == 'L':
                temp = 3
            else:
                temp = x[j]

            temp = int(temp)

            Px_y[labeli][j][temp - 1] += 1

    # Turn counts into smoothed log conditional probabilities
    for label in range(classNumber):
        for j in range(featureNumber):
            counts = Px_y[label][j].copy()
            Px_y[label][j] = np.log((counts + lambda_) / (counts.sum() + valueNumber * lambda_))

    return Py, Px_y

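def Predict(Py, Px_y, x):
    """Return the index of the most probable class (0 for label -1, 1 for label 1)."""
    featureNumber = len(x)
    classNumber = len(Py)

    # P[i] holds log P(Y=c_i) + sum_j log P(X^(j)=x^(j) | Y=c_i)
    P = [0] * classNumber

    for i in range(classNumber):
        print("i = ", i)
        sum_ = 0
        for j in range(featureNumber):
            # Map the feature value to 1..3, as in training
            if x[j] == 'S':
                temp = 1
            elif x[j] == 'M':
                temp = 2
            elif x[j] == 'L':
                temp = 3
            else:
                temp = x[j]

            temp = int(temp)

            # Accumulate the log conditional probabilities
            sum_ += Px_y[i][j][temp - 1]
            print("sum = ", sum_)

        # Add the log prior
        P[i] = sum_ + Py[i]
        print(Py[i])
        print("P---------------", P[i])

    return P.index(max(P))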


In [14]:
Xtrain = np.array([
    [1, 'S'],
    [1, 'M'],
    [1, 'M'],
    [1, 'S'],
    [1, 'S'],
    [2, 'S'],
    [2, 'M'],
    [2, 'M'],
    [2, 'L'],
    [2, 'L'],
    [3, 'L'],
    [3, 'M'],
    [3, 'M'],
    [3, 'L'],
    [3, 'L']
])

y_train = np.array([-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1])

Py, Px_y = NaiveBayes_train(X_train= Xtrain, y_train= y_train)
print(Py, " \n===========\n ", Px_y)


[[-0.8873032 ]
 [-0.53062825]]  
  [[[-0.81093022 -1.09861229 -1.5040774 ]
  [-0.81093022 -1.09861229 -1.5040774 ]]

 [[-1.38629436 -1.09861229 -0.87546874]
  [-1.79175947 -0.87546874 -0.87546874]]]


In [15]:
X_new = np.array([[2, 'S']])
Y_new = np.array([[-1]])

errorCount = 0
for i in range(len(X_new)):
    pred = Predict(Py, Px_y, X_new[i])
    print(pred)
    testlabeli = 0 if (Y_new[i] == -1) else Y_new[i]
    print(testlabeli)
    if pred != testlabeli:
        errorCount += 1

acc = 1 - (errorCount / len(X_new))
print("Accuracy: ", acc)

i =  0
sum =  -1.0986122886681098
sum =  -1.9095425048844386
[-0.8873032]
P--------------- [-2.7968457]
i =  1
sum =  -1.0986122886681098
sum =  -2.8903717578961645
[-0.53062825]
P--------------- [-3.42100001]
0
0
Accuracy:  1.0
