## Bayes Classifier Implementation

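The Bayes classifier is a probabilistic model.

Its goal is to pick, for each sample, the class with the smallest conditional risk:
$h(x) = \arg\min_c R(c|x)$
where $R(c|x)$ is the conditional risk of predicting class $c$ for sample $x$.

Minimizing the conditional risk for every sample also minimizes the overall (expected) risk. A common choice of loss is the 0-1 loss, under which the conditional risk becomes $R(c|x) = 1 - P(c|x)$ and the decision rule becomes $h(x) = \arg\max_c P(c|x)$, where $P(c|x)$ is the posterior probability of class $c$. The classifier predicts the class whose posterior is largest.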
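The equivalence between minimizing the 0-1 conditional risk and maximizing the posterior can be checked numerically. The following is a minimal sketch; the posterior values and the three-class setup are made up purely for illustration.

```python
import numpy as np

# Hypothetical posterior P(c|x) over three classes for a single sample x.
posterior = np.array([0.2, 0.5, 0.3])

# 0-1 loss matrix: loss[i, j] is the cost of predicting class i when the truth is class j.
n_classes = len(posterior)
loss = 1.0 - np.eye(n_classes)

# Conditional risk R(c_i|x) = sum_j loss[i, j] * P(c_j|x).
risk = loss @ posterior

# Under 0-1 loss the risk of each class equals 1 - P(c|x).
assert np.allclose(risk, 1.0 - posterior)

# So the class with minimal risk is the class with maximal posterior.
best_by_risk = int(np.argmin(risk))
best_by_posterior = int(np.argmax(posterior))
print(best_by_risk, best_by_posterior)
```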


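### Discriminative and Generative Models

Discriminative models: model P(c|x) directly. Examples: decision trees, BP neural networks, SVM.

Generative models: learn the joint probability P(c,x) first, then derive P(c|x) from it. Example: Bayes models.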


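### Generative Models

A generative model starts from $P(c|x) = P(c, x)/P(x)$. By Bayes' formula, $P(c|x) = P(x|c)P(c)/P(x)$, where $P(x)$ is the same for every class, $P(c)$ is the class prior probability, and $P(x|c)$ is the class-conditional probability. If $x$ has $N$ attributes, the class-conditional probability becomes $P(x_1, x_2, ..., x_N|c)$. If the attributes depend on each other, the number of attribute-value combinations that must be estimated grows exponentially with $N$ (for example, $2^N$ combinations for binary attributes).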
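As a small worked example of Bayes' formula, the sketch below turns a prior and a class-conditional probability into a posterior. All numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical values for two classes and one observed sample x.
prior = {0: 0.4, 1: 0.6}          # P(c)
likelihood = {0: 0.05, 1: 0.10}   # P(x|c)

# Numerator of Bayes' formula: the joint probability P(c, x) = P(x|c) * P(c).
joint = {c: likelihood[c] * prior[c] for c in prior}

# Evidence P(x) = sum over classes of P(c, x); it is the same for every class.
evidence = sum(joint.values())

# Posterior P(c|x) = P(c, x) / P(x).
posterior = {c: joint[c] / evidence for c in joint}
print(posterior)
```

Because $P(x)$ is shared by all classes, dividing by it does not change which class has the largest value; it only rescales the scores so that they sum to 1.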


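### Maximum Likelihood Estimation

To learn the class-conditional probability, we usually assume it has a fixed distributional form and then estimate its parameters from data. If $P(x|c)$ is uniquely determined by a parameter vector $\theta_c$, learning amounts to estimating $\theta_c$ from the data, i.e. working with $P(x|\theta_c)$. Training a probabilistic model is therefore a parameter-estimation problem. Maximum likelihood estimation picks the parameter value under which the observed samples are most probable.

The likelihood is $P(D_c|\theta_c) = \prod_{x \in D_c} P(x|\theta_c)$, where $D_c$ is the set of training samples of class $c$, and the goal is to estimate $\theta_c$. Because a product of many probabilities easily underflows, the log-likelihood is used instead: $LL(\theta_c) = \log P(D_c|\theta_c) = \sum_{x \in D_c} \log P(x|\theta_c)$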
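The sketch below illustrates both points for a Gaussian assumption: the maximum likelihood estimates are the sample mean and the population standard deviation, and the raw product of densities underflows while the log-likelihood stays finite. The synthetic data, the sample size, and the helper name `gaussian_pdf` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical class-c samples D_c, drawn from a Gaussian with mean 3 and std 2.
samples = rng.normal(loc=3.0, scale=2.0, size=2000)


def gaussian_pdf(x, mean, std):
    # Gaussian probability density evaluated at x.
    return np.exp(-(x - mean) ** 2 / (2 * std ** 2)) / (np.sqrt(2 * np.pi) * std)


# For a Gaussian, the MLE is the sample mean and the population std (ddof=0).
mean_hat = samples.mean()
std_hat = samples.std()

density = gaussian_pdf(samples, mean_hat, std_hat)

# The product of 2000 small densities underflows to 0.0 in double precision.
likelihood = np.prod(density)

# The sum of log-densities carries the same information and stays finite.
log_likelihood = np.sum(np.log(density))

print(mean_hat, std_hat, likelihood, log_likelihood)
```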


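### Naive Bayes

Naive Bayes assumes that all attributes are conditionally independent given the class. The dependencies between attributes then no longer need to be modeled, which greatly simplifies the computation.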
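Written out, the independence assumption lets the class-conditional probability factor into one term per attribute. In the formulas below, $d$ (the number of attributes) and $h_{nb}$ (the resulting classifier) are notation introduced here for illustration:

$$P(x|c) = \prod_{i=1}^{d} P(x_i|c)$$

$$h_{nb}(x) = \arg\max_c P(c) \prod_{i=1}^{d} P(x_i|c)$$

The evidence $P(x)$ is dropped from the decision rule because it is the same for every class.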


Let's start implementing Naive Bayes.

In [5]:
import numpy as np
from collections import Counter


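# Each row of `data` is one feature and each column is one sample:
# row 0 is a categorical feature, row 1 is a continuous feature.
data = np.array([[1, 2, 1, 2, 1], [1.2, 2.4, 6.2, 9.8, 2.2]])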
label = np.array([0, 1, 1, 0, 1])


In [7]:

# First, compute the class prior probabilities P(c).
counter = Counter(label)

class_prob = {}

for k, v in counter.items():
    class_prob[k] = v / len(label)
class_prob

{0: 0.4, 1: 0.6}

In [21]:
# Next, loop over each feature and estimate its conditional probability per class.

# How this is done depends on the feature type: categorical or continuous.

# Categorical feature: estimate P(value|class) from relative frequencies.
def _get_cate(fea, label_counter):
    # Note: relies on the global `label` array to select the samples of each class.
    cate_prob = {}

    for k in label_counter.keys():
        k_data = fea[label == k]
        cate_prob[k] = {}

        # Count how often each distinct value occurs within class k.
        unique_k = Counter(k_data)
        for uk, uv in unique_k.items():
            cate_prob[k][uk] = uv / len(k_data)

    return cate_prob


def _gaussian(data, mean, std):
    # Gaussian probability density with the given mean and standard deviation.
    return 1 / (np.sqrt(2 * np.pi) * std) * np.exp(-(data - mean) ** 2 / (2 * std ** 2))


# Continuous feature: fit a Gaussian per class by estimating its mean and std.
def _get_con(fea, label_counter):
    con_mean_std = {}

    for l in label_counter.keys():
        k_data = fea[label == l]

        # k_data.std() is the population std (ddof=0), i.e. the Gaussian MLE.
        con_mean_std[l] = {'mean': k_data.mean(), 'std': k_data.std()}

    return con_mean_std
        
cate_prob = _get_cate(data[0, :], counter)
con_mean_st = _get_con(data[1, :], counter)
    
print("categorical type (pre-computed, not trained):", cate_prob)
print("continuous type:", con_mean_st)



categorical type (pre-computed, not trained): {0: {1.0: 0.5, 2.0: 0.5}, 1: {2.0: 0.3333333333333333, 1.0: 0.6666666666666666}}
continuous type: {0: {'mean': 5.5, 'std': 4.3}, 1: {'mean': 3.6, 'std': 1.840289832245635}}


In [27]:
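# Let's make a prediction for one test sample: [categorical value, continuous value].
test_data = [1, 2.0]

# Categorical feature: look up P(value|class); a value never seen in a class gets probability 0.
first_col_prob = {k: p.get(test_data[0], 0.0) for k, p in cate_prob.items()}
print(first_col_prob)

# Continuous feature: evaluate the per-class Gaussian density.
second_col_prob = {}
for k, mean_std in con_mean_st.items():
    mean = mean_std['mean']
    std = mean_std['std']

    second_col_prob[k] = _gaussian(test_data[1], mean, std)

print(second_col_prob)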

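# Combine: the score of class k is P(k) * P(x1|k) * P(x2|k).
# The class prior class_prob[k] must be included, per Bayes' formula.
out_prob = {}
for k in counter.keys():
    out_prob[k] = class_prob[k] * first_col_prob[k] * second_col_prob[k]

# Round to 4 decimals for display.
print({k: round(float(v), 4) for k, v in out_prob.items()})

{0: 0.5, 1: 0.6666666666666666}
{0: 0.0666157863979957, 1: 0.14855286870976026}
{0: 0.0133, 1: 0.0594}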


In [40]:
# The only thing left is to pick the class with the largest score.
print("get predict:", max(out_prob, key=out_prob.get))

get predict: 1
