**Algorithm name: Naive Bayes Classifiers**
    - GaussianNB
```
    sklearn.naive_bayes.GaussianNB(priors=None, 
                                   var_smoothing=1e-09)
```
    - BernoulliNB
```
    sklearn.naive_bayes.BernoulliNB(alpha=1.0, 
                                    binarize=0.0, 
                                    fit_prior=True, 
                                    class_prior=None)
```

    - MultinomialNB
```
    sklearn.naive_bayes.MultinomialNB(alpha=1.0, 
                                      fit_prior=True, 
                                      class_prior=None)
```

**1. Category**
    
    Classifier

**2. Mathematical principle and derivation**

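The model makes its predictions according to Bayes' theorem. A naive Bayes model estimates the probability of each feature separately, so compared with linear models it takes less time to train, usually at the cost of slightly worse generalization. Naive Bayes is a simple multiclass classification algorithm built on the premise that **the features are independent of one another given the class**. Training amounts to computing, for every feature, its conditional probability given each label; these trained conditional probabilities are then used for prediction.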
![Picture3.png](attachment:Picture3.png)
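
The independence assumption described above can be written out as follows. This is the standard textbook form, given here as a sketch; the symbols $y$ (class label) and $x_1, \dots, x_n$ (feature values) are notation introduced for illustration and are not taken from the figures.

$$
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \Big[ \log P(y) + \sum_{i=1}^{n} \log P(x_i \mid y) \Big]
$$

The three variants listed next differ only in how the per-feature term $P(x_i \mid y)$ is modelled.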

   - GaussianNB
![image.png](attachment:GuassianNB_image.png)


   - BernoulliNB
![image.png](attachment:BernoulliNB_image.png)


   - MultinomialNB
![image.png](attachment:MultinomialNB_image.png)

**3. Key parameters and characteristics**
   - GaussianNB
      > applied to **continuous data**
      
      > stores the **mean** and the **standard deviation** of each feature for each class
      
      > mostly used on **very high-dimensional data**
      
        
   - BernoulliNB 

      > applied to **binary data**
      
      > stores, for each class, **how often each feature is non-zero**
      
      > mostly used on **sparse count data**
      
      > Parameter: **alpha** adds virtual data points to all features (smoothing).
      
          >> higher alpha -> stronger smoothing -> less complex model (risk of underfitting)
          
          >> lower alpha -> weaker smoothing -> more complex model (risk of overfitting)

    

   - MultinomialNB
        
      > applied to **count data**
      
      > stores the **average value** of each feature for each class
      
      > Parameter: **alpha** adds virtual data points to all features (smoothing).
      
          >> higher alpha -> stronger smoothing -> less complex model (risk of underfitting)
          
          >> lower alpha -> weaker smoothing -> more complex model (risk of overfitting)
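
The effect of `alpha` can be made visible with a small experiment. The sketch below is illustrative only: the toy data and the `spread` dictionary are assumptions made for this example. It fits `BernoulliNB` with three values of `alpha` and measures how far the estimated probabilities lie from the uninformative value 0.5.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy binary data (illustrative only): 4 samples, 4 features, 2 classes
X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])

spread = {}
for alpha in (0.01, 1.0, 100.0):
    clf = BernoulliNB(alpha=alpha).fit(X, y)
    probs = np.exp(clf.feature_log_prob_)  # estimated P(feature = 1 | class)
    # Distance of the most extreme estimate from the uninformative value 0.5
    spread[alpha] = np.abs(probs - 0.5).max()
    print("alpha={}: max |p - 0.5| = {:.3f}".format(alpha, spread[alpha]))
```

A larger `alpha` pulls every estimate toward 0.5, which is the sense in which the model becomes less complex.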
          

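**4. Model selection**

  - GaussianNB (Gaussian naive Bayes) is mostly used for continuous data and high-dimensional data.
  - BernoulliNB (Bernoulli naive Bayes) is mainly used for classification with discrete features. It differs from MultinomialNB in that MultinomialNB uses occurrence counts as feature values, whereas BernoulliNB expects binary (boolean) features.
  - MultinomialNB (multinomial naive Bayes) is mostly used for sparse data such as word counts in text classification, with occurrence counts as feature values.
  - Naive Bayes models share the strengths and weaknesses of linear models, but they are faster to train and to predict with, at the cost of slightly lower accuracy.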
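
To make the difference between the three variants concrete, the following sketch fits all of them on one small matrix and prints the per-class statistics each one stores. The word-count matrix `X_counts` is hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical word-count matrix: rows are documents, columns are words
X_counts = np.array([[3, 0, 1],
                     [2, 0, 0],
                     [0, 4, 1],
                     [0, 1, 2]])
y = np.array([0, 0, 1, 1])

mnb = MultinomialNB().fit(X_counts, y)              # uses the raw counts
bnb = BernoulliNB(binarize=0.0).fit(X_counts, y)    # only presence/absence
gnb = GaussianNB().fit(X_counts.astype(float), y)   # treats values as continuous

print("MultinomialNB feature_count_ (summed counts per class):")
print(mnb.feature_count_)
print("BernoulliNB feature_count_ (samples per class where the feature is non-zero):")
print(bnb.feature_count_)
print("GaussianNB theta_ (per-class feature means):")
print(gnb.theta_)
```

The same input therefore produces three different summaries, which is why the choice of variant should follow the type of the features.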
  
  
**5. References**
  - https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
  - https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.BernoulliNB.html
  - https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html
  - http://blog.csdn.net/chlele0105/article/details/37695115
  - http://www.itnose.net/detail/6431230.html
  - http://www.cnblogs.com/jerrylead/archive/2011/03/05/1971903.html
  - http://blog.csdn.net/v_july_v/article/details/40984699
  - http://mindhacks.cn/2008/09/21/the-magical-bayesian-method/
  - http://blog.csdn.net/v_july_v/article/details/7577684
  - https://www.cnblogs.com/pinard/p/6074222.html
  

##  BernoulliNB example
---

```
sklearn.naive_bayes.BernoulliNB(alpha=1.0, 
                                    binarize=0.0, 
                                    fit_prior=True,
                                    class_prior=None)
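Bernoulli naive Bayes is similar to multinomial naive Bayes and is also mainly used for classification with discrete features. The difference from MultinomialNB: MultinomialNB uses occurrence counts as feature values, whereas BernoulliNB uses binary (boolean) features.
Parameters:
    binarize: threshold used to binarize the sample features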
```

In [1]:
import sys
print("Python version: {}".format(sys.version))

import pandas as pd
print("pandas version: {}".format(pd.__version__))

import matplotlib
print("matplotlib version: {}".format(matplotlib.__version__))

import numpy as np
print("numpy version: {}".format(np.__version__))

import scipy as sp 
print("scipy version: {}".format(sp.__version__))

import IPython 
print("IPython version: {}".format(IPython.__version__))

import sklearn
print("sklearn version: {}".format(sklearn.__version__))

import mglearn
import matplotlib.pyplot as plt

#binary training data set 
X = np.array([[0,1,0,1],
              [1,0,1,1],
              [0,0,0,1],
              [1,0,1,0]])

#target data set: there are two classes. class-0 and class-1 
#class-0: line 0 and line 2 in X
#class-1: line 1 and line 3 in X
y= np.array([0,1,0,1])

#count how often each feature is non-zero within each class
#for class-0: 0,1,0,2
#for class-1: 2,0,2,1

counts = {}
for label in np.unique(y):
    # iterate over each class 
    # count (sum) entries of 1 per feature 
    counts[label] = X[y == label].sum(axis=0)
print("Feature counts:\n{}".format(counts))

Python version: 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:04:09) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
pandas version: 0.23.4
matplotlib version: 3.0.2
numpy version: 1.15.4
scipy version: 1.1.0
IPython version: 5.3.0
sklearn version: 0.18.1
Feature counts:
{0: array([0, 1, 0, 2]), 1: array([2, 0, 2, 1])}


In [2]:
import numpy as np
from sklearn.naive_bayes import BernoulliNB
X = np.array([[1,2,3,4],
              [1,3,4,4],
              [2,4,5,5]])
y = np.array([1,1,2])
clf = BernoulliNB(alpha=2.0,binarize = 3.0,fit_prior=True)
clf.fit(X,y)

BernoulliNB(alpha=2.0, binarize=3.0, class_prior=None, fit_prior=True)

In [3]:
# After binarization with binarize=3.0 (values greater than 3.0 become 1, all others 0), the input X is equivalent to
X = np.array([[0,0,0,1],[0,0,1,1],[0,1,1,1]])
print(X)

[[0 0 0 1]
 [0 0 1 1]
 [0 1 1 1]]
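
The equivalence claimed in this cell can be checked directly. The sketch below is an illustration, not part of the original notebook: `X_raw`, `X_bin`, `clf_raw` and `clf_bin` are names introduced here. It binarizes the data by hand with NumPy and compares a model fitted with `binarize=3.0` on the raw data against one fitted with `binarize=None` on the hand-binarized data.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X_raw = np.array([[1, 2, 3, 4],
                  [1, 3, 4, 4],
                  [2, 4, 5, 5]])
y = np.array([1, 1, 2])

# Values strictly greater than the threshold become 1, everything else 0
X_bin = (X_raw > 3.0).astype(int)
print(X_bin)

# binarize=None tells BernoulliNB that the input is already binary
clf_raw = BernoulliNB(alpha=2.0, binarize=3.0).fit(X_raw, y)
clf_bin = BernoulliNB(alpha=2.0, binarize=None).fit(X_bin, y)
print(np.allclose(clf_raw.feature_log_prob_, clf_bin.feature_log_prob_))
```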


In [4]:
# class_log_prior_: log of the class prior probabilities; each prior equals the number of samples in the class / the total number of samples
clf.class_log_prior_

array([-0.40546511, -1.09861229])

In [5]:
# feature_log_prob_: log of the conditional probability of each feature given a class; array of shape (n_classes, n_features)
clf.feature_log_prob_  

array([[-1.09861229, -1.09861229, -0.69314718, -0.40546511],
       [-0.91629073, -0.51082562, -0.51082562, -0.51082562]])

In [6]:
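# Let the four features of X be A1, A2, A3, A4 and the classes be y1, y2. For class y1 the smoothed estimate is
# P(Ai=1|y=y1) = (N_yi + alpha) / (N_y + 2*alpha), where N_yi is the number of y1 samples with Ai=1 and N_y the number of y1 samples.
# feature_log_prob_ stores log P(Ai=1|y=y1); the terms multiplied by 0 below are log P(Ai=0|y=y1) and do not contribute.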
print([np.log((2+2)/(2+2*2))*0+np.log((0+2)/(2+2*2))*1,
       np.log((2+2)/(2+2*2))*0+np.log((0+2)/(2+2*2))*1,
       np.log((1+2)/(2+2*2))*0+np.log((1+2)/(2+2*2))*1,
       np.log((0+2)/(2+2*2))*0+np.log((2+2)/(2+2*2))*1])

[-1.0986122886681098, -1.0986122886681098, -0.6931471805599453, -0.40546510810816444]
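
The term-by-term calculation above can also be done for all classes at once. The following is a minimal sketch under the assumption that the data is already binarized (`X_bin` and `manual_*` are names introduced for this example); it rebuilds `class_log_prior_` and `feature_log_prob_` from the fitted counts.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

alpha = 2.0
X_bin = np.array([[0, 0, 0, 1],
                  [0, 0, 1, 1],
                  [0, 1, 1, 1]])
y = np.array([1, 1, 2])
clf = BernoulliNB(alpha=alpha, binarize=None).fit(X_bin, y)

# Class prior: samples in the class / total number of samples
manual_prior = np.log(clf.class_count_ / clf.class_count_.sum())

# (N_yi + alpha) / (N_y + 2 * alpha); the 2 is the number of values a binary feature can take
manual_feature = np.log((clf.feature_count_ + alpha) /
                        (clf.class_count_[:, np.newaxis] + 2 * alpha))

print(np.allclose(manual_prior, clf.class_log_prior_))
print(np.allclose(manual_feature, clf.feature_log_prob_))
```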


In [7]:
# class_count_: number of training samples in each class, in class order
clf.class_count_

array([2., 1.])

In [8]:
# feature_count_: per-class sum of each feature's values, in class order; array of shape (n_classes, n_features)
clf.feature_count_

array([[0., 0., 1., 2.],
       [0., 1., 1., 1.]])

## GaussianNB example
---

```
sklearn.naive_bayes.GaussianNB(priors=None, 
                               var_smoothing=1e-09)
```

In [9]:
import numpy as np
from sklearn.naive_bayes import GaussianNB
X = np.array([[-1, -1], 
              [-2, -2], 
              [-3, -3],
              [-4,-4],
              [-5,-5], 
              [1, 1], 
              [2,2], 
              [3, 3]])
y = np.array([1, 1, 1,1,1, 2, 2, 2])
print(X.shape)
clf = GaussianNB() # priors=None by default
clf.fit(X,y)
GaussianNB(priors=None)

(8, 2)


GaussianNB(priors=None)

In [10]:
# Nothing is displayed because priors=None
clf.priors

In [11]:
# Set the priors parameter
clf.set_params(priors=[0.625, 0.375])

GaussianNB(priors=[0.625, 0.375])

In [12]:
# Returns the list of prior probabilities, one per class label
clf.priors

[0.625, 0.375]

In [13]:
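# class_prior_: the prior probability of each class label as used by the fitted model. priors returns the list that was passed in, whereas class_prior_ is a NumPy array; it only reflects newly set priors after the model has been fitted again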
clf.class_prior_

type(clf.class_prior_)

numpy.ndarray

In [14]:
# class_count_: number of training samples for each class label
clf.class_count_

array([5., 3.])

In [15]:
# theta_: mean of each feature for each class label
clf.theta_

array([[-3., -3.],
       [ 2.,  2.]])

In [16]:
# sigma_: variance of each feature for each class label (exposed as var_ in scikit-learn 1.0 and later)
clf.sigma_

array([[2.00000001, 2.00000001],
       [0.66666667, 0.66666667]])
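
The fitted means and variances can be reproduced with plain NumPy. The sketch below is illustrative and the names `means`, `variances`, `epsilon` and `fitted_var` are introduced here. It assumes the smoothing term is `var_smoothing` times the largest per-feature variance of the whole training set, which is what explains the trailing `...01` in the values above.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1, -1], [-2, -2], [-3, -3], [-4, -4], [-5, -5],
              [1, 1], [2, 2], [3, 3]], dtype=float)
y = np.array([1, 1, 1, 1, 1, 2, 2, 2])
clf = GaussianNB().fit(X, y)

# Per-class feature means
means = np.array([X[y == c].mean(axis=0) for c in clf.classes_])

# Per-class population variance (ddof=0) plus the smoothing term
epsilon = clf.var_smoothing * X.var(axis=0).max()
variances = np.array([X[y == c].var(axis=0) for c in clf.classes_]) + epsilon

# The fitted variance is called var_ in recent releases and sigma_ in older ones
fitted_var = clf.var_ if hasattr(clf, "var_") else clf.sigma_
print(np.allclose(means, clf.theta_), np.allclose(variances, fitted_var))
```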

In [17]:
clf.fit(X,y,np.array([0.05,0.05,0.1,0.1,0.1,0.2,0.2,0.2]))  # give the samples different weights

print("theta_:\n{}".format(clf.theta_))
print("sigma_:\n{}".format(clf.sigma_))

theta_:
[[-3.375 -3.375]
 [ 2.     2.   ]]
sigma_:
[[1.73437501 1.73437501]
 [0.66666667 0.66666667]]
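
The weighted statistics for the first class can be checked by hand with `np.average`. This is a sketch for illustration; `w`, `mask`, `weighted_mean` and `weighted_var` are names introduced here, and the tiny smoothing term added by `var_smoothing` is ignored in the manual variance.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1, -1], [-2, -2], [-3, -3], [-4, -4], [-5, -5],
              [1, 1], [2, 2], [3, 3]], dtype=float)
y = np.array([1, 1, 1, 1, 1, 2, 2, 2])
w = np.array([0.05, 0.05, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2])
clf = GaussianNB().fit(X, y, sample_weight=w)

mask = (y == 1)  # samples of the first class
# Weighted mean and weighted (population) variance of each feature
weighted_mean = np.average(X[mask], axis=0, weights=w[mask])
weighted_var = np.average((X[mask] - weighted_mean) ** 2, axis=0, weights=w[mask])
print(weighted_mean, weighted_var)
```

The second class keeps its mean and variance because all of its samples carry the same weight.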


In [18]:
# predict(X): returns the predicted class labels for the test samples
clf.predict([[-6,-6],[4,5]])

array([1, 2])

In [19]:
# predict_proba(X): returns the predicted probability of each class label for the test samples
clf.predict_proba([[-6,-6],[4,5]])

array([[1.00000000e+00, 1.68482943e-40],
       [2.81463802e-12, 1.00000000e+00]])

In [20]:
# predict_log_proba(X): returns the log of the predicted probability of each class label for the test samples
clf.predict_log_proba([[-6,-6],[4,5]])

array([[ 0.00000000e+00, -9.15817394e+01],
       [-2.65961875e+01, -2.81374923e-12]])
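
For readers who want to see where these probabilities come from, the sketch below recomputes them from the fitted parameters: a Gaussian log-likelihood per feature, summed and added to the log prior, then normalized with `logsumexp`. It uses a fresh, unweighted fit with default priors, so its numbers need not match the outputs above; the names `jll`, `log_proba` and `fitted_var` are introduced for this example.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1, -1], [-2, -2], [-3, -3], [-4, -4], [-5, -5],
              [1, 1], [2, 2], [3, 3]], dtype=float)
y = np.array([1, 1, 1, 1, 1, 2, 2, 2])
clf = GaussianNB().fit(X, y)
fitted_var = clf.var_ if hasattr(clf, "var_") else clf.sigma_  # name depends on version

X_test = np.array([[-6.0, -6.0], [4.0, 5.0]])

# Joint log-likelihood per class: log P(y) + sum_i log N(x_i | mean_yi, var_yi)
jll = []
for k in range(len(clf.classes_)):
    log_prior = np.log(clf.class_prior_[k])
    log_lik = -0.5 * np.sum(np.log(2.0 * np.pi * fitted_var[k])
                            + (X_test - clf.theta_[k]) ** 2 / fitted_var[k], axis=1)
    jll.append(log_prior + log_lik)
jll = np.array(jll).T  # shape (n_samples, n_classes)

# Normalize so that the probabilities of each sample sum to 1
log_proba = jll - logsumexp(jll, axis=1, keepdims=True)
print(np.allclose(log_proba, clf.predict_log_proba(X_test)))
```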

In [21]:
# score(X, y, sample_weight=None): returns the mean accuracy on the given test samples and labels
clf.score([[-6,-6],[-4,-2],[-3,-4],[4,5]],[1,1,2,2])

0.75

## MultinomialNB Example
---

```
sklearn.naive_bayes.MultinomialNB(alpha=1.0, 
                                  fit_prior=True,
                                  class_prior=None)
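Multinomial naive Bayes is mainly used for classification with discrete features, e.g. word counts in text classification, with occurrence counts as feature values

Parameters:
    - alpha: float, optional, default 1.0; additive (Laplace/Lidstone) smoothing parameter
    
    - fit_prior: bool, optional, default True; whether to learn the class prior probabilities. If False, all class labels get the same prior
    
    - class_prior: array-like of shape (n_classes,), default None; prior probabilities of the classes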
```

In [22]:
import numpy as np
from sklearn.naive_bayes import MultinomialNB
X = np.array([[1,2,3,4],
              [1,3,4,4],
              [2,4,5,5],
              [2,5,6,5],
              [3,4,5,6],
              [3,5,6,6]])
y = np.array([1,1,4,2,3,3])
clf = MultinomialNB(alpha=2.0)
clf.fit(X,y)

print(clf.class_log_prior_)

[-1.09861229 -1.79175947 -1.09861229 -1.79175947]


In [23]:
# If class_prior is given, class_log_prior_ is the log of class_prior regardless of whether fit_prior is True or False
import numpy as np
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB(alpha=2.0,fit_prior=True,class_prior=[0.3,0.1,0.3,0.2])
clf.fit(X,y)
print(clf.class_log_prior_)
print(np.log(0.3),np.log(0.1),np.log(0.3),np.log(0.2))
clf1 = MultinomialNB(alpha=2.0,fit_prior=False,class_prior=[0.3,0.1,0.3,0.2])
clf1.fit(X,y)
print(clf1.class_log_prior_)

[-1.2039728  -2.30258509 -1.2039728  -1.60943791]
-1.2039728043259361 -2.3025850929940455 -1.2039728043259361 -1.6094379124341003
[-1.2039728  -2.30258509 -1.2039728  -1.60943791]


In [24]:
# If fit_prior=False and class_prior=None, all class labels share the same prior, 1/N, where N is the number of classes
clf = MultinomialNB(alpha=2.0,fit_prior=False)
clf.fit(X,y)
print(clf.class_log_prior_)
print(np.log(1/4))

[-1.38629436 -1.38629436 -1.38629436 -1.38629436]
-1.3862943611198906


In [25]:
# If fit_prior=True and class_prior=None, the prior of each class label equals the number of samples in that class divided by the total number of samples
clf = MultinomialNB(alpha=2.0,fit_prior=True)
clf.fit(X,y)
print(clf.class_log_prior_)  # printed in the order of the class labels 1, 2, 3, 4
print(np.log(2/6),np.log(1/6),np.log(2/6),np.log(1/6))

[-1.09861229 -1.79175947 -1.09861229 -1.79175947]
-1.0986122886681098 -1.791759469228055 -1.0986122886681098 -1.791759469228055


In [26]:
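# intercept_: mirrors class_log_prior_ so that multinomial naive Bayes can be read as a linear model; its values equal class_log_prior_ (the attribute has been removed from newer scikit-learn releases)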
clf.class_log_prior_

array([-1.09861229, -1.79175947, -1.09861229, -1.79175947])

In [27]:
clf.intercept_

array([-1.09861229, -1.79175947, -1.09861229, -1.79175947])

In [28]:
# feature_log_prob_: log of the conditional probability of each feature given a class; array of shape (n_classes, n_features)
clf.feature_log_prob_

array([[-2.01490302, -1.45528723, -1.2039728 , -1.09861229],
       [-1.87180218, -1.31218639, -1.178655  , -1.31218639],
       [-1.74919985, -1.43074612, -1.26369204, -1.18958407],
       [-1.79175947, -1.38629436, -1.23214368, -1.23214368]])

In [29]:
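# How the conditional probabilities are computed, using the features of class 1 as an example
# conditional probability = (count of the feature within the class + alpha) / (sum of all feature counts within the class + n_features * alpha)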
print(np.log((1+1+2)/(1+2+3+4+1+3+4+4+4*2)),
      np.log((2+3+2)/(1+2+3+4+1+3+4+4+4*2)),
      np.log((3+4+2)/(1+2+3+4+1+3+4+4+4*2)),
      np.log((4+4+2)/(1+2+3+4+1+3+4+4+4*2)))

-2.0149030205422647 -1.455287232606842 -1.2039728043259361 -1.0986122886681098
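
The same smoothing formula can be applied to every class at once. The sketch below is illustrative (`smoothed`, `totals` and `manual` are names introduced here); it rebuilds `feature_log_prob_` from `feature_count_`, using the number of features as the multiplier of `alpha` in the denominator.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

alpha = 2.0
X = np.array([[1, 2, 3, 4],
              [1, 3, 4, 4],
              [2, 4, 5, 5],
              [2, 5, 6, 5],
              [3, 4, 5, 6],
              [3, 5, 6, 6]])
y = np.array([1, 1, 4, 2, 3, 3])
clf = MultinomialNB(alpha=alpha).fit(X, y)

n_features = X.shape[1]
smoothed = clf.feature_count_ + alpha  # N_yi + alpha
# N_y + alpha * n_features, one total per class
totals = clf.feature_count_.sum(axis=1, keepdims=True) + alpha * n_features
manual = np.log(smoothed / totals)

print(np.allclose(manual, clf.feature_log_prob_))
```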


In [30]:
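# coef_: mirrors feature_log_prob_ so that multinomial naive Bayes can be read as a linear model; its values equal feature_log_prob_ (the attribute has been removed from newer scikit-learn releases)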
clf.coef_

array([[-2.01490302, -1.45528723, -1.2039728 , -1.09861229],
       [-1.87180218, -1.31218639, -1.178655  , -1.31218639],
       [-1.74919985, -1.43074612, -1.26369204, -1.18958407],
       [-1.79175947, -1.38629436, -1.23214368, -1.23214368]])

In [31]:
# class_count_: number of training samples in each class, in class order
clf.class_count_

array([2., 1., 2., 1.])

In [32]:
# feature_count_: summed count of each feature within each class; array of shape (n_classes, n_features)
clf.feature_count_

array([[ 2.,  5.,  7.,  8.],
       [ 2.,  5.,  6.,  5.],
       [ 6.,  9., 11., 12.],
       [ 2.,  4.,  5.,  5.]])

In [33]:
print([(1+1),(2+3),(3+4),(4+4)])  # class 1 as an example

[2, 5, 7, 8]


In [34]:
# get_params(deep=True): returns the classifier's parameters as a dict
clf.get_params()

{'alpha': 2.0, 'class_prior': None, 'fit_prior': True}

In [35]:
# predict(X): predicts on the test set X and returns the corresponding target values
clf.predict([[1,3,5,6],[3,4,5,4]])

array([1, 3])

In [36]:
# predict_log_proba(X): log-probability of each test sample belonging to each class
clf.predict_log_proba([[3,4,5,4],[1,3,5,6]])

array([[-1.27396027, -1.69310891, -1.04116963, -1.69668527],
       [-0.78041614, -2.05601551, -1.28551649, -1.98548389]])

In [37]:
# predict_proba(X): probability of each test sample belonging to each class
clf.predict_proba([[3,4,5,4],[1,3,5,6]])

array([[0.27972165, 0.18394676, 0.35304151, 0.18329008],
       [0.45821529, 0.12796282, 0.27650773, 0.13731415]])
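
These probabilities follow from the linear form of the model: the joint log-likelihood is the count vector times `feature_log_prob_` plus `class_log_prior_`, normalized per sample. The sketch below illustrates this with the same training data; `X_test`, `jll` and `log_proba` are names introduced for the example.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import MultinomialNB

X = np.array([[1, 2, 3, 4],
              [1, 3, 4, 4],
              [2, 4, 5, 5],
              [2, 5, 6, 5],
              [3, 4, 5, 6],
              [3, 5, 6, 6]])
y = np.array([1, 1, 4, 2, 3, 3])
clf = MultinomialNB(alpha=2.0).fit(X, y)

X_test = np.array([[3, 4, 5, 4],
                   [1, 3, 5, 6]])

# Linear form: counts weighted by log feature probabilities, plus the log prior
jll = X_test @ clf.feature_log_prob_.T + clf.class_log_prior_
# Subtract the log of the normalizing constant of each row
log_proba = jll - logsumexp(jll, axis=1, keepdims=True)

print(np.allclose(log_proba, clf.predict_log_proba(X_test)))
print(np.allclose(np.exp(log_proba), clf.predict_proba(X_test)))
```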

In [38]:
# score(X, y, sample_weight=None): returns the mean accuracy of the predictions on the test samples
clf.score([[3,4,5,4],[1,3,5,6]],[1,1])

0.5
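
The score is simply the fraction of test samples whose predicted label matches the given label. A minimal sketch, with `X_test`, `y_test` and `manual_accuracy` introduced for illustration:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.array([[1, 2, 3, 4],
              [1, 3, 4, 4],
              [2, 4, 5, 5],
              [2, 5, 6, 5],
              [3, 4, 5, 6],
              [3, 5, 6, 6]])
y = np.array([1, 1, 4, 2, 3, 3])
clf = MultinomialNB(alpha=2.0).fit(X, y)

X_test = np.array([[3, 4, 5, 4],
                   [1, 3, 5, 6]])
y_test = np.array([1, 1])

# Accuracy: share of predictions that equal the true labels
manual_accuracy = np.mean(clf.predict(X_test) == y_test)
print(manual_accuracy, clf.score(X_test, y_test))
```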