#### Naive Bayes Classifiers

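**Naive Bayes classifier for continuous attributes**
+ Fit a Gaussian naive Bayes classifier on the continuous attributes
+ Print the class prior probabilities
+ Print the Gaussian parameters for each class and each attribute
+ Predict on the training data
+ Print the posterior probabilities and the predicted labels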

In [1]:
import numpy as np
from sklearn.naive_bayes import GaussianNB

X_cont = np.array([ [0.697,0.460], [0.774,0.376], [0.634,0.264], [0.608,0.318],
                    [0.556,0.215], [0.403,0.237], [0.481,0.149], [0.437,0.211],
                    [0.666,0.091], [0.243,0.267], [0.245,0.057], [0.343,0.099], 
                    [0.639,0.161], [0.657,0.198], [0.360,0.370], [0.593,0.042], 
                    [0.719,0.103] ] )

# Class labels: '是' means "yes" (8 samples), '否' means "no" (9 samples).
y = ['是', '是', '是', '是', '是', '是', '是', '是',
     '否', '否', '否', '否', '否', '否', '否', '否', '否']

nb_cont = GaussianNB().fit(X_cont,y)
# Classes are stored in sorted order, so index 0 is '否' and index 1 is '是' (see nb_cont.classes_).
print("Parameters:\n\t Prior of classes:", nb_cont.class_prior_)
# theta_ holds the per-class attribute means.
print("\t",chr(956),":",nb_cont.theta_[0],nb_cont.theta_[1])
# var_ holds the per-class attribute variances (sigma squared), not standard deviations.
print("\t",chr(963)+chr(178),":",nb_cont.var_[0],nb_cont.var_[1])

print("\nPredict probabilities:")
print(nb_cont.predict_proba(X_cont))
print("\nPredict labels:")
print(nb_cont.predict(X_cont))
print("Score of train set:", nb_cont.score(X_cont,y))

Parameters:
	 Prior of classes: [0.52941176 0.47058824]
	 μ : [0.49611111 0.15422222] [0.57375 0.27875]
	 σ² : [0.03370254 0.01032862] [0.01460844 0.00891244]

Predict probabilities:
[[0.04164757 0.95835243]
 [0.11946261 0.88053739]
 [0.24918736 0.75081264]
 [0.15038577 0.84961423]
 [0.40922077 0.59077923]
 [0.56498589 0.43501411]
 [0.70271521 0.29728479]
 [0.57828441 0.42171559]
 [0.78129434 0.21870566]
 [0.85959214 0.14040786]
 [0.99090406 0.00909594]
 [0.94079438 0.05920562]
 [0.56082358 0.43917642]
 [0.43838558 0.56161442]
 [0.29487969 0.70512031]
 [0.88435478 0.11564522]
 [0.77153256 0.22846744]]

Predict labels:
['是' '是' '是' '是' '是' '否' '否' '否' '否' '否' '否' '否' '否' '是' '是' '否' '否']
Score of train set: 0.7058823529411765


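**Naive Bayes classifier for discrete attributes**
+ Convert the discrete string attributes to discrete integer codes
+ Print the attribute array before and after the conversion
+ Fit a categorical naive Bayes classifier on the discrete attributes
+ Predict on the training data
+ Print the posterior probabilities and the predicted labels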
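
For the learning step in this list, it may help to see how `CategoricalNB` estimates the per-class category probabilities. By default it applies additive (Laplace) smoothing with `alpha=1.0`, so a category never seen in a class still gets a nonzero probability. The sketch below uses a small hypothetical dataset (one attribute, three categories, two classes) that is not taken from the notebook, and checks a hand computation against the fitted `feature_log_prob_`.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical toy data: a single attribute with categories 0, 1, 2.
X = np.array([[0], [0], [1], [2], [2], [2]])
y = np.array([0, 0, 0, 1, 1, 1])

alpha = 1.0
nb = CategoricalNB(alpha=alpha).fit(X, y)

counts = nb.category_count_[0]     # shape (n_classes, n_categories) for attribute 0
class_count = nb.class_count_      # number of samples per class
n_categories = counts.shape[1]

# Smoothed estimate: (N_tc + alpha) / (N_c + alpha * n_categories)
manual = (counts + alpha) / (class_count[:, None] + alpha * n_categories)

print(manual)
print(np.allclose(manual, np.exp(nb.feature_log_prob_[0])))
```

In this toy data, category 2 never occurs in class 0, yet its smoothed probability is 1/6 rather than 0.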

In [2]:
from sklearn import preprocessing
from sklearn.naive_bayes import CategoricalNB

X_discret = np.array([
    ['青绿', '蜷缩', '浊响', '清晰', '凹陷', '硬滑'],
    ['乌黑', '蜷缩', '沉闷', '清晰', '凹陷', '硬滑'],
    ['乌黑', '蜷缩', '浊响', '清晰', '凹陷', '硬滑'],
    ['青绿', '蜷缩', '沉闷', '清晰', '凹陷', '硬滑'],
    ['浅白', '蜷缩', '浊响', '清晰', '凹陷', '硬滑'],
    ['青绿', '稍蜷', '浊响', '清晰', '稍凹', '软粘'],
    ['乌黑', '稍蜷', '浊响', '稍糊', '稍凹', '软粘'],
    ['乌黑', '稍蜷', '浊响', '清晰', '稍凹', '硬滑'],
    ['乌黑', '稍蜷', '沉闷', '稍糊', '稍凹', '硬滑'],
    ['青绿', '硬挺', '清脆', '清晰', '平坦', '软粘'],
    ['浅白', '硬挺', '清脆', '模糊', '平坦', '硬滑'],
    ['浅白', '蜷缩', '浊响', '模糊', '平坦', '软粘'],
    ['青绿', '稍蜷', '浊响', '稍糊', '凹陷', '硬滑'],
    ['浅白', '稍蜷', '沉闷', '稍糊', '凹陷', '硬滑'],
    ['乌黑', '稍蜷', '浊响', '清晰', '稍凹', '软粘'],
    ['浅白', '蜷缩', '浊响', '模糊', '平坦', '硬滑'],
    ['青绿', '蜷缩', '沉闷', '稍糊', '稍凹', '硬滑']
])

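print("Original features:\n", X_discret)
# Encode each column separately: LabelEncoder maps the sorted unique strings to 0, 1, 2, ...
le = preprocessing.LabelEncoder()
for col in range(6):
    f = le.fit_transform(X_discret[:,col])
    # X_discret is still a string array, so the integer codes are stored as strings here.
    X_discret[:,col] = f

print("\nTransformed features:\n", X_discret)

# Convert the string codes to integers, as CategoricalNB expects integer-coded categories.
X_discret = X_discret.astype(np.uint8)
print("\nTransformed digital features:\n", X_discret)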

nb_discret = CategoricalNB().fit(X_discret,y)
print("\nPredict probabilities:\n", nb_discret.predict_proba(X_discret))
print("\nPredict labels:", nb_discret.predict(X_discret))
print("Score of train set:", nb_discret.score(X_discret,y))

Original features:
 [['青绿' '蜷缩' '浊响' '清晰' '凹陷' '硬滑']
 ['乌黑' '蜷缩' '沉闷' '清晰' '凹陷' '硬滑']
 ['乌黑' '蜷缩' '浊响' '清晰' '凹陷' '硬滑']
 ['青绿' '蜷缩' '沉闷' '清晰' '凹陷' '硬滑']
 ['浅白' '蜷缩' '浊响' '清晰' '凹陷' '硬滑']
 ['青绿' '稍蜷' '浊响' '清晰' '稍凹' '软粘']
 ['乌黑' '稍蜷' '浊响' '稍糊' '稍凹' '软粘']
 ['乌黑' '稍蜷' '浊响' '清晰' '稍凹' '硬滑']
 ['乌黑' '稍蜷' '沉闷' '稍糊' '稍凹' '硬滑']
 ['青绿' '硬挺' '清脆' '清晰' '平坦' '软粘']
 ['浅白' '硬挺' '清脆' '模糊' '平坦' '硬滑']
 ['浅白' '蜷缩' '浊响' '模糊' '平坦' '软粘']
 ['青绿' '稍蜷' '浊响' '稍糊' '凹陷' '硬滑']
 ['浅白' '稍蜷' '沉闷' '稍糊' '凹陷' '硬滑']
 ['乌黑' '稍蜷' '浊响' '清晰' '稍凹' '软粘']
 ['浅白' '蜷缩' '浊响' '模糊' '平坦' '硬滑']
 ['青绿' '蜷缩' '沉闷' '稍糊' '稍凹' '硬滑']]

Transformed features:
 [['2' '2' '1' '1' '0' '0']
 ['0' '2' '0' '1' '0' '0']
 ['0' '2' '1' '1' '0' '0']
 ['2' '2' '0' '1' '0' '0']
 ['1' '2' '1' '1' '0' '0']
 ['2' '1' '1' '1' '2' '1']
 ['0' '1' '1' '2' '2' '1']
 ['0' '1' '1' '1' '2' '0']
 ['0' '1' '0' '2' '2' '0']
 ['2' '0' '2' '1' '1' '1']
 ['1' '0' '2' '0' '1' '0']
 ['1' '2' '1' '0' '1' '1']
 ['2' '1' '1' '2' '0' '0']
 ['1' '1' '0' '2' '0' '0']
 ['0' '1' '1' '1
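
The per-column `LabelEncoder` loop above can also be written with `OrdinalEncoder`, which encodes all columns in one call and returns an integer array directly when `dtype` is set. This is a sketch of an alternative and not the approach used in the notebook. The array `X_raw` is a hypothetical three-column excerpt using attribute values from the data above.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical excerpt: three string-valued attributes, four samples.
X_raw = np.array([
    ['青绿', '蜷缩', '浊响'],
    ['乌黑', '稍蜷', '沉闷'],
    ['浅白', '硬挺', '清脆'],
    ['乌黑', '蜷缩', '浊响'],
])

# One encoder for all columns; categories are sorted per column.
enc = OrdinalEncoder(dtype=np.int64)
X_ord = enc.fit_transform(X_raw)

# Reference result: one LabelEncoder per column, as in the loop above.
X_le = np.column_stack([
    LabelEncoder().fit_transform(X_raw[:, j]) for j in range(X_raw.shape[1])
])

print(X_ord)
print(enc.categories_)              # learned category order for each column
print(np.array_equal(X_ord, X_le))
```

Keeping the fitted encoder also makes it possible to encode new samples consistently with `enc.transform` and to recover the original strings with `enc.inverse_transform`.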

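**Naive Bayes classifier for mixed attributes**
+ Reuse the continuous-attribute and discrete-attribute naive Bayes classifiers
+ Compute the class posterior for the mixed attributes
+ Compute the predicted labels for the mixed attributes
+ Note: the posterior of each classifier already includes the class prior, so the product of the two must be divided by the prior once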
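
The division by the prior follows from the naive Bayes factorization. As a sketch, write $x_c$ for the continuous attributes, $x_d$ for the discrete attributes and $c$ for the class (symbols introduced here for illustration), and assume $x_c$ and $x_d$ are conditionally independent given $c$:

$$
P(c \mid x_c, x_d) \;\propto\; P(c)\, p(x_c \mid c)\, P(x_d \mid c)
= \frac{\bigl[P(c)\, p(x_c \mid c)\bigr]\,\bigl[P(c)\, P(x_d \mid c)\bigr]}{P(c)}
\;\propto\; \frac{P(c \mid x_c)\, P(c \mid x_d)}{P(c)}
$$

The factors dropped at each $\propto$ depend on the sample but not on $c$, so the argmax over classes is unchanged. The product itself is an unnormalized score, so its values need not sum to 1 across classes.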

In [7]:
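# Each posterior already contains the class prior, so divide the product by the prior once.
# The result is an unnormalized score per class, so a row need not sum to 1.
proba = nb_cont.predict_proba(X_cont) * nb_discret.predict_proba(X_discret) / nb_cont.class_prior_
print("Predict probabilities:\n", proba)

# argmax returns column indices into nb_cont.classes_ (0 = '否', 1 = '是').
predict_label = proba.argmax(axis=1)
print("\nPredict labels:", predict_label)

Predict probabilities: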
 [[4.38992896e-03 1.92285515e+00]
 [1.40093923e-02 1.75497366e+00]
 [1.61194031e-02 1.54083732e+00]
 [2.82247196e-02 1.62604067e+00]
 [9.95071565e-02 1.09379357e+00]
 [2.43429373e-01 7.13546422e-01]
 [7.19041208e-01 2.89514707e-01]
 [1.28205821e-01 7.90964353e-01]
 [9.19896863e-01 1.75057191e-01]
 [1.52151503e+00 1.87727632e-02]
 [1.86544510e+00 6.46727379e-05]
 [1.69633175e+00 5.71511598e-03]
 [4.50090307e-01 5.36730044e-01]
 [6.41879611e-01 2.68331892e-01]
 [8.38846808e-02 1.27272120e+00]
 [1.57078167e+00 1.46622887e-02]
 [8.67555558e-01 1.96478649e-01]]

Predict labels: [1 1 1 1 1 1 0 1 0 0 0 0 1 0 1 0 0]
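
The combined scores printed above are unnormalized, and the predicted labels are column indices rather than class names. The sketch below shows one way to normalize each row and map the indices back to labels. The function name `combine_posteriors` and the two small posterior arrays are hypothetical and chosen only for illustration; in the notebook they would come from `nb_cont.predict_proba` and `nb_discret.predict_proba`.

```python
import numpy as np


def combine_posteriors(post_a, post_b, prior):
    """Hypothetical helper: fuse two posteriors that share the same class prior."""
    # Divide once by the prior, since both posteriors already include it.
    scores = post_a * post_b / prior
    # Normalize each row so the fused posterior sums to 1.
    return scores / scores.sum(axis=1, keepdims=True)


classes = np.array(['否', '是'])           # same order as nb_cont.classes_
prior = np.array([9, 8]) / 17              # class priors from the training labels

# Hypothetical posteriors for two samples (not taken from the notebook output).
post_cont = np.array([[0.2, 0.8], [0.7, 0.3]])
post_disc = np.array([[0.1, 0.9], [0.6, 0.4]])

proba = combine_posteriors(post_cont, post_disc, prior)
labels = classes[proba.argmax(axis=1)]

print(proba)
print(labels)
```

Normalization does not change the argmax, so the predicted labels are the same as with the raw scores. It only makes the values readable as probabilities.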
