# Problem 1: Spam classification on spambase with sklearn's GaussianNB, BernoulliNB, and MultinomialNB

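Tasks:
1. Classify the spambase emails with GaussianNB, BernoulliNB, and MultinomialNB.
2. Compute each model's accuracy, precision, recall, and F1 score under 10-fold cross-validation.
3. Using what accuracy, precision, recall, and F1 actually measure, and comparing the four values, discuss how the three algorithms perform on spambase.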

# 1. Load the dataset

```python
import numpy as np
spambase = np.loadtxt('spambase/spambase.data', delimiter=",")
spamx = spambase[:, :57]  # columns 0-56: the 57 features
spamy = spambase[:, 57]   # column 57: the label (1 = spam, 0 = non-spam)
```

# 2. Import the models

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_val_predict
```
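For a classifier and an integer `cv`, `cross_val_predict` splits the data with `StratifiedKFold` (no shuffling) and returns, for every sample, the prediction of the model trained on the other nine folds. The sketch below reproduces this by hand so the mechanism is visible. It assumes synthetic data from `make_classification` in place of spambase, and `manual_cross_val_predict` is a hypothetical helper, not part of sklearn.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB


def manual_cross_val_predict(model, X, y, n_splits=10):
    """Hypothetical helper: out-of-fold predictions via an explicit StratifiedKFold loop."""
    predictions = np.empty_like(y)
    skf = StratifiedKFold(n_splits=n_splits)  # no shuffling, matching cv=10
    for train_idx, test_idx in skf.split(X, y):
        fold_model = clone(model)  # fresh, unfitted copy for each fold
        fold_model.fit(X[train_idx], y[train_idx])
        predictions[test_idx] = fold_model.predict(X[test_idx])
    return predictions


# Synthetic stand-in for spambase (assumption): 500 samples, 10 features, binary labels
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
manual = manual_cross_val_predict(GaussianNB(), X, y)
auto = cross_val_predict(GaussianNB(), X, y, cv=10)
print(np.array_equal(manual, auto))
```

Because GaussianNB is deterministic and both paths use the same folds, the two prediction arrays should match exactly.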

# 3. Accuracy, precision, recall, and F1 of GaussianNB, BernoulliNB, and MultinomialNB under 10-fold cross-validation

```python
# Import the evaluation metrics
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
```
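All four metrics come from the confusion matrix, with spam (label 1) as the positive class, which is sklearn's default `pos_label=1`. The sketch below computes them by hand on a small made-up set of labels (an assumption, not spambase data) and can be checked against the sklearn functions imported above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, precision_score, recall_score

# Toy labels (assumption): 1 = spam (positive class), 0 = non-spam
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

# For binary labels [0, 1], ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)          # share of all emails classified correctly
precision = tp / (tp + fp)                          # of emails flagged as spam, share that really are spam
recall = tp / (tp + fn)                             # of actual spam, share that was caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
```

Here TP = 3, FN = 1, FP = 2, TN = 4, so accuracy is 0.7, precision 0.6, recall 0.75, and F1 about 0.667.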

## GaussianNB

```python
# 10-fold CV: each sample is predicted by the model trained on the other nine folds
model1 = GaussianNB()
prediction1 = cross_val_predict(model1, spamx, spamy, cv=10)

# Metrics over all pooled out-of-fold predictions (positive class = 1, i.e. spam)
acc1 = accuracy_score(spamy, prediction1)
pre1 = precision_score(spamy, prediction1)
rec1 = recall_score(spamy, prediction1)
f11 = f1_score(spamy, prediction1)

print("GaussianNB:\naccuracy_score:",acc1,"\nprecision_score:",pre1,"\nrecall_score:",rec1,"\nf1_score:",f11)
```

```
GaussianNB:
accuracy_score: 0.8217778743751358
precision_score: 0.7004440855874041
recall_score: 0.9569773855488142
f1_score: 0.8088578088578089
```
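The scores above pool all out-of-fold predictions and score them once. Another common reading of "10-fold cross-validation score" is the mean of the ten per-fold scores, which `cross_validate` returns. The sketch compares the two on synthetic data (an assumption standing in for spambase). Pooled and averaged accuracy coincide when all folds are the same size, while precision, recall, and F1 can differ because they are ratios computed per fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, cross_validate
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for spambase (assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=1)

# Pooled: score all out-of-fold predictions at once (the approach used above)
pooled_acc = accuracy_score(y, cross_val_predict(GaussianNB(), X, y, cv=10))

# Averaged: score each fold separately, then take the mean of the 10 scores
cv_res = cross_validate(GaussianNB(), X, y, cv=10,
                        scoring=["accuracy", "precision", "recall", "f1"])
mean_acc = cv_res["test_accuracy"].mean()
mean_f1 = cv_res["test_f1"].mean()

print(pooled_acc, mean_acc, mean_f1)
```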


## BernoulliNB

```python
# 10-fold CV with BernoulliNB
model2 = BernoulliNB()
prediction2 = cross_val_predict(model2, spamx, spamy, cv=10)

# Metrics over all pooled out-of-fold predictions
acc2 = accuracy_score(spamy, prediction2)
pre2 = precision_score(spamy, prediction2)
rec2 = recall_score(spamy, prediction2)
f12 = f1_score(spamy, prediction2)

print("BernoulliNB:\naccuracy_score:",acc2,"\nprecision_score:",pre2,"\nrecall_score:",rec2,"\nf1_score:",f12)
```

```
BernoulliNB:
accuracy_score: 0.8839382742881983
precision_score: 0.8813357185450209
recall_score: 0.815223386651958
f1_score: 0.8469914040114614
```
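BernoulliNB models each feature as present or absent: with its default `binarize=0.0`, any value above zero becomes 1 and everything else 0, both when fitting and when predicting. On spambase this reduces each word or character frequency to "appears / does not appear". The sketch checks this equivalence on synthetic non-negative data with many exact zeros (an assumption, not spambase itself) by comparing the default model with one fitted on pre-binarized input and `binarize=None`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
# Synthetic non-negative "frequency" features, roughly 60% exact zeros (assumption)
X = rng.exponential(scale=1.0, size=(200, 5)) * (rng.random((200, 5)) < 0.4)
y = rng.integers(0, 2, size=200)

# Default binarize=0.0: values > 0 become 1, others 0, inside fit and predict
default_model = BernoulliNB().fit(X, y)

# Same binarization done by hand; binarize=None tells the model the input is already binary
X_bin = (X > 0).astype(float)
manual_model = BernoulliNB(binarize=None).fit(X_bin, y)

print(np.array_equal(default_model.predict(X), manual_model.predict(X_bin)))
```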


## MultinomialNB

```python
# 10-fold CV with MultinomialNB (requires non-negative features, which spambase satisfies)
model3 = MultinomialNB()
prediction3 = cross_val_predict(model3, spamx, spamy, cv=10)

# Metrics over all pooled out-of-fold predictions
acc3 = accuracy_score(spamy, prediction3)
pre3 = precision_score(spamy, prediction3)
rec3 = recall_score(spamy, prediction3)
f13 = f1_score(spamy, prediction3)

print("MultinomialNB:\naccuracy_score:",acc3,"\nprecision_score:",pre3,"\nrecall_score:",rec3,"\nf1_score:",f13)
```

```
MultinomialNB:
accuracy_score: 0.786350793305803
precision_score: 0.7323628219484882
recall_score: 0.7214561500275786
f1_score: 0.7268685746040567
```
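The three sections above repeat the same steps with a different model. A minimal sketch of the same procedure as a single loop follows. `evaluate_models` is a hypothetical helper, and the Poisson-count data is a synthetic assumption so the block runs without the spambase file. Since it makes the same calls with default parameters, `evaluate_models(spamx, spamy)` is expected to reproduce the three sets of numbers above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB


def evaluate_models(X, y, cv=10):
    """Hypothetical helper: pooled out-of-fold metrics for each naive Bayes variant."""
    models = {
        "GaussianNB": GaussianNB(),
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),  # needs non-negative features
    }
    results = {}
    for name, model in models.items():
        pred = cross_val_predict(model, X, y, cv=cv)
        results[name] = {
            "accuracy": accuracy_score(y, pred),
            "precision": precision_score(y, pred),
            "recall": recall_score(y, pred),
            "f1": f1_score(y, pred),
        }
    return results


# Synthetic non-negative count data as a stand-in (assumption)
rng = np.random.default_rng(0)
X = rng.poisson(lam=2.0, size=(300, 8)).astype(float)
y = (X[:, 0] + X[:, 1] > 4).astype(int)

results = evaluate_models(X, y)
for name, scores in results.items():
    print(name, {k: round(float(v), 3) for k, v in scores.items()})
```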


## Results summary
Algorithm|Accuracy|Precision|Recall|F1
-|-|-|-|-
GaussianNB|0.8218|0.7004|0.9570|0.8089
BernoulliNB|0.8839|0.8813|0.8152|0.8470
MultinomialNB|0.7864|0.7324|0.7215|0.7269

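Analysis:
- **BernoulliNB** achieves the best accuracy (0.884), precision (0.881), and F1 (0.847) of the three, making it the best overall performer.
- **GaussianNB** has the highest recall, 95.7%: it catches nearly all spam. It does so by also flagging many legitimate emails as spam, so its precision (0.700) is the lowest of the three.
- **MultinomialNB** has the lowest accuracy (0.786), recall (0.721), and F1 (0.727), making it the weakest overall; its precision (0.732) falls between the other two.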
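GaussianNB's high recall and low precision reflect where its decision boundary sits: `predict` picks the class with the larger posterior, which for two classes roughly corresponds to a 0.5 cutoff on the spam probability. Requiring a higher posterior before flagging an email usually trades recall for precision. The sketch illustrates this mechanism on synthetic data (an assumption; the thresholds are arbitrary and say nothing about what suits spambase). Recall can only stay the same or fall as the threshold rises, because fewer emails are flagged.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

# Synthetic, mildly imbalanced stand-in for spambase (assumption)
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.6, 0.4], random_state=0)

# Out-of-fold posterior probability of the positive class (column 1 = class 1)
proba = cross_val_predict(GaussianNB(), X, y, cv=10, method="predict_proba")[:, 1]

recalls = []
for threshold in (0.5, 0.9, 0.99):
    # Flag as positive only when the posterior reaches the threshold
    pred = (proba >= threshold).astype(int)
    p = precision_score(y, pred, zero_division=0)
    r = recall_score(y, pred)
    recalls.append(r)
    print(f"threshold={threshold}: precision={p:.3f}, recall={r:.3f}")
```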