# Naïve Bayes

## Naïve Bayes for text classification

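This code uses Multinomial Naïve Bayes (MNB) for sentiment classification of text: deciding whether a movie review is positive (1) or negative (0). It estimates the conditional probability of each word given each class from the training reviews, then assigns a new review to the class with the highest resulting posterior probability.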
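For reference, the decision rule can be written as below. This is the standard multinomial formulation with additive smoothing. The symbols are notation introduced here rather than names from the code, and $\alpha = 1$ is the default smoothing value of scikit-learn's `MultinomialNB`.

$$
\hat{y} = \arg\max_{c} \Big[ \log P(c) + \sum_{w \in V} x_w \log \hat{\theta}_{c,w} \Big],
\qquad
\hat{\theta}_{c,w} = \frac{N_{c,w} + \alpha}{N_c + \alpha\,|V|}
$$

Here $x_w$ is the number of times word $w$ occurs in the review being classified, $N_{c,w}$ is the number of times $w$ occurs in the training reviews of class $c$, $N_c = \sum_{w} N_{c,w}$ is the total word count of class $c$, and $|V|$ is the vocabulary size. Words that never appeared in training are not in $V$ and contribute nothing to the sum.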

In [20]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Training data
X_train = ["I love this movie", "This film was great", "Horrible acting", "Worst movie ever"]
y_train = [1, 1, 0, 0]  # 1: positive review, 0: negative review

# Convert the text into word-count feature vectors
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)

# Train the Naïve Bayes classifier
nb = MultinomialNB()
nb.fit(X_train_vec, y_train)

# Predict
X_test = ["movie", "Terrible film"]
X_test_vec = vectorizer.transform(X_test)
predictions = nb.predict(X_test_vec)

print(predictions)  # [0 1]: "movie" is predicted negative, "Terrible film" positive


[0 1]
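The output `[0 1]` can be reproduced by hand, which shows where it comes from. The sketch below reimplements the smoothed word-probability estimate in plain Python under two assumptions: `tokenize` approximates `CountVectorizer`'s default behaviour (lowercasing, keeping tokens of two or more word characters), and `alpha = 1.0` matches the `MultinomialNB` default. The helper names (`tokenize`, `log_score`, `predict`) are illustrative and not part of scikit-learn.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Approximates CountVectorizer defaults: lowercase, tokens of 2+ word characters
    return re.findall(r"\b\w\w+\b", text.lower())

train = [
    ("I love this movie", 1),
    ("This film was great", 1),
    ("Horrible acting", 0),
    ("Worst movie ever", 0),
]

# Per-class word counts and per-class document counts
word_counts = {0: Counter(), 1: Counter()}
doc_counts = Counter()
for text, label in train:
    word_counts[label].update(tokenize(text))
    doc_counts[label] += 1

vocab = set(word_counts[0]) | set(word_counts[1])
alpha = 1.0  # additive (Laplace) smoothing, assumed equal to the sklearn default

def log_score(text, label):
    """Log prior plus smoothed log likelihood of each known word."""
    total = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / len(train))
    for word in tokenize(text):
        if word in vocab:  # words unseen in training are skipped
            count = word_counts[label][word]
            score += math.log((count + alpha) / (total + alpha * len(vocab)))
    return score

def predict(text):
    return max((0, 1), key=lambda label: log_score(text, label))

manual_predictions = [predict(t) for t in ["movie", "Terrible film"]]
print(manual_predictions)
```

"movie" occurs once in each class, but the negative class contains fewer words in total (5 versus 7), so its smoothed estimate 2/15 is larger than the positive class's 2/17 and the review is labelled 0. In "Terrible film", "terrible" is not in the vocabulary and is ignored, while "film" was seen only in a positive review, so the label is 1.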


## Predicting spam email with Naïve Bayes

In [21]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Training data
emails = [
  "Free money now!!!", 
  "Hi Bob, how about a game of golf tomorrow?", 
  "Earn $1000 per week from home", 
  "Hey Alice, are you coming to the party tonight?", 
  "Congratulations, you've won a free ticket to Bahamas!"
]
labels = [1, 0, 1, 0, 1]  # 1: spam, 0: not spam

# Convert the text into word-count feature vectors
email_vectorizer = CountVectorizer()
emails_vec = email_vectorizer.fit_transform(emails)

# Train the Naïve Bayes classifier
spam_nb = MultinomialNB()
spam_nb.fit(emails_vec, labels)

# Predict
new_emails = ["Win a free iPhone now", "Hi John, let's catch up over coffee"]
new_emails_vec = email_vectorizer.transform(new_emails)
spam_predictions = spam_nb.predict(new_emails_vec)

print(spam_predictions)  # [1 0]: spam, not spam

[1 0]
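Hard labels hide how confident the classifier is, which matters with a training set this small. As a minimal sketch, the same model can be wrapped in a scikit-learn pipeline and queried with `predict_proba`. The names `spam_model` and `proba` are introduced here for illustration, and the step name `"multinomialnb"` is the one `make_pipeline` generates from the class name.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "Free money now!!!",
    "Hi Bob, how about a game of golf tomorrow?",
    "Earn $1000 per week from home",
    "Hey Alice, are you coming to the party tonight?",
    "Congratulations, you've won a free ticket to Bahamas!",
]
labels = [1, 0, 1, 0, 1]  # 1: spam, 0: not spam

# Vectorizer and classifier chained so raw strings can be passed directly
spam_model = make_pipeline(CountVectorizer(), MultinomialNB())
spam_model.fit(emails, labels)

new_emails = ["Win a free iPhone now", "Hi John, let's catch up over coffee"]
proba = spam_model.predict_proba(new_emails)

# Columns follow the classifier's classes_ order, here [0, 1]
for text, (p_ham, p_spam) in zip(new_emails, proba):
    print(f"{text!r}: P(spam) = {p_spam:.2f}")
```

The second email is a much closer call than its label suggests: "hi" is the only one of its words present in the training vocabulary, so the prediction rests on a single word count and the class prior.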
