<h1>Naive Bayes Classifier</h1>

* Naive Bayes is a probabilistic classifier based on Bayes' theorem; it "naively" assumes that the features are conditionally independent given the class
* Works well with high-dimensional, sparse data such as bag-of-words text features
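
For reference, the decision rule behind the classifier can be written as below. The notation is an assumption made here rather than something taken from the notebook: $y$ is the class and $x_1, \dots, x_n$ are the feature values of one sample.

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The product over features is where the conditional independence assumption enters; the variants of Naive Bayes differ only in how $P(x_i \mid y)$ is modelled.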

<h2>Types of Naive Bayes Classifier</h2>

* Gaussian Naive Bayes: Assumes that continuous features follow a normal distribution within each class, so the likelihood is computed from the probability density.
* Multinomial Naive Bayes: Used for discrete data such as word counts in text classification.
* Bernoulli Naive Bayes: Used for binary/Boolean features.
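
As a rough illustration of how the three variants map to feature types, here is a minimal sketch using scikit-learn. The toy arrays and variable names are made up for this example and are not part of the dataset used below.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Toy labels shared by all three examples (hypothetical data)
y_toy = np.array([0, 0, 1, 1])

# Continuous features: GaussianNB models each feature with a per-class normal distribution
X_cont = np.array([[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]])
gnb = GaussianNB().fit(X_cont, y_toy)

# Count features (e.g. word counts): MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
mnb = MultinomialNB().fit(X_counts, y_toy)

# Binary features (word present or absent): BernoulliNB
X_bin = (X_counts > 0).astype(int)
bnb = BernoulliNB().fit(X_bin, y_toy)

gaussian_pred = gnb.predict(np.array([[1.0, 2.0]]))
multinomial_pred = mnb.predict(np.array([[2, 0, 1]]))
bernoulli_pred = bnb.predict(np.array([[0, 1, 1]]))
print(gaussian_pred, multinomial_pred, bernoulli_pred)
```

All three estimators share the same `fit`/`predict` interface, so switching variants is mostly a matter of choosing the one that matches how the features are encoded.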

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, accuracy_score

In [2]:
df = pd.read_csv("ytspam.csv", encoding="latin1")
df.head(10)

Unnamed: 0,Name,Comment,Time,Likes,Reply Count,Spam
0,Taofeekat,&lt;????i make my first million investing in f...,2022-09-28T02:08:55Z,30,30,1
1,Angelina Jordan,&lt;?l will forever be indebted to you I will ...,2022-09-23T05:26:48Z,0,0,1
2,Fernandez Joe,<b>????I recommend a professional forex/Bitcoi...,2022-09-20T12:56:30Z,5,2,1
3,Jessica Billy,I think Im blessed because if not I wouldnt ...,2022-09-17T20:20:24Z,21,34,1
4,Allison Zar,<b>I recommend a professional broker to you g...,2022-09-05T09:19:30Z,19,27,1
5,Williams Adam,MRS EVELYN IS LEGIT AND HER METHOD WORKS LIKE ...,2022-08-31T19:30:53Z,32,30,1
6,duke claire,?i recommend a professional broker to you guys...,2022-08-28T19:26:12Z,2,37,1
7,Visit Platform ??LYNNHACK??telegram,<b>You improved????my trading alot that my $80...,2022-08-17T22:38:09Z,0,0,1
8,Visit Platform ??LYNNHACK??telegram,<b>You improved????my trading alot that my $80...,2022-08-17T22:38:05Z,0,0,1
9,Jennifer Kyle,<b>I recommend a professional broker to you gu...,2022-08-16T23:13:10Z,3,30,1


In [3]:
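# Feature extraction: turn each comment into a TF-IDF vector,
# ignoring common English stop words
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(df['Comment'])  # sparse matrix, one row per comment
y = df['Spam']  # target labels: 1 = spam, 0 = not spam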
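
To make the feature extraction step more concrete, the sketch below runs `TfidfVectorizer` on three made-up comments (the `toy_` names are hypothetical) and inspects the result. It assumes the default settings, under which each row is L2-normalised.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Three made-up comments, used only for illustration
toy_comments = ["win free money", "the video is great", "free gift card"]

toy_vectorizer = TfidfVectorizer(stop_words='english')
X_toy = toy_vectorizer.fit_transform(toy_comments)

# One row per comment, one column per vocabulary term
print(X_toy.shape)

# Learned vocabulary; stop words such as "the" and "is" are dropped
vocab = sorted(toy_vectorizer.vocabulary_)
print(vocab)

# With the default norm='l2', every non-empty row has unit length
row_norms = np.sqrt(X_toy.multiply(X_toy).sum(axis=1))
print(row_norms)
```

The output is a sparse matrix, which is why the approach scales to vocabularies with many thousands of terms.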

In [4]:
# Split data: hold out 30% for testing, with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
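
Note that the vectorizer above is fitted on every comment before the split, so the test comments contribute to the vocabulary and IDF weights. One common alternative, sketched here on a tiny made-up corpus rather than on `ytspam.csv`, is to split the raw text first and chain the vectorizer and classifier in a scikit-learn `Pipeline`, so that TF-IDF is fitted on the training portion only. The texts, labels and variable names are assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus: 1 = spam, 0 = not spam
texts = [
    "win free money now",
    "claim your free gift card",
    "invest in bitcoin with my broker",
    "earn cash fast click here",
    "great video thanks for sharing",
    "the explanation was very clear",
    "I love this channel",
    "nice tutorial, learned a lot",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Split the raw text first, so the test comments stay unseen
X_train_text, X_test_text, y_train_toy, y_test_toy = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline fits TF-IDF on the training text only, then trains the classifier
pipeline = make_pipeline(TfidfVectorizer(stop_words='english'), MultinomialNB())
pipeline.fit(X_train_text, y_train_toy)

# predict() applies the already-fitted vectorizer to the test text
test_pred = pipeline.predict(X_test_text)
print(test_pred)
```

Whether this changes the reported accuracy for the real dataset is not something this sketch can show; it only illustrates the order of operations.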

In [5]:
# Train model
model = MultinomialNB()
model.fit(X_train, y_train)

In [6]:
# Predict and evaluate
y_pred = model.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%')
print(classification_report(y_test, y_pred, target_names=['Not Spam', 'Spam']))

Accuracy: 88.13%
              precision    recall  f1-score   support

    Not Spam       0.90      0.85      0.88       732
        Spam       0.87      0.91      0.89       768

    accuracy                           0.88      1500
   macro avg       0.88      0.88      0.88      1500
weighted avg       0.88      0.88      0.88      1500
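
The precision, recall and f1-score columns in the report are all derived from the confusion matrix. The sketch below shows the arithmetic on a small made-up set of labels (not the notebook's predictions), treating class 1 as the positive class.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Made-up true labels and predictions, for illustration only
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred_toy = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_toy, y_pred_toy).ravel()

precision = tp / (tp + fp)  # of everything predicted spam, how much really is spam
recall = tp / (tp + fn)     # of all real spam, how much was caught
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# The hand-computed values agree with scikit-learn's helpers
print(precision_score(y_true_toy, y_pred_toy),
      recall_score(y_true_toy, y_pred_toy),
      f1_score(y_true_toy, y_pred_toy))
```

In the report, `support` is the number of true samples of each class in the test set, and the `macro avg` row is the unweighted mean of the per-class scores.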



In [8]:
new_comments = [
    "Congratulations! You've won a free gift card. Click here to claim it now!",
    "She is so pretty my gawd",
    "Make money fast with this one simple trick! Check it out now."
]

In [9]:
# Transform the new comments
X_new = vectorizer.transform(new_comments)

In [10]:
# Predict the class (spam or not spam)
predictions = model.predict(X_new)

# Predict the probability of each class
predictions_prob = model.predict_proba(X_new)

In [11]:
print(list(zip(predictions, predictions_prob)))

[(1, array([0.15272569, 0.84727431])), (0, array([0.83421346, 0.16578654])), (1, array([0.24654544, 0.75345456]))]
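
Each probability array above has one entry per class, ordered as in `model.classes_`, so the first value is the probability of class 0 (not spam) and the second of class 1 (spam). The following sketch, which uses a small made-up count matrix and hypothetical `toy_` names, shows that `predict` corresponds to picking the class with the highest probability.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Made-up word-count features and labels, for illustration only
X_toy = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
y_toy = np.array([0, 0, 1, 1])
toy_model = MultinomialNB().fit(X_toy, y_toy)

X_query = np.array([[2, 0, 1], [0, 2, 1]])
proba = toy_model.predict_proba(X_query)

# Columns of proba follow the order of toy_model.classes_
print(toy_model.classes_)
print(proba)

# Taking the most probable column reproduces predict()
from_proba = toy_model.classes_[np.argmax(proba, axis=1)]
print(from_proba, toy_model.predict(X_query))
```

Working with the probabilities directly also allows a stricter threshold than 0.5 if false spam flags are more costly than missed spam.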


In [12]:
df.sample(10)

Unnamed: 0,Name,Comment,Time,Likes,Reply Count,Spam
832,Jhay,Your videos are amazing and very helpful. I am...,2022-04-12T15:08:59Z,337,0,1
947,"SHEWENHACKS AT GMAIL,COM",KAUHACKS ON TELEGRAM IS REAL AND LEGITIMATE WI...,2022-03-12T15:10:42Z,0,0,1
3794,MrZZooh,She is not a good explainer. She just stares.,2021-05-22T21:33:43Z,1,0,0
2914,Akshay Singh,Graduate explain better than her,2022-01-24T10:58:19Z,0,0,0
3553,Deepak Singh,I&#39;m sure the teenage boy is not listening ...,2021-08-13T08:59:46Z,1,0,0
134,Rita Hodd,"It is estimated that over 40,000 to 300,000 pe...",2021-09-09T23:17:32Z,8,13,1
2917,Michael Lee,That first little girl was adorable,2022-01-23T22:16:36Z,0,0,0
2731,Parabolic,How easy is it to acces and how easy will it b...,2022-03-29T09:29:10Z,0,0,0
2721,Linda Materi,"<b>When it comes to the world of investing,mos...",2022-04-03T12:14:18Z,0,0,0
2955,zahed mehal,wait don&#39;t just look directly at the camer...,2022-01-09T04:27:56Z,0,0,0


In [13]:
df.loc[3151]

Name                           G?do Rod
Comment        She is so pretty mi gawd
Time               2021-11-30T04:03:25Z
Likes                                 0
Reply Count                           0
Spam                                  0
Name: 3151, dtype: object

In [14]:
# Mapping the predictions to labels
labels = ['Not Spam', 'Spam']

# Print the results
for i, comment in enumerate(new_comments):
    print(f"Comment: {comment}")
    print(f"Predicted: {labels[predictions[i]]}")
    print(f"Probability: {predictions_prob[i]}")
    print("\n")

Comment: Congratulations! You've won a free gift card. Click here to claim it now!
Predicted: Spam
Probability: [0.15272569 0.84727431]


Comment: She is so pretty my gawd
Predicted: Not Spam
Probability: [0.83421346 0.16578654]


Comment: Make money fast with this one simple trick! Check it out now.
Predicted: Spam
Probability: [0.24654544 0.75345456]


