Naive Bayes is a probability-based machine learning algorithm used mainly for classification.
It is based on Bayes' theorem, with the "naive" assumption that all features are conditionally independent of each other given the class.

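It predicts a class by computing the posterior probability of each class given the features and choosing the class with the highest one. The evidence term $P(x_1, \dots, x_n)$ is the same for every class, so it can be dropped when comparing classes:

$$P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)$$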

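## Naive Bayes Workflow

1. Calculate the prior probability $P(C)$ of each class.
2. Calculate the likelihood $P(x_i \mid C)$ of each feature value given each class.
3. Multiply each class's prior by all of its feature likelihoods (in practice, log-probabilities are summed instead, to avoid numerical underflow).
4. Choose the class with the highest result.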
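The four steps above can be sketched from scratch for categorical features. This is a minimal illustration under stated assumptions: the helper names `fit_naive_bayes` and `predict_naive_bayes` and the small weather-style dataset are hypothetical, and no smoothing is applied, so a feature value never seen with a class gets probability 0.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    """Steps 1 and 2: estimate priors P(C) and likelihoods P(x_i | C) from counts."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: k / n for c, k in class_counts.items()}

    # value_counts[c][i][v] = number of rows of class c whose feature i equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[c][i][v] += 1

    likelihoods = {
        c: {i: {v: k / class_counts[c] for v, k in counter.items()}
            for i, counter in feats.items()}
        for c, feats in value_counts.items()
    }
    return priors, likelihoods


def predict_naive_bayes(priors, likelihoods, row):
    """Steps 3 and 4: multiply prior by likelihoods, then pick the highest score."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(row):
            # No smoothing: an unseen value contributes probability 0
            score *= likelihoods[c][i].get(v, 0.0)
        scores[c] = score  # unnormalized; divide by sum(scores) for posteriors
    return max(scores, key=scores.get), scores


# Hypothetical toy data: (outlook, temperature) -> play outside?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "cool")]
labels = ["no", "no", "yes", "yes", "yes"]

priors, likelihoods = fit_naive_bayes(rows, labels)
pred, scores = predict_naive_bayes(priors, likelihoods, ("sunny", "mild"))
print(pred, scores)
```

Here "no" scores 0.4 × 1.0 × 0.5 = 0.2 and "yes" scores 0.6 × 1/3 × 1/3 ≈ 0.067, so the sketch predicts "no".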

## Why Is It Called “Naive”?

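Because it assumes all features are conditionally independent given the class, which is rarely true in real life. For example:

- Words in a sentence are not independent ("free" and "offer" often appear together).
- Age and income are not independent.

Yet it often works surprisingly well in practice: classification only needs the correct class to receive the highest score, even when the probability estimates themselves are off.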

## Types of Naive Bayes

| Type               | Used When         |
| ------------------ | ----------------- |
| **Gaussian NB**    | Continuous data   |
| **Multinomial NB** | Text, word counts |
| **Bernoulli NB**   | Binary features   |


## Simple Example (Spam Detection)

| Word    | Spam | Not Spam |
| ------- | ---- | -------- |
| “Free”  | High | Low      |
| “Offer” | High | Low      |


Given the message “Free Offer”:

$$P(\text{Spam} \mid \text{Free}, \text{Offer}) > P(\text{Not Spam} \mid \text{Free}, \text{Offer})$$

so the message is classified as **Spam**.
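To make the comparison concrete, here is the arithmetic with made-up numbers. The probabilities below are hypothetical stand-ins for the "High"/"Low" entries in the table, not values estimated from real emails.

```python
# Hypothetical priors and likelihoods standing in for "High"/"Low"
prior = {"Spam": 0.4, "Not Spam": 0.6}
likelihood = {
    "Spam": {"free": 0.30, "offer": 0.20},
    "Not Spam": {"free": 0.02, "offer": 0.01},
}
message = ["free", "offer"]

# Unnormalized score for each class: P(C) * P(free | C) * P(offer | C)
scores = {}
for c in prior:
    s = prior[c]
    for word in message:
        s *= likelihood[c][word]
    scores[c] = s

# Dividing by the sum gives P(C | free, offer); the shared evidence term cancels
total = sum(scores.values())
posterior = {c: s / total for c, s in scores.items()}

print(posterior)
print("Prediction:", max(posterior, key=posterior.get))
```

Even though "Not Spam" has the larger prior here, the two strong likelihoods push the posterior for Spam above 0.99.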

## Advantages

- Fast and scalable
- Works well on text data
- Needs relatively little training data
- Easy to implement

## Disadvantages

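- Independence assumption: correlated features are treated as separate pieces of evidence
- Zero-probability problem: a feature value never seen with a class makes that class's whole product zero
- Often less accurate than more complex models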

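## Zero Probability Problem & Laplace Smoothing

If a feature value (for example, a word) never appears with a class in the training data, its estimated likelihood is 0, and multiplying by 0 wipes out all other evidence for that class. Laplace (add-one) smoothing adds 1 to every count:

$$P(x_i \mid C) = \frac{\text{count}(x_i, C) + 1}{\text{total}(C) + n}$$

where $\text{count}(x_i, C)$ is how often $x_i$ occurs in class $C$, $\text{total}(C)$ is the total count for class $C$, and $n$ is the number of distinct feature values (the vocabulary size for text). Every probability therefore stays above 0.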

## Real-World Use Cases


- Email spam filtering
- Sentiment analysis
- Document classification
- Medical diagnosis
- Recommendation systems

## Naive Bayes vs Other Models

| Model               | Speed | Accuracy |
| ------------------- | ----- | -------- |
| Naive Bayes         | ⭐⭐⭐⭐⭐ | ⭐⭐⭐      |
| Logistic Regression | ⭐⭐⭐   | ⭐⭐⭐⭐     |
| Random Forest       | ⭐⭐    | ⭐⭐⭐⭐⭐    |


## Example 1: Spam Detection with Multinomial NB

In [1]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

X = ["Free money now", "Hi how are you"]
y = ["Spam", "Not Spam"]

vec = CountVectorizer()
X_vec = vec.fit_transform(X)

model = MultinomialNB()
model.fit(X_vec, y)

model.predict(vec.transform(["Free offer"]))


array(['Spam'], dtype='<U8')

In [2]:
model.predict(vec.transform(["Free snacks"]))

array(['Spam'], dtype='<U8')

In [3]:
model.predict(vec.transform(["Get fast money"]))

array(['Spam'], dtype='<U8')

In [4]:
model.predict(vec.transform(["Hi"]))

array(['Not Spam'], dtype='<U8')

In [5]:
model.predict(vec.transform(["Hi, my name is Shubham and I am a data science trainer. I want to know whether you have any vacancy for a data science trainer. If yes, let me know."]))

array(['Not Spam'], dtype='<U8')
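The predictions above only show the winning class. A short sketch using `predict_proba` on the same toy model shows how confident it is for "Free offer", next to a hand calculation with add-one smoothing (scikit-learn's default `alpha=1.0`). Because "offer" is not in the learned vocabulary, `CountVectorizer` drops it and only "free" counts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

X = ["Free money now", "Hi how are you"]
y = ["Spam", "Not Spam"]
vec = CountVectorizer()
model = MultinomialNB().fit(vec.fit_transform(X), y)

# Probabilities are ordered like model.classes_ (alphabetical here)
proba = model.predict_proba(vec.transform(["Free offer"]))[0]
print(dict(zip(model.classes_.tolist(), proba.round(4).tolist())))

# Hand calculation with add-one smoothing and a 7-word vocabulary:
#   P(free | Spam)     = (1 + 1) / (3 + 7) = 0.2
#   P(free | Not Spam) = (0 + 1) / (4 + 7) = 1/11
# Both priors are 0.5, so they cancel out.
p_spam = 0.2 / (0.2 + 1 / 11)
print("P(Spam | 'Free offer') by hand:", round(p_spam, 4))
```

A single known word is enough to tip the decision, which is why "Free snacks" and "Get fast money" are also labeled Spam.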

## Example 2: Titanic Survival with Gaussian NB

In [6]:
import pandas as pd

df = pd.read_excel("Titanic.xlsx")
df.head()


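|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| - | ----------- | -------- | ------ | ---- | --- | --- | ----- | ----- | ------ | ---- | ----- | -------- |
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 |  | S |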


In [7]:
# Fill missing ages with the mean age; assigning back avoids pandas'
# chained-assignment FutureWarning that inplace=True on a column triggers
df['Age'] = df['Age'].fillna(df['Age'].mean())


In [8]:
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})


In [9]:
X = df[['Pclass', 'Sex', 'Age', 'Fare']]
y = df['Survived']


In [10]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


In [11]:
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X_train, y_train)


| Parameter     | Value |
| ------------- | ----- |
| priors        | None  |
| var_smoothing | 1e-09 |


In [12]:
y_pred = model.predict(X_test)
y_pred[:10]


array([0, 0, 0, 1, 1, 1, 1, 0, 1, 1])

In [13]:
y_pred = pd.Series(y_pred, name="Predicted", index=y_test.index)
result = pd.concat([X_test, y_test, y_pred], axis=1)


In [14]:
result

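|     | Pclass | Sex | Age       | Fare    | Survived | Predicted |
| --- | ------ | --- | --------- | ------- | -------- | --------- |
| 709 | 3      | 0   | 29.699118 | 15.2458 | 1        | 0         |
| 439 | 2      | 0   | 31.000000 | 10.5000 | 0        | 0         |
| 840 | 3      | 0   | 20.000000 | 7.9250  | 0        | 0         |
| 720 | 2      | 1   | 6.000000  | 33.0000 | 1        | 1         |
| 39  | 3      | 1   | 14.000000 | 11.2417 | 1        | 1         |
| ... | ...    | ... | ...       | ...     | ...      | ...       |
| 433 | 3      | 0   | 17.000000 | 7.1250  | 0        | 0         |
| 773 | 3      | 0   | 29.699118 | 7.2250  | 0        | 0         |
| 25  | 3      | 1   | 38.000000 | 31.3875 | 1        | 1         |
| 84  | 2      | 1   | 17.000000 | 10.5000 | 1        | 1         |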


In [15]:
result['Survived'].sum()

np.int64(74)

In [16]:
result['Predicted'].sum()

np.int64(75)

Out of the 179 test passengers, 74 actually survived and the model predicted 75 survivors.

In [18]:
result.query('Survived==1 and Predicted==1').count()

Pclass       53
Sex          53
Age          53
Fare         53
Survived     53
Predicted    53
dtype: int64

In [16]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))


Accuracy: 0.7597765363128491
[[83 22]
 [21 53]]
              precision    recall  f1-score   support

           0       0.80      0.79      0.79       105
           1       0.71      0.72      0.71        74

    accuracy                           0.76       179
   macro avg       0.75      0.75      0.75       179
weighted avg       0.76      0.76      0.76       179



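In scikit-learn's `confusion_matrix`, rows are the actual classes and columns are the predicted classes, so the matrix above reads:

| Actual \ Predicted | Died (0)            | Survived (1)        |
| ------------------ | ------------------- | ------------------- |
| Died (0)           | True Negative (83)  | False Positive (22) |
| Survived (1)       | False Negative (21) | True Positive (53)  |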
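The classification report can be reproduced from the confusion matrix by hand. This sketch simply re-types the four counts printed above; it does not recompute the model.

```python
import numpy as np

# Confusion matrix from the output above: rows = actual, columns = predicted
cm = np.array([[83, 22],
               [21, 53]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()      # 136 / 179
precision_1 = tp / (tp + fp)         # tp + fp = 75 predicted survivors
recall_1 = tp / (tp + fn)            # tp + fn = 74 actual survivors
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"Accuracy: {accuracy:.4f}")
print(f"Survived (1): precision={precision_1:.2f} recall={recall_1:.2f} f1={f1_1:.2f}")
```

The row and column sums also explain the earlier counts: 74 actual survivors and 75 predicted survivors.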


### Why Gaussian Naive Bayes?

- Handles continuous features such as Age and Fare directly
- Fast training
- Works well on small datasets

## Side-by-Side Comparison Table

| Feature           | Gaussian NB | Multinomial NB | Bernoulli NB |
| ----------------- | ----------- | -------------- | ------------ |
| Data type         | Continuous  | Count          | Binary       |
| Distribution      | Normal      | Multinomial    | Bernoulli    |
| Absence matters   | ❌          | ❌              | ✅            |
| Frequency matters | ❌           | ✅              | ❌            |
| Used for text     | ❌           | ✅              | ⚠️           |
| Used for numbers  | ✅           | ❌              | ❌            |
| Speed             | Fast        | Very Fast      | Very Fast    |
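The "Absence matters" row can be checked directly on the Free/Win/Money toy data used in the BernoulliNB section of this notebook. This is a minimal sketch: an email containing none of the three words is scored by both `BernoulliNB` and `MultinomialNB` with default settings.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Free/Win/Money toy data (1 = word present), label 1 = Spam
X = np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 1, 0],
              [0, 0, 0],
              [1, 0, 0]])
y = np.array([1, 1, 1, 0, 0])

empty_email = np.array([[0, 0, 0]])  # none of the three words appear

bern = BernoulliNB().fit(X, y)
multi = MultinomialNB().fit(X, y)

# BernoulliNB multiplies in P(word absent | class) for every missing word,
# so the missing words count as evidence against Spam
print("Bernoulli  :", bern.predict_proba(empty_email)[0])
# MultinomialNB only multiplies over words that occur; with no words,
# the posterior falls back to the class priors (2/5, 3/5)
print("Multinomial:", multi.predict_proba(empty_email)[0])
```

Bernoulli NB gives roughly 0.75 for Not Spam, while Multinomial NB returns just the priors (0.4, 0.6) and predicts Spam.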


## When Naive Bayes Fails

| Scenario             | Why NB Fails                                   | Better Alternative  |
| -------------------- | ---------------------------------------------- | ------------------- |
| Correlated features  | Shared evidence is double-counted              | Logistic Regression |
| Skewed numeric data  | The Gaussian assumption does not hold          | Random Forest       |
| Rare events          | Zero probabilities (without smoothing)         | Boosting            |
| Imbalanced data      | The priors dominate the prediction             | XGBoost             |
| Feature interactions | Interactions between features are not modeled  | Neural Network      |
| Mixed data types     | One NB variant cannot model all column types   | CatBoost            |
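The double-counting problem can be illustrated with a small, hypothetical experiment on the Free/Win/Money toy data: copying the `Free` column adds no new information, yet Naive Bayes multiplies its likelihood in twice and becomes more confident.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[1, 1, 1], [1, 0, 1], [0, 1, 0], [0, 0, 0], [1, 0, 0]])
y = np.array([1, 1, 1, 0, 0])  # 1 = Spam

free = X[:, [0]]                      # only the "Free" column
free_twice = np.hstack([free, free])  # the same column copied: perfectly correlated

p_single = BernoulliNB().fit(free, y).predict_proba([[1]])[0, 1]
p_double = BernoulliNB().fit(free_twice, y).predict_proba([[1, 1]])[0, 1]

print(f"P(Spam | Free)       = {p_single:.3f}")
print(f"P(Spam | Free, Free) = {p_double:.3f}")
```

The probability rises from about 0.64 to about 0.68 even though the evidence is identical.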


## BernoulliNB

In [22]:
import pandas as pd
import numpy as np

In [23]:
from sklearn.naive_bayes import BernoulliNB

In [24]:
df = pd.read_excel(r"C:\Users\hp\Desktop\ML\BernauliNB.xlsx")

In [25]:
df

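|   | Email | Free | Win | Money | Spam |
| - | ----- | ---- | --- | ----- | ---- |
| 0 | E1    | 1    | 1   | 1     | 1    |
| 1 | E2    | 1    | 0   | 1     | 1    |
| 2 | E3    | 0    | 1   | 0     | 1    |
| 3 | E4    | 0    | 0   | 0     | 0    |
| 4 | E5    | 1    | 0   | 0     | 0    |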
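If `BernauliNB.xlsx` is not available, the same five rows can be built in memory. This sketch assumes the table above is the complete file contents.

```python
import pandas as pd

# Same rows as the spreadsheet shown above (1 = word present / is spam)
df = pd.DataFrame({
    "Email": ["E1", "E2", "E3", "E4", "E5"],
    "Free":  [1, 1, 0, 0, 1],
    "Win":   [1, 0, 1, 0, 0],
    "Money": [1, 1, 0, 0, 0],
    "Spam":  [1, 1, 1, 0, 0],
})
print(df)
```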


In [26]:
X = df.drop(columns=['Spam'])

In [27]:
y = df['Spam']

In [28]:
X.drop(columns=['Email'], inplace=True)

In [29]:
X

|   | Free | Win | Money |
| - | ---- | --- | ----- |
| 0 | 1    | 1   | 1     |
| 1 | 1    | 0   | 1     |
| 2 | 0    | 1   | 0     |
| 3 | 0    | 0   | 0     |
| 4 | 1    | 0   | 0     |


In [30]:
y

0    1
1    1
2    1
3    0
4    0
Name: Spam, dtype: int64

In [31]:
model = BernoulliNB(alpha=1.0)  # Laplace smoothing
model.fit(X, y)


| Parameter   | Value |
| ----------- | ----- |
| alpha       | 1.0   |
| force_alpha | True  |
| binarize    | 0.0   |
| fit_prior   | True  |
| class_prior | None  |


### Key Parameters of BernoulliNB

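| Parameter     | Meaning                                                                                   |
| ------------- | ----------------------------------------------------------------------------------------- |
| `alpha`       | Additive (Laplace) smoothing strength (default = 1.0)                                     |
| `force_alpha` | If True, `alpha` is used as given; if False, values below 1e-10 are raised to 1e-10       |
| `binarize`    | Threshold for converting features to 0/1: values above it become 1 (default 0.0; `None` means the input is already binary) |
| `fit_prior`   | Whether to learn class priors from the data; if False, a uniform prior is used            |
| `class_prior` | Fixed class priors to use instead of learning them from the data                          |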
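The `binarize` parameter can be illustrated with a short sketch on the same toy data (re-created here so the block runs on its own; the count vector `[3, 0, 2]` is a hypothetical email). With the default `binarize=0.0`, word counts are converted to presence/absence before scoring, so both inputs get identical probabilities.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[1, 1, 1], [1, 0, 1], [0, 1, 0], [0, 0, 0], [1, 0, 0]])
y = np.array([1, 1, 1, 0, 0])  # 1 = Spam
model = BernoulliNB(alpha=1.0, binarize=0.0).fit(X, y)

# Any value > 0 is treated as "word present", so 3x "Free" and 2x "Money"
# are scored exactly like a single "Free" and a single "Money"
counts = np.array([[3, 0, 2]])
binary = np.array([[1, 0, 1]])

print(model.predict_proba(counts))
print(model.predict_proba(binary))
```

This is also why BernoulliNB ignores how often a word repeats, matching the "Frequency matters" row of the comparison table.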


In [32]:
# New email: "Free Money"
X_new = pd.DataFrame(
    [[1, 0, 1]],
    columns=["Free", "Win", "Money"]
)

prediction = model.predict(X_new)
probability = model.predict_proba(X_new)

print("Predicted class:", prediction)
print("Class probabilities:", probability)


Predicted class: [1]
Class probabilities: [[0.30266344 0.69733656]]
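The 0.697 above can be checked by hand. This sketch re-derives it from the counts in the training table, assuming `alpha=1.0` as set in the model: `BernoulliNB` smooths each word's presence probability as (count + 1) / (class size + 2), because a binary feature has two possible values, and absent words contribute 1 - p.

```python
# Class sizes and priors from the 5-email training table
n_spam, n_ham = 3, 2
prior_spam, prior_ham = n_spam / 5, n_ham / 5

# Number of emails in each class that contain each word
spam_counts = {"Free": 2, "Win": 2, "Money": 2}
ham_counts = {"Free": 1, "Win": 0, "Money": 0}

# New email "Free Money": Free present, Win absent, Money present
email = {"Free": 1, "Win": 0, "Money": 1}


def class_score(prior, counts, size):
    score = prior
    for word, present in email.items():
        p = (counts[word] + 1) / (size + 2)  # smoothed P(word present | class)
        # Present words contribute p, absent words contribute (1 - p)
        score *= p if present else (1 - p)
    return score


s_spam = class_score(prior_spam, spam_counts, n_spam)
s_ham = class_score(prior_ham, ham_counts, n_ham)
p_spam = s_spam / (s_spam + s_ham)
print("P(Spam | email) =", p_spam)
```

The Spam score is 0.6 × 0.6 × 0.4 × 0.6 = 0.0864 and the Not Spam score is 0.4 × 0.5 × 0.75 × 0.25 = 0.0375, which normalize to the probabilities printed by `predict_proba`.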


In [33]:
# New email: "Free Money Win prize"
X_new = pd.DataFrame(
    [[1, 1, 1]],
    columns=["Free", "Win", "Money"]
)

prediction = model.predict(X_new)
probability = model.predict_proba(X_new)

print("Predicted class:", prediction)
print("Class probabilities:", probability)


Predicted class: [1]
Class probabilities: [[0.08796622 0.91203378]]


In [34]:
# New email: "Hi shubham"
X_new = pd.DataFrame(
    [[0, 0, 0]],
    columns=["Free", "Win", "Money"]
)

prediction = model.predict(X_new)
probability = model.predict_proba(X_new)

print("Predicted class:", prediction)
print("Class probabilities:", probability)


Predicted class: [0]
Class probabilities: [[0.74552684 0.25447316]]
