<a href="https://colab.research.google.com/github/poovarasansivakumar2003/Marvel_Batch_4_works/blob/main/Task_2_Naive_Bayesian_Classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Naive Bayesian Classifier

## Introduction
The Naive Bayesian Classifier is a probabilistic machine learning model used for classification tasks. It is based on **Bayes' Theorem** and assumes that the features are conditionally independent given the class. Despite this "naive" assumption, the classifier often performs surprisingly well in practice, especially for text classification tasks such as spam detection and sentiment analysis.

## Key Concepts
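**Bayes' Theorem**: The foundation of the Naive Bayesian Classifier is Bayes' Theorem, which gives the posterior probability of a class $C$ given an observed feature vector $X$:

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

where:
- $P(C \mid X)$ is the posterior probability of class $C$ given the features $X$,
- $P(X \mid C)$ is the likelihood of observing $X$ when the class is $C$,
- $P(C)$ is the prior probability of class $C$,
- $P(X)$ is the evidence, the overall probability of observing $X$, which is the same for every class.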

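**Conditional Independence**: The classifier assumes that, once the class $C$ is known, each feature is independent of every other feature. The likelihood therefore factorizes into a product of per-feature terms:

$$P(x_1, x_2, \dots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)$$

Because the denominator $P(X)$ of Bayes' Theorem does not depend on the class, the predicted class is the one that maximizes $P(C)\prod_{i=1}^{n} P(x_i \mid C)$.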

![Bayestheorem](https://miro.medium.com/v2/resize:fit:804/1*6dmvRYysiU5PwWIcHRdKVw.png)

where ùëã = (ùë•1, ùë•2, ‚Ä¶, ùë•ùëõ) is a vector of features.

## Types of Naive Bayes Classifiers

**Gaussian Naive Bayes**: Assumes that continuous features follow a Gaussian (normal) distribution within each class.

**Multinomial Naive Bayes**: Used for discrete counts (e.g., text classification with word counts).

**Bernoulli Naive Bayes**: Used for binary/Boolean features (e.g., text classification with binary term occurrence).
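In scikit-learn these three variants correspond to `GaussianNB`, `MultinomialNB`, and `BernoulliNB` in `sklearn.naive_bayes`. The sketch below fits each one on a tiny hypothetical dataset shaped for it (continuous values, non-negative counts, and 0/1 indicators respectively). The data is invented and only meant to show which kind of input each variant expects.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical labels: six samples, two classes
y = np.array([0, 0, 0, 1, 1, 1])

# Continuous measurements: GaussianNB fits a per-class mean and variance per feature
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.0],
                   [3.0, 4.2], [3.1, 3.9], [2.8, 4.1]])
# Non-negative counts (e.g. word counts): MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [4, 1, 0],
                     [0, 3, 2], [0, 4, 1], [1, 2, 3]])
# Binary presence/absence indicators: BernoulliNB
X_bin = (X_counts > 0).astype(int)

preds = {}
for name, model, X in [("Gaussian", GaussianNB(), X_cont),
                       ("Multinomial", MultinomialNB(), X_counts),
                       ("Bernoulli", BernoulliNB(), X_bin)]:
    model.fit(X, y)
    preds[name] = model.predict(X)
    print(name, preds[name])
```

On this deliberately easy data all three models recover the training labels; the point is the input type, not the accuracy.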

## Implementation Using Scikit-Learn
The Naive Bayesian Classifier can be easily implemented using the scikit-learn library in Python. Below is an example of using the Naive Bayes classifier for text classification.

We'll use the `MultinomialNB` class from scikit-learn to classify text data from the 20 Newsgroups dataset.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Load the training and test splits of the dataset
newsgroups_train = fetch_20newsgroups(subset='train', shuffle=True)
newsgroups_test = fetch_20newsgroups(subset='test', shuffle=True)

# Vectorize the text data: learn the vocabulary on the training set only
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(newsgroups_train.data)
X_test = vectorizer.transform(newsgroups_test.data)
y_train = newsgroups_train.target
y_test = newsgroups_test.target

# Train the Naive Bayes classifier
clf = MultinomialNB()
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=newsgroups_test.target_names))
```


```text
Accuracy: 0.7728
Classification Report:
                          precision    recall  f1-score   support

             alt.atheism       0.79      0.77      0.78       319
           comp.graphics       0.67      0.74      0.70       389
 comp.os.ms-windows.misc       0.20      0.00      0.01       394
comp.sys.ibm.pc.hardware       0.56      0.77      0.65       392
   comp.sys.mac.hardware       0.84      0.75      0.79       385
          comp.windows.x       0.65      0.84      0.73       395
            misc.forsale       0.93      0.65      0.77       390
               rec.autos       0.87      0.91      0.89       396
         rec.motorcycles       0.96      0.92      0.94       398
      rec.sport.baseball       0.96      0.87      0.91       397
        rec.sport.hockey       0.93      0.96      0.95       399
               sci.crypt       0.67      0.95      0.78       396
         sci.electronics       0.79      0.66      0.72       393
                 sci.med       0.87
```

## Explanation
**Data Loading**: The 20 Newsgroups dataset is loaded with `fetch_20newsgroups`, called once with `subset='train'` and once with `subset='test'` to obtain the predefined training and test splits.

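**Vectorization**: `CountVectorizer` converts the collection of text documents into a sparse matrix of token counts. The vocabulary is learned from the training documents only (`fit_transform`), and the test documents are mapped onto that same vocabulary (`transform`), so words that appear only in the test set are ignored.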

**Model Training**: A `MultinomialNB` classifier is fitted on the training count matrix and labels, which estimates the class priors and the per-class word probabilities.
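Training `MultinomialNB` amounts to counting. Assuming the defaults `fit_prior=True` and `alpha=1.0`, the class prior is the class frequency and each per-class term probability is a smoothed relative count, $\theta_{C,i} = \frac{N_{C,i} + \alpha}{N_C + \alpha n}$, where $N_{C,i}$ is the total count of term $i$ in class $C$, $N_C$ is the total count of all terms in class $C$, and $n$ is the vocabulary size. The sketch below recomputes the fitted `feature_log_prob_` and `class_log_prior_` attributes by hand on a made-up 4-by-3 count matrix to check that reading.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count matrix: 4 documents x 3 vocabulary terms, 2 classes
X = np.array([[2, 1, 0],
              [3, 0, 1],
              [0, 2, 3],
              [1, 0, 4]])
y = np.array([0, 0, 1, 1])

alpha = 1.0  # the default smoothing value (Laplace smoothing)
clf = MultinomialNB(alpha=alpha).fit(X, y)

# Recompute theta_{C,i} = (N_{C,i} + alpha) / (N_C + alpha * n) for each class
n_terms = X.shape[1]
manual_log_theta = np.array([
    np.log((X[y == c].sum(axis=0) + alpha) / (X[y == c].sum() + alpha * n_terms))
    for c in clf.classes_
])
# With fit_prior=True (the default), the prior is the class frequency
manual_log_prior = np.log(np.bincount(y) / len(y))

print(np.allclose(clf.feature_log_prob_, manual_log_theta))  # True
print(np.allclose(clf.class_log_prior_, manual_log_prior))   # True
```

The smoothing term $\alpha$ keeps a word that never occurs in a class from getting probability zero, which would otherwise veto that class for any document containing the word.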

**Prediction**: The trained model is used to predict the labels for the test data.
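Prediction picks the class with the largest joint log-likelihood, $\log P(C) + \sum_i x_i \log \theta_{C,i}$, where $x_i$ is the count of term $i$ and $\theta_{C,i}$ is the smoothed probability of term $i$ in class $C$ (the multinomial coefficient is the same for every class and can be dropped). This sketch, on hypothetical toy counts, checks that an explicit argmax over `class_log_prior_ + X @ feature_log_prob_.T` agrees with `predict`, and shows `predict_proba` for the normalized posteriors.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy counts standing in for the newsgroup matrices
X_train = np.array([[2, 1, 0], [3, 0, 1], [0, 2, 3], [1, 0, 4]])
y_train = np.array([0, 0, 1, 1])
X_new = np.array([[4, 1, 0], [0, 1, 5]])

clf = MultinomialNB().fit(X_train, y_train)

# Joint log-likelihood per class: log P(C) + sum_i x_i * log theta_{C,i}
joint = clf.class_log_prior_ + X_new @ clf.feature_log_prob_.T
manual_pred = clf.classes_[np.argmax(joint, axis=1)]

print(clf.predict(X_new), manual_pred)    # both [0 1]
print(clf.predict_proba(X_new).round(3))  # normalized posteriors; each row sums to 1
```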

**Evaluation**: The classifier is evaluated with overall accuracy and with a classification report that lists per-class precision, recall, F1-score, and support.
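A minimal sketch of the two metrics on hypothetical labels: accuracy is the fraction of predictions that match, while the classification report breaks performance down by class. The per-class view matters because overall accuracy can hide a class that is almost never predicted correctly, as with `comp.os.ms-windows.misc` (recall 0.00) in the report above. The label strings below are invented; `output_dict=True` is a `classification_report` option that returns the same numbers as a nested dict.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical true and predicted labels for a three-class problem
y_true = ["autos", "autos", "crypt", "crypt", "med", "med"]
y_pred = ["autos", "crypt", "crypt", "crypt", "med", "autos"]

# Accuracy: the fraction of predictions that exactly match the true label
acc = accuracy_score(y_true, y_pred)
manual_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(acc, manual_acc)  # both 4/6

# Per-class precision, recall, F1-score and support
print(classification_report(y_true, y_pred))
report = classification_report(y_true, y_pred, output_dict=True)
# "crypt" is predicted 3 times but correct only twice: precision 2/3, recall 2/2
print(report["crypt"]["precision"], report["crypt"]["recall"])
```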