# Naive Bayes Text Classification
Naive Bayes is a family of probabilistic classifiers based on Bayes' theorem with the "naive" assumption that the features are conditionally independent given the class. It is simple, fast, and often surprisingly effective, which makes it a common baseline for text classification.

The Naive Bayes algorithm is based on the following assumptions:

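1. Each feature is conditionally independent of the others, given the class label.
2. Within each class, every feature follows a chosen distribution: Gaussian for continuous features (Gaussian Naive Bayes), multinomial for word counts (Multinomial Naive Bayes, used in the example below), or Bernoulli for binary features.
3. Each example belongs to exactly one class (the class labels are mutually exclusive).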

Given a set of features $X = \{x_1, x_2, \ldots, x_n\}$ and a class label $Y$, Naive Bayes calculates the probability of each class label given the features using Bayes' theorem:

$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$

where $P(Y \mid X)$ is the posterior probability of the class label given the features, $P(X \mid Y)$ is the likelihood of the features given the class label, $P(Y)$ is the prior probability of the class label, and $P(X)$ is the evidence (the marginal probability of the features).
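The "naive" step, which the equation above leaves implicit, is to factor the likelihood under the conditional-independence assumption. Since $P(X)$ does not depend on the class, the decision rule only needs the numerator, and summing logarithms instead of multiplying many small probabilities avoids numerical underflow:

$P(X \mid Y) = \prod_{i=1}^{n} P(x_i \mid Y)$

$\hat{y} = \arg\max_{y} \left[ \log P(Y = y) + \sum_{i=1}^{n} \log P(x_i \mid Y = y) \right]$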

Naive Bayes handles both continuous features (Gaussian Naive Bayes) and discrete features such as word counts (Multinomial Naive Bayes), while the class labels are categorical. This flexibility, together with its simplicity, makes it applicable to a wide range of classification problems.
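As a minimal sketch of the continuous case, scikit-learn's `GaussianNB` fits one mean and one variance per feature and class. The one-dimensional data points below are made up purely for illustration:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Made-up continuous feature: class 0 clusters near 1, class 1 clusters near 10
X = np.array([[0.0], [1.0], [2.0], [9.0], [10.0], [11.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# GaussianNB estimates a per-class mean and variance for each feature
gnb = GaussianNB()
gnb.fit(X, y)

predictions = gnb.predict(np.array([[1.5], [9.5]]))
print(predictions)  # [0 1]
```

For non-negative word counts, as in the notebook below, `MultinomialNB` is the appropriate variant instead.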

The advantages of Naive Bayes include:

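1. Simple to implement: training amounts to estimating class priors and per-class feature distributions, which for count data means counting.
2. Fast: training needs a single pass over the data and prediction is a sum over features, so it scales well to large datasets and to high-dimensional sparse inputs such as bag-of-words vectors.
3. Effective: it performs best when the features are close to independent, yet it is often competitive even when they are not, notably on text.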
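To make the "simple and fast" points concrete, here is a minimal from-scratch sketch of multinomial Naive Bayes in plain Python: training is one counting pass over the documents, and prediction adds up log-probabilities. The helper names `train_nb` and `predict_nb`, the whitespace tokenizer, the add-one smoothing, and the four-document corpus are all hypothetical choices for illustration:

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Hypothetical helper: estimate log priors and smoothed log likelihoods by counting."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()  # naive whitespace tokenizer (assumption)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        # Add-alpha smoothing: (count + alpha) / (total words in class + alpha * |vocab|)
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def predict_nb(doc, log_prior, log_lik):
    """Hypothetical helper: return the class with the largest unnormalized log posterior."""
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for token in doc.lower().split():
            if token in log_lik[c]:  # words never seen in training are skipped
                score += log_lik[c][token]
        scores[c] = score
    return max(scores, key=scores.get)


docs = ["great fun film", "great acting", "boring plot", "awful boring film"]
labels = ["pos", "pos", "neg", "neg"]
log_prior, log_lik = train_nb(docs, labels)
print(predict_nb("great film", log_prior, log_lik))   # pos
print(predict_nb("boring film", log_prior, log_lik))  # neg
```

Because $P(X)$ is dropped, the scores are only comparable across classes, not calibrated probabilities.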

However, Naive Bayes also has some limitations:

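1. Assumes conditional independence: correlated features (for example, the words "New" and "York") are counted as separate evidence, which can hurt accuracy and push predicted probabilities toward 0 or 1, so they are often poorly calibrated.
2. Assumes a feature distribution: each variant assumes a specific form, such as normally distributed features in Gaussian Naive Bayes, which may not match the data.
3. Zero-frequency problem: a feature value never seen with a class during training receives zero probability and zeroes out the entire product for that class unless smoothing (such as Laplace, or add-one, smoothing) is applied.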
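A short arithmetic sketch of the zero-frequency problem, using invented counts for a single class: a word never seen in that class during training gets an unsmoothed likelihood of exactly zero, which wipes out the whole product, whereas add-one smoothing, $(\text{count} + \alpha) / (\text{total} + \alpha V)$ with vocabulary size $V$, keeps every estimate positive:

```python
import math

# Invented counts for one class in which the word "masterpiece" never occurred
word_count = 0      # occurrences of "masterpiece" in this class's training text
total_words = 50    # total word occurrences in this class
vocab_size = 20     # distinct words across the whole training set
alpha = 1.0         # add-one (Laplace) smoothing

other_likelihoods = [0.1, 0.05, 0.2]  # made-up P(word | class) for the document's other words

unsmoothed = word_count / total_words                                 # 0.0
smoothed = (word_count + alpha) / (total_words + alpha * vocab_size)  # 1 / 70

print(math.prod(other_likelihoods + [unsmoothed]))  # 0.0: one unseen word erases all other evidence
print(math.prod(other_likelihoods + [smoothed]))    # small but non-zero
```

scikit-learn's `MultinomialNB` applies this smoothing through its `alpha` parameter, which defaults to 1.0.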

Some common applications of Naive Bayes include:

1. Text classification: Naive Bayes is often used for text classification tasks, such as spam vs. non-spam emails.
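2. Image classification: Naive Bayes can serve as a simple baseline for image tasks, such as classifying images as dogs or cats, although neighboring pixel values are strongly correlated, so the independence assumption is badly violated and stronger models usually perform better.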
3. Sentiment analysis: Naive Bayes is often used for sentiment analysis tasks, such as determining whether a piece of text is positive, negative, or neutral.

In summary, Naive Bayes is a simple, fast classifier that assumes the features are conditionally independent given the class and that each follows a chosen distribution (Gaussian, multinomial, or Bernoulli). Despite its limitations, it is widely used and often effective, particularly for text.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, confusion_matrix

# Sample dataset: Movie reviews with sentiment labels
reviews = [
    "This movie was excellent! I loved it.",
    "Terrible film. I hated every minute.",
    "Great acting, but the plot was weak.",
    "Amazing special effects and a gripping story!",
    "Boring and predictable. Waste of time.",
    "The characters were well-developed and interesting.",
    "Poor directing and awful screenplay.",
    "Fantastic cinematography and a powerful message.",
    "Disappointing ending. Could have been better.",
    "Highly entertaining from start to finish!"
]

labels = np.array([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])  # 1: Positive, 0: Negative
```


## Split the dataset into training and testing sets

```python
# With 10 reviews, test_size=0.2 holds out only 2 reviews for testing
X_train, X_test, y_train, y_test = train_test_split(reviews, labels, test_size=0.2, random_state=42)
```
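With a two-review test set, nothing guarantees that both classes appear in it. Passing `stratify=labels` to `train_test_split` preserves the class proportions in each split as closely as possible. To keep this sketch self-contained it re-creates the notebook's `labels` and uses review indices (`review_ids`, a name chosen for illustration) as stand-ins for the review texts:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same labels as the notebook (1: Positive, 0: Negative); indices stand in for review texts
labels = np.array([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])
review_ids = np.arange(len(labels))

# stratify=labels keeps the 60/40 class ratio as closely as possible in both splits
ids_train, ids_test, y_train, y_test = train_test_split(
    review_ids, labels, test_size=0.2, random_state=42, stratify=labels
)
print(sorted(y_test.tolist()))  # [0, 1]: one review of each class in the test set
```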


## Create a bag-of-words representation


```python
# Learn the vocabulary from the training reviews only, then count words in both splits
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)
```
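`CountVectorizer` learns a vocabulary during `fit_transform` and then maps any text to a sparse vector of word counts over that fixed vocabulary with `transform`; words not seen during fitting are ignored. A small self-contained sketch on two made-up sentences:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the cat saw the dog"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)  # sparse matrix of shape (2, vocabulary size)

# vocabulary_ maps each word to its column index; columns are in alphabetical order
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(vocab)             # ['cat', 'dog', 'sat', 'saw', 'the']
print(counts.toarray())  # [[1 0 1 0 1]
                         #  [1 1 0 1 2]]

# A word not seen during fit ("bird") has no column and is silently dropped
print(vectorizer.transform(["the bird sat"]).toarray())  # [[0 0 1 0 1]]
```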


## Train a Naive Bayes classifier


```python
# Multinomial Naive Bayes with the default add-one smoothing (alpha=1.0)
clf = MultinomialNB()
clf.fit(X_train_counts, y_train)
```
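After fitting, `MultinomialNB` stores the log priors in `class_log_prior_` and the smoothed per-class word log-probabilities in `feature_log_prob_`, derived from the raw counts in `class_count_` and `feature_count_`. This self-contained sketch, on a made-up 4-document, 3-word count matrix, checks that with `alpha=1.0` these are simply add-one-smoothed relative frequencies:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Made-up word-count matrix: 4 documents x 3 vocabulary words
X = np.array([[2, 1, 0],
              [1, 0, 0],
              [0, 1, 3],
              [0, 0, 2]])
y = np.array([1, 1, 0, 0])

clf = MultinomialNB(alpha=1.0)
clf.fit(X, y)

# Prior: fraction of training documents in each class
expected_prior = np.log(clf.class_count_ / clf.class_count_.sum())

# Likelihood: (word count in class + alpha) / (total words in class + alpha * vocabulary size)
smoothed = clf.feature_count_ + 1.0
expected_likelihood = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

print(np.allclose(clf.class_log_prior_, expected_prior))        # True
print(np.allclose(clf.feature_log_prob_, expected_likelihood))  # True
```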


## Make predictions on the test set


```python
y_pred = clf.predict(X_test_counts)

# Print the classification report; zero_division=0 reports ill-defined precision/recall as 0
# instead of emitting UndefinedMetricWarning (here the test set has no positive reviews
# and no review is predicted negative)
print(classification_report(y_test, y_pred, target_names=['Negative', 'Positive'], zero_division=0))

# Print the confusion matrix
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

# Function to classify new reviews
def classify_review(review):
    """Return the predicted sentiment and the probability of that prediction."""
    review_counts = vectorizer.transform([review])
    prediction = clf.predict(review_counts)
    probability = clf.predict_proba(review_counts)
    sentiment = "Positive" if prediction[0] == 1 else "Negative"
    return f"Sentiment: {sentiment} (Confidence: {max(probability[0]):.2f})"

# Test the classifier with new reviews
new_reviews = [
    "This movie exceeded all my expectations!",
    "I fell asleep halfway through. So dull.",
    "An interesting concept, but poorly executed."
]

for review in new_reviews:
    print(f"\nReview: {review}")
    print(classify_review(review))
```
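`predict_proba` is Bayes' theorem applied to the fitted parameters: each class score is the log prior plus the count-weighted sum of word log-probabilities, and normalizing the exponentiated scores plays the role of dividing by $P(X)$. The notebook's `classify_review` reports the largest of these probabilities as its confidence. This self-contained sketch (made-up counts again) reproduces `predict_proba` by hand, using `scipy.special.logsumexp` for a numerically stable normalization:

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import MultinomialNB

# Made-up training counts: 4 documents x 3 vocabulary words
X = np.array([[2, 1, 0], [1, 0, 0], [0, 1, 3], [0, 0, 2]])
y = np.array([1, 1, 0, 0])
clf = MultinomialNB().fit(X, y)

X_new = np.array([[1, 0, 1], [0, 0, 4]])

# Joint log-likelihood: log P(y) + sum_i count_i * log P(word_i | y)
joint = X_new @ clf.feature_log_prob_.T + clf.class_log_prior_

# Subtracting the log of the normalizer turns joint scores into posteriors
manual_proba = np.exp(joint - logsumexp(joint, axis=1, keepdims=True))

print(np.allclose(manual_proba, clf.predict_proba(X_new)))  # True
```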

```
              precision    recall  f1-score   support

    Negative       0.00      0.00      0.00       2.0
    Positive       0.00      0.00      0.00       0.0

    accuracy                           0.00       2.0
   macro avg       0.00      0.00      0.00       2.0
weighted avg       0.00      0.00      0.00       2.0

Confusion Matrix:
[[0 2]
 [0 0]]

Review: This movie exceeded all my expectations!
Sentiment: Positive (Confidence: 0.84)

Review: I fell asleep halfway through. So dull.
Sentiment: Positive (Confidence: 0.75)

Review: An interesting concept, but poorly executed.
Sentiment: Positive (Confidence: 0.84)
```


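The evaluation above is not informative. The test split holds only two reviews, both negative (support 2 for Negative, 0 for Positive), and the confusion matrix shows that both were predicted positive, so every metric is 0.00. The second new review shows the underlying problem: none of its words ("fell", "asleep", "halfway", "through", "so", "dull") occurs in any training review, so its count vector is all zeros and the posterior collapses to the class prior. With both negative test reviews held out, 6 of the 8 training reviews are positive, which gives exactly the reported confidence of 0.75. A somewhat more reliable estimate on such a small dataset comes from stratified cross-validation. The sketch below wraps the vectorizer and classifier in a scikit-learn pipeline so the vocabulary is refit on each training fold; using 2 folds is an assumption, chosen because only four negative reviews exist. With ten reviews the scores will still be noisy, and more labelled data is the real fix.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Same dataset as the notebook
reviews = [
    "This movie was excellent! I loved it.",
    "Terrible film. I hated every minute.",
    "Great acting, but the plot was weak.",
    "Amazing special effects and a gripping story!",
    "Boring and predictable. Waste of time.",
    "The characters were well-developed and interesting.",
    "Poor directing and awful screenplay.",
    "Fantastic cinematography and a powerful message.",
    "Disappointing ending. Could have been better.",
    "Highly entertaining from start to finish!"
]
labels = np.array([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])  # 1: Positive, 0: Negative

# The pipeline refits the vectorizer inside each fold, so test-fold words never enter the vocabulary
model = make_pipeline(CountVectorizer(), MultinomialNB())

# Each of the 2 stratified folds holds 2 negative and 3 positive reviews for testing
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
scores = cross_val_score(model, reviews, labels, cv=cv, scoring="accuracy")
print(scores, scores.mean())
```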
