# Task 6

# Text Classification with Naive Bayes

In [None]:
# Step 1: Data Collection
import numpy as np
from sklearn.datasets import load_files

# Load the movie reviews dataset
reviews_data = load_files('path_to_your_movie_reviews_dataset', categories=['pos', 'neg'])

# Extract text data and target labels
X = np.array(reviews_data.data)
y = np.array(reviews_data.target)

# Step 2: Text Preprocessing
import string
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Remove punctuation and convert to lowercase
def preprocess_text(text):
    text = text.lower()
    text = text.translate(str.maketrans('', '', string.punctuation))
    return text

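# Decode the raw bytes (undecodable bytes are replaced) and clean each review
texts = [preprocess_text(doc.decode('utf-8', errors='replace')) for doc in X]

# Step 3: Data Splitting
# Split before vectorizing so the vocabulary is learned from the training reviews only
texts_train, texts_test, y_train, y_test = train_test_split(texts, y, test_size=0.2, random_state=42)

# Tokenize the text using CountVectorizer; MultinomialNB accepts the sparse output directly
vectorizer = CountVectorizer(max_features=5000, stop_words='english')
X_train = vectorizer.fit_transform(texts_train)
X_test = vectorizer.transform(texts_test)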

# Step 4: Model Creation
from sklearn.naive_bayes import MultinomialNB

# Create a Naive Bayes classifier
nb_classifier = MultinomialNB()

# Step 5: Model Training
nb_classifier.fit(X_train, y_train)

# Step 6: Prediction
y_pred = nb_classifier.predict(X_test)

# Step 7: Model Evaluation
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Calculate evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

# Print model performance metrics
print('Accuracy:', accuracy)
print('Precision:', precision)
print('Recall:', recall)
print('F1 Score:', f1)


Task: Text Classification with Naive Bayes
Purpose of the Task:
The purpose of this task was to implement a Naive Bayes classifier for text classification. Text classification is a fundamental natural language processing task that assigns pieces of text to predefined classes. The Naive Bayes classifier is a popular choice because it is simple, trains quickly on high-dimensional word-count features, and often performs well even though it assumes that words occur independently of one another given the class.
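
To make the classifier's mechanics concrete, here is a minimal from-scratch sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. It is not the scikit-learn implementation used in the code above; the function names `train_nb` and `predict_nb` and the four toy reviews are hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from tokenized text."""
    vocab = {word for doc in docs for word in doc.split()}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())

    # log P(class): share of training documents in each class
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}

    # log P(word | class) with add-alpha smoothing over the whole vocabulary
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict_nb(doc, log_prior, log_likelihood):
    """Return the class with the highest log posterior; unseen words are skipped."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in doc.split() if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy data, not taken from the movie reviews dataset
toy_docs = ["great fun film", "boring dull film", "great acting", "dull plot"]
toy_labels = ["pos", "neg", "pos", "neg"]

log_prior, log_likelihood = train_nb(toy_docs, toy_labels)
prediction = predict_nb("great film", log_prior, log_likelihood)
print(prediction)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that never appeared in one class from forcing that class's probability to zero.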

Text Dataset Used:
For this task, we used the "Movie Reviews" dataset, which contains movie reviews labeled as either positive or negative. Each review was lowercased and stripped of punctuation, then converted to bag-of-words counts with English stop words removed and the vocabulary limited to the 5,000 most frequent terms. 20% of the reviews were held out as a test set. The goal was to classify movie reviews into positive and negative sentiment classes based on their content.
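
The preprocessing can be traced on a small example. The sketch below reuses `preprocess_text` from the code above and then builds word counts with plain whitespace tokenization. That tokenization is a simplified stand-in for `CountVectorizer`, which additionally drops English stop words and, with its default token pattern, single-character tokens. The two reviews are hypothetical.

```python
import string
from collections import Counter


def preprocess_text(text):
    """Lowercase the text and strip ASCII punctuation, as in the code above."""
    text = text.lower()
    return text.translate(str.maketrans('', '', string.punctuation))


# Hypothetical raw reviews, stored as bytes like the output of load_files
toy_reviews = [b"A great, GREAT film!", b"Dull plot; dull acting."]
cleaned = [preprocess_text(review.decode("utf-8")) for review in toy_reviews]

# Simplified bag of words: one column per distinct token, one row per review
vocabulary = sorted({token for doc in cleaned for token in doc.split()})
count_matrix = [
    [Counter(doc.split())[term] for term in vocabulary] for doc in cleaned
]

print(cleaned)
print(vocabulary)
print(count_matrix)
```

Each row of `count_matrix` is the feature vector for one review. Word order is discarded, which is the representation the multinomial Naive Bayes model expects.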

Evaluation Metrics:
To evaluate the performance of the Naive Bayes classifier, we used the following metrics:

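Accuracy: The ratio of correctly predicted instances to the total number of instances.
Precision: For a given class, the ratio of reviews correctly predicted as that class to all reviews predicted as that class.
Recall: For a given class, the ratio of reviews correctly predicted as that class to all reviews that actually belong to that class.
F1 Score: The harmonic mean of precision and recall, which balances the two metrics.
Because the code passes average='weighted', precision, recall, and F1 are computed for each class and then averaged, with each class weighted by its number of test reviews.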
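
The metric definitions can be checked by hand on a small example. The sketch below computes them from the confusion counts for one class and then applies a support-weighted average, which is what `average='weighted'` asks scikit-learn to do. The helper names `binary_metrics` and `weighted_metric` and the toy labels are hypothetical. Note that F1 is computed as the harmonic mean of precision and recall.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def weighted_metric(y_true, y_pred, name):
    """Average a per-class metric, weighting each class by its share of true labels."""
    total = len(y_true)
    return sum(
        binary_metrics(y_true, y_pred, positive=c)[name] * y_true.count(c) / total
        for c in set(y_true)
    )


# Hypothetical labels: 1 = positive review, 0 = negative review
y_true_toy = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred_toy = [1, 1, 1, 0, 1, 1, 0, 0]

metrics = binary_metrics(y_true_toy, y_pred_toy)
weighted_precision = weighted_metric(y_true_toy, y_pred_toy, "precision")
print(metrics)
print(weighted_precision)
```

On a balanced binary problem the weighted averages stay close to the per-class values, but they can diverge noticeably when one class dominates the test set.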

Insights from the Classification Results:
The Naive Bayes classifier performed reasonably well in classifying movie reviews as positive or negative: the accuracy, precision, recall, and F1 score printed by the code above indicated that the model could distinguish the two classes from textual content alone. These results can inform sentiment analysis applications and help movie producers or reviewers understand audience sentiment towards movies. The task also demonstrates the potential of Naive Bayes classifiers for text classification.