# Movie Review Sentiment Analysis

Sentiment analysis identifies the emotional tone expressed in a piece of text. In this project, we classify movie reviews as either positive or negative using the Naive Bayes algorithm, a simple probabilistic classifier that often serves as a strong baseline for text classification. The project covers data loading and preprocessing, feature extraction, model training, model evaluation, and making predictions.

<img src='picture.jpg' width="500" height="500">

# Data Loading and Preprocessing

The first step involves loading the movie reviews dataset from the NLTK library and preprocessing it for analysis. Each review is labeled as either positive or negative.

In [1]:
import pandas as pd
from nltk.corpus import movie_reviews
import nltk
nltk.download('movie_reviews')

# Load movie reviews and their categories
reviews = [(list(movie_reviews.words(fileid)), category)
           for category in movie_reviews.categories()
           for fileid in movie_reviews.fileids(category)]

# Create a DataFrame
df = pd.DataFrame(reviews, columns=['review', 'sentiment'])

# Preprocess reviews: join words into a single string and map sentiment to binary values
df['review'] = df['review'].apply(lambda x: ' '.join(x))
df['sentiment'] = df['sentiment'].map({'pos': 1, 'neg': 0})


[nltk_data] Downloading package movie_reviews to
[nltk_data]     C:\Users\penta\AppData\Roaming\nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!


# Feature Extraction
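We use `CountVectorizer` to transform the text data into numerical features suitable for machine learning. It converts the collection of text documents into a sparse matrix of token counts, with one row per review and one column per vocabulary term. Setting `stop_words='english'` drops common English stop words, and `max_df=0.7` drops any term that appears in more than 70% of the reviews, since such terms carry little information about sentiment.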

In [2]:
from sklearn.feature_extraction.text import CountVectorizer

# Convert text data to numerical data
vectorizer = CountVectorizer(stop_words='english', max_df=0.7)
X = vectorizer.fit_transform(df['review'])
y = df['sentiment']
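
To make the effect of `stop_words` and `max_df` concrete, here is a minimal sketch on a hypothetical four-document corpus (the sentences and the `toy_` names are illustrative assumptions, not part of the dataset). The word "movie" appears in three of the four documents (75%), so it exceeds the 0.7 threshold and is dropped along with the stop words.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, not taken from the movie_reviews dataset
docs = [
    "great movie with a great cast",
    "boring movie and a weak plot",
    "great plot but a boring ending",
    "weak movie",
]

# Same settings as above: drop English stop words and any term that
# appears in more than 70% of the documents
toy_vectorizer = CountVectorizer(stop_words='english', max_df=0.7)
toy_counts = toy_vectorizer.fit_transform(docs)

# Columns of the count matrix are ordered alphabetically by term
vocabulary = sorted(toy_vectorizer.vocabulary_)
print(vocabulary)
print(toy_counts.toarray())
```

Each row of the printed matrix is one document, and each entry is how often the corresponding vocabulary term occurs in it.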


# Model Training
We split the dataset into training and testing sets (80% and 20%, with a fixed random seed for reproducibility) and train a Naive Bayes classifier on the training data.

In [3]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train, y_train)
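
As a rough illustration of what a multinomial Naive Bayes classifier computes, the sketch below implements the scoring rule in plain Python: each class gets a score of log P(class) plus the sum of count × log P(word | class), where the word probabilities use additive smoothing with `alpha = 1.0` (the `MultinomialNB` default). The toy counts and the helper names `fit_multinomial_nb` and `predict_nb` are assumptions made for this example; scikit-learn's implementation is vectorized and handles sparse input.

```python
import math

# Hypothetical toy training data: per-document token counts over a
# three-word vocabulary, with labels 1 = positive, 0 = negative
vocab = ['boring', 'great', 'plot']
train_counts = [
    [0, 2, 1],  # positive
    [0, 1, 0],  # positive
    [2, 0, 1],  # negative
    [1, 0, 0],  # negative
]
train_labels = [1, 1, 0, 0]
alpha = 1.0  # additive (Laplace) smoothing


def fit_multinomial_nb(counts, labels, alpha):
    """Return log priors and smoothed log likelihoods per class."""
    classes = sorted(set(labels))
    n_features = len(counts[0])
    log_prior, log_likelihood = {}, {}
    for c in classes:
        rows = [row for row, label in zip(counts, labels) if label == c]
        log_prior[c] = math.log(len(rows) / len(counts))
        # Total count of each word across all documents of class c
        word_totals = [sum(row[j] for row in rows) for j in range(n_features)]
        denom = sum(word_totals) + alpha * n_features
        log_likelihood[c] = [math.log((t + alpha) / denom) for t in word_totals]
    return log_prior, log_likelihood


def predict_nb(doc_counts, log_prior, log_likelihood):
    """Pick the class with the highest joint log probability."""
    scores = {
        c: log_prior[c] + sum(n * ll for n, ll in zip(doc_counts, log_likelihood[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get)


log_prior, log_likelihood = fit_multinomial_nb(train_counts, train_labels, alpha)
print(predict_nb([0, 1, 1], log_prior, log_likelihood))  # a "great plot" document
print(predict_nb([1, 0, 1], log_prior, log_likelihood))  # a "boring plot" document
```

Smoothing matters because a word never seen in a class would otherwise get probability zero and veto that class regardless of the other words.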


# Model Evaluation
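After training the model, we evaluate it on the held-out test set using accuracy, precision, recall, and F1 score. The last three are computed for the positive class, which is the default for these scikit-learn functions with binary labels. We also print a per-class classification report.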



In [4]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

# Predict sentiments for the test set
y_pred = model.predict(X_test)

# Calculate and print evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f'Accuracy: {accuracy:.2f}')
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print(f'F1 Score: {f1:.2f}')
print(classification_report(y_test, y_pred, target_names=['neg', 'pos']))


Accuracy: 0.80
Precision: 0.83
Recall: 0.75
F1 Score: 0.79
              precision    recall  f1-score   support

         neg       0.77      0.85      0.81       199
         pos       0.83      0.75      0.79       201

    accuracy                           0.80       400
   macro avg       0.80      0.80      0.80       400
weighted avg       0.80      0.80      0.80       400
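
For readers who want to see where these numbers come from, the following sketch derives accuracy, precision, recall, and F1 from the four confusion-matrix counts. The labels are a hypothetical ten-item example, not the notebook's test set, and the `_toy` variable names are illustrative.

```python
# Hypothetical toy labels (1 = pos, 0 = neg), not the notebook's test set
y_true_toy = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_toy = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

pairs = list(zip(y_true_toy, y_pred_toy))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy_toy = (tp + tn) / len(pairs)   # share of all predictions that are correct
precision_toy = tp / (tp + fp)          # share of predicted positives that are correct
recall_toy = tp / (tp + fn)             # share of actual positives that are found
f1_toy = 2 * precision_toy * recall_toy / (precision_toy + recall_toy)

print(f'TP={tp} FP={fp} FN={fn} TN={tn}')
print(f'Accuracy: {accuracy_toy:.2f}')
print(f'Precision: {precision_toy:.2f}')
print(f'Recall: {recall_toy:.2f}')
print(f'F1 Score: {f1_toy:.2f}')
```

Swapping the roles of the two classes in these formulas gives the `neg` row of a classification report.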



# Prediction Function
We define a function to predict the sentiment of a given movie review. The function takes a review as a raw string, converts it to token counts with the already-fitted vectorizer, and uses the trained model to predict its sentiment.

In [5]:
# Function to predict sentiment of a review
def predict_sentiment(review):
    review_vectorized = vectorizer.transform([review])
    prediction = model.predict(review_vectorized)
    return 'positive' if prediction[0] == 1 else 'negative'
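
A possible variation, sketched under assumptions: bundling the vectorizer and classifier into a scikit-learn pipeline and returning the predicted probability alongside the label. The tiny training set and the helper name `predict_sentiment_with_confidence` are hypothetical and exist only to keep the example self-contained; probabilities from Naive Bayes are often poorly calibrated, so treat them as a rough ranking signal rather than a true confidence.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training set standing in for the movie_reviews data
toy_reviews = [
    "fantastic acting and a wonderful story",
    "wonderful film with fantastic visuals",
    "terrible plot and awful acting",
    "awful film, a terrible waste",
]
toy_labels = [1, 1, 0, 0]

# The pipeline vectorizes raw text and then classifies it in one call
toy_pipeline = make_pipeline(CountVectorizer(stop_words='english'), MultinomialNB())
toy_pipeline.fit(toy_reviews, toy_labels)


def predict_sentiment_with_confidence(review, pipeline):
    """Return the predicted label and the probability assigned to it."""
    # predict_proba columns follow the order of pipeline.classes_
    probabilities = pipeline.predict_proba([review])[0]
    best = probabilities.argmax()
    label = 'positive' if pipeline.classes_[best] == 1 else 'negative'
    return label, float(probabilities[best])


print(predict_sentiment_with_confidence("a fantastic and wonderful film", toy_pipeline))
print(predict_sentiment_with_confidence("terrible and awful", toy_pipeline))
```

Keeping the vectorizer and model in one object also avoids the risk of pairing a model with a vectorizer that was fitted on different data.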


# Example Usage
We demonstrate the usage of the prediction function with example movie reviews.

In [6]:
# Example usage of the prediction function
review1 = "The movie was fantastic! The performances were Oscar-worthy."
print(f"The review '{review1}' is predicted to be: {predict_sentiment(review1)}")

review2 = "The movie was terrible and a waste of time."
print(f"The review '{review2}' is predicted to be: {predict_sentiment(review2)}")


The review 'The movie was fantastic! The performances were Oscar-worthy.' is predicted to be: positive
The review 'The movie was terrible and a waste of time.' is predicted to be: negative
