# Sentiment Analysis on Text Data

This notebook trains two classical machine learning models, logistic regression and multinomial naive Bayes, on TF-IDF features of review text and compares their accuracy on a held-out test set.

## 1. Import Libraries

In [1]:

import pandas as pd
import numpy as np
import re

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report


## 2. Load Dataset

In [2]:

df = pd.read_csv("data/sample_reviews.csv")
df.head()


|   | review | sentiment |
|---|--------|-----------|
| 0 | The product quality is excellent and delivery ... | positive |
| 1 | Very poor experience and bad customer service | negative |
| 2 | Absolutely loved the movie | positive |
| 3 | Not worth the price | negative |
| 4 | Great performance and smooth usage | positive |
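
Before modeling, it can help to confirm that the expected columns are present and to look at the label balance. The following is a minimal sketch, assuming the file has the `review` and `sentiment` columns shown in the preview. The inline `sample_csv` string is a stand-in built from the four untruncated preview rows, not the real file, and the names `sample_df`, `required`, and `label_counts` are illustrative.

```python
import io
import pandas as pd

# Stand-in for data/sample_reviews.csv, built from untruncated preview rows.
sample_csv = io.StringIO(
    "review,sentiment\n"
    "Very poor experience and bad customer service,negative\n"
    "Absolutely loved the movie,positive\n"
    "Not worth the price,negative\n"
    "Great performance and smooth usage,positive\n"
)
sample_df = pd.read_csv(sample_csv)

# Fail early if a required column is missing.
required = {"review", "sentiment"}
missing = required - set(sample_df.columns)
if missing:
    raise ValueError(f"missing columns: {sorted(missing)}")

# Count missing reviews and show how many rows each label has.
n_missing_reviews = int(sample_df["review"].isna().sum())
label_counts = sample_df["sentiment"].value_counts().to_dict()
print(n_missing_reviews, label_counts)
```

A strongly skewed `label_counts` would be a reason to prefer a stratified split and metrics other than plain accuracy.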


## 3. Text Preprocessing

In [3]:

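def clean_text(text):
    """Lowercase a review and drop every character that is not a letter or whitespace."""
    text = text.lower()
    # Digits and punctuation are deleted outright (no space is inserted).
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    return text

# Keep the raw review and store the cleaned version in a new column.
df['clean_review'] = df['review'].apply(clean_text)
df.head()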


|   | review | sentiment | clean_review |
|---|--------|-----------|--------------|
| 0 | The product quality is excellent and delivery ... | positive | the product quality is excellent and delivery ... |
| 1 | Very poor experience and bad customer service | negative | very poor experience and bad customer service |
| 2 | Absolutely loved the movie | positive | absolutely loved the movie |
| 3 | Not worth the price | negative | not worth the price |
| 4 | Great performance and smooth usage | positive | great performance and smooth usage |
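
The `clean_text` function above assumes every value is a string and deletes punctuation without leaving a space, so a missing review would raise an error and "well-made" would become "wellmade". A possible, more defensive variant is sketched below; `clean_text_safe` is a hypothetical name, and replacing non-letters with spaces is a design choice rather than the notebook's behavior.

```python
import re

def clean_text_safe(text):
    """Lowercase, replace non-letters with spaces, and collapse whitespace."""
    if not isinstance(text, str):
        # Missing values (e.g. NaN floats from pandas) become empty strings.
        return ""
    text = text.lower()
    # Substitute a space rather than nothing so hyphenated words stay separate.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

print(clean_text_safe("Well-made, 10/10!"))
print(repr(clean_text_safe(float("nan"))))
```

It could be applied in the same way as the original, for example `df['review'].apply(clean_text_safe)`.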


## 4. Train-Test Split

In [4]:

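X = df['clean_review']
y = df['sentiment']

# Split the raw text first so the test reviews stay unseen during fitting.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Learn the vocabulary and IDF weights from the training reviews only,
# then reuse them to transform the test reviews.
vectorizer = TfidfVectorizer(stop_words='english')
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)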
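
The `stop_words='english'` setting is worth checking for sentiment tasks, because scikit-learn's built-in list contains negations such as "not". The sketch below inspects the tokens the vectorizer's analyzer produces for one of the sample reviews and compares them with an alternative configuration that keeps all words and adds bigrams. The alternative is an assumption for illustration, not the notebook's setting.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

sentence = "not worth the price"

# Analyzer as configured in the notebook: English stop words are removed.
with_stop_list = TfidfVectorizer(stop_words="english").build_analyzer()
tokens_filtered = with_stop_list(sentence)

# Alternative (assumption): keep every word and add two-word phrases.
with_bigrams = TfidfVectorizer(ngram_range=(1, 2)).build_analyzer()
tokens_bigrams = with_bigrams(sentence)

print(tokens_filtered)
print(tokens_bigrams)
```

With the stop list, the negation disappears before the model sees the review. The bigram variant keeps "not worth" as a feature of its own, at the cost of a larger vocabulary.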


## 5. Model 1: Logistic Regression

In [5]:

lr_model = LogisticRegression(max_iter=1000)
lr_model.fit(X_train, y_train)

lr_predictions = lr_model.predict(X_test)
lr_accuracy = accuracy_score(y_test, lr_predictions)

print("Logistic Regression Accuracy:", lr_accuracy)


Logistic Regression Accuracy: 0.5
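
A scikit-learn `Pipeline` is one way to keep the vectorizer and the classifier together, so that new text is always transformed with the vocabulary learned during `fit`. The sketch below uses a small hypothetical dataset (`train_texts`, `train_labels`), not the notebook's CSV, and omits the stop-word setting for brevity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, not the notebook's dataset.
train_texts = [
    "excellent quality and fast delivery",
    "absolutely loved it",
    "great performance and smooth usage",
    "very poor experience",
    "bad customer service",
    "terrible and not worth the price",
]
train_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]

# The vectorizer is fitted inside the pipeline, on the training text only.
text_clf = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(max_iter=1000),
)
text_clf.fit(train_texts, train_labels)

# Raw strings go in; the pipeline vectorizes them with the fitted vocabulary.
new_predictions = text_clf.predict(["loved the quality", "poor service"])
print(new_predictions)
```

Because the pipeline accepts raw strings, the same object can be passed to cross-validation utilities or saved and reused without a separate vectorizer.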


## 6. Model 2: Multinomial Naive Bayes

In [6]:

nb_model = MultinomialNB()
nb_model.fit(X_train, y_train)

nb_predictions = nb_model.predict(X_test)
nb_accuracy = accuracy_score(y_test, nb_predictions)

print("Naive Bayes Accuracy:", nb_accuracy)


Naive Bayes Accuracy: 0.5
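
Accuracy alone does not show how confident the classifier is. `MultinomialNB` exposes class probabilities through `predict_proba`, whose columns follow the order of `classes_`. A minimal sketch on hypothetical toy data (the `toy_*` names are illustrative and not part of the notebook):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data, not the notebook's dataset.
toy_texts = [
    "great performance and smooth usage",
    "absolutely loved the movie",
    "very poor experience",
    "bad customer service",
]
toy_labels = ["positive", "positive", "negative", "negative"]

toy_vectorizer = TfidfVectorizer()
toy_features = toy_vectorizer.fit_transform(toy_texts)

# alpha defaults to 1.0, i.e. add-one (Laplace) smoothing of feature counts.
toy_nb = MultinomialNB()
toy_nb.fit(toy_features, toy_labels)

# One row per input text, one column per class in toy_nb.classes_ order.
probabilities = toy_nb.predict_proba(
    toy_vectorizer.transform(["smooth and great"])
)
for label, p in zip(toy_nb.classes_, probabilities[0]):
    print(label, round(float(p), 3))
```

Probabilities close to 0.5 for both classes would suggest the model has little evidence for either label, which is common with very small training sets.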


## 7. Model Evaluation

In [7]:

print("Logistic Regression Confusion Matrix")
print(confusion_matrix(y_test, lr_predictions))

print("\nNaive Bayes Confusion Matrix")
print(confusion_matrix(y_test, nb_predictions))

print("\nClassification Report (Logistic Regression)")
print(classification_report(y_test, lr_predictions))


Logistic Regression Confusion Matrix
[[1 0]
 [1 0]]

Naive Bayes Confusion Matrix
[[1 0]
 [1 0]]

Classification Report (Logistic Regression)
              precision    recall  f1-score   support

    negative       0.50      1.00      0.67         1
    positive       0.00      0.00      0.00         1

    accuracy                           0.50         2
   macro avg       0.25      0.50      0.33         2
weighted avg       0.25      0.50      0.33         2



scikit-learn also emitted `UndefinedMetricWarning` for this report: neither test review was predicted as `positive`, so precision for that class is undefined and is reported as 0.00.
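
The per-class numbers in the classification report can be reproduced by hand from the confusion matrix, which makes the zero row for `positive` easier to interpret. The sketch below uses the matrix printed above (rows are true labels, columns are predicted labels, in the order negative, positive). The helper `safe_div` is a hypothetical name, and returning 0.0 for a zero denominator is a choice that matches the values shown in the report.

```python
# Confusion matrix from the output above: rows are true labels,
# columns are predicted labels, in the order [negative, positive].
labels = ["negative", "positive"]
matrix = [[1, 0],
          [1, 0]]

def safe_div(num, den):
    # Return 0.0 when the denominator is zero (the ill-defined case).
    return num / den if den else 0.0

metrics = {}
for i, label in enumerate(labels):
    true_pos = matrix[i][i]
    predicted = sum(row[i] for row in matrix)  # column sum: times predicted
    actual = sum(matrix[i])                    # row sum: times it occurred
    precision = safe_div(true_pos, predicted)
    recall = safe_div(true_pos, actual)
    f1 = safe_div(2 * precision * recall, precision + recall)
    metrics[label] = (round(precision, 2), round(recall, 2), round(f1, 2))

print(metrics)
```

For `positive`, the column sum is zero because the class was never predicted, which is exactly the case in which precision has no defined value.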


## 8. Model Comparison

In [8]:

results = pd.DataFrame({
    'Model': ['Logistic Regression', 'Naive Bayes'],
    'Accuracy': [lr_accuracy, nb_accuracy]
})
results


|   | Model | Accuracy |
|---|-------|----------|
| 0 | Logistic Regression | 0.5 |
| 1 | Naive Bayes | 0.5 |
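
With only two test reviews, a single accuracy figure says little about which model is better. One common alternative is stratified k-fold cross-validation, together with a majority-class baseline for reference. The sketch below is illustrative only: the twelve reviews are hypothetical toy data, and the choice of three folds is an assumption made to suit that size.

```python
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data (6 positive, 6 negative), not the notebook's dataset.
texts = [
    "excellent product quality", "absolutely loved the movie",
    "great performance and smooth usage", "fast delivery and great support",
    "loved the smooth design", "excellent value and great quality",
    "very poor experience", "bad customer service",
    "not worth the price", "poor quality and bad support",
    "terrible delivery experience", "bad product and poor value",
]
labels = ["positive"] * 6 + ["negative"] * 6

candidates = {
    "Majority baseline": DummyClassifier(strategy="most_frequent"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
}

# Each fold keeps the positive/negative ratio of the full dataset.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

cv_scores = {}
for name, estimator in candidates.items():
    # The vectorizer is refitted on the training part of every fold.
    pipeline = make_pipeline(TfidfVectorizer(), estimator)
    scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy")
    cv_scores[name] = float(scores.mean())

for name, score in cv_scores.items():
    print(f"{name}: {score:.2f}")
```

A model that does not beat the majority baseline has not learned anything useful from the text, regardless of its absolute accuracy.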


## 9. Conclusion

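Both logistic regression and multinomial naive Bayes reach an accuracy of 0.5 on the two-review test set, and both confusion matrices show every test review predicted as negative. A test set this small cannot separate the two models, so the notebook is best read as a demonstration of the end-to-end workflow (loading, cleaning, TF-IDF vectorization, training, and evaluation) rather than as evidence that either model performs better.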