
# Assignment 2: Sentiment Analysis Using ML & NLP
Submitted by: *Mayank Garg*

Roll No.: *2301730272*

## Install Required Libraries

In [None]:
!pip install pandas numpy scikit-learn nltk



## Import Libraries

In [None]:
import pandas as pd
import numpy as np
import nltk
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

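# Download NLTK resources: the stopword list and the Punkt tokenizer models
# (not used by the cells below, which rely on TfidfVectorizer's built-in tokenizer)
nltk.download('stopwords')
nltk.download('punkt')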

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

## Create Dataset

*Text from three different sources is simulated:*

*1. Social media style*

*2. Customer reviews*

*3. Website feedback / general text*

In [None]:
data = {
    "text": [
        # Source 1: Social media style
        "The product is amazing and works perfectly!",
        "Terrible experience. Completely disappointed.",
        "Had a great time using this app.",
        "Worst update ever, nothing works now.",

        # Source 2: Customer reviews
        "The service was fast and the staff was friendly.",
        "Delivery was late and the package was damaged.",
        "Excellent quality and worth the price.",
        "Not useful at all, total waste of money.",

        # Source 3: Website feedback / general text
        "The new features are very helpful.",
        "The interface is confusing and slow."
    ],

    "sentiment": [
        "positive", "negative", "positive", "negative",
        "positive", "negative", "positive", "negative",
        "positive", "negative"
    ]
}

df = pd.DataFrame(data)
df

| | text | sentiment |
|---|---|---|
| 0 | The product is amazing and works perfectly! | positive |
| 1 | Terrible experience. Completely disappointed. | negative |
| 2 | Had a great time using this app. | positive |
| 3 | Worst update ever, nothing works now. | negative |
| 4 | The service was fast and the staff was friendly. | positive |
| 5 | Delivery was late and the package was damaged. | negative |
| 6 | Excellent quality and worth the price. | positive |
| 7 | Not useful at all, total waste of money. | negative |
| 8 | The new features are very helpful. | positive |
| 9 | The interface is confusing and slow. | negative |


## Basic Text Preprocessing
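*The dataset is cleaned by converting text to lowercase and removing leading and trailing whitespace.
This ensures that words such as "Great" and "great" are treated as the same token. (`TfidfVectorizer` also lowercases text by default, so this step mainly makes the cleaned data easier to inspect rather than changing the model's accuracy.)*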

In [None]:
# Convert all text to lowercase and strip leading/trailing whitespace
df['clean_text'] = df['text'].str.lower().str.strip()

df

| | text | sentiment | clean_text |
|---|---|---|---|
| 0 | The product is amazing and works perfectly! | positive | the product is amazing and works perfectly! |
| 1 | Terrible experience. Completely disappointed. | negative | terrible experience. completely disappointed. |
| 2 | Had a great time using this app. | positive | had a great time using this app. |
| 3 | Worst update ever, nothing works now. | negative | worst update ever, nothing works now. |
| 4 | The service was fast and the staff was friendly. | positive | the service was fast and the staff was friendly. |
| 5 | Delivery was late and the package was damaged. | negative | delivery was late and the package was damaged. |
| 6 | Excellent quality and worth the price. | positive | excellent quality and worth the price. |
| 7 | Not useful at all, total waste of money. | negative | not useful at all, total waste of money. |
| 8 | The new features are very helpful. | positive | the new features are very helpful. |
| 9 | The interface is confusing and slow. | negative | the interface is confusing and slow. |
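Note that `.str.strip()` only trims whitespace at the ends of a string; repeated spaces inside a sentence are left untouched. If the goal is to remove all unnecessary spaces, a minimal sketch is to collapse internal whitespace with a regular expression first. The two input sentences here are made-up examples, not part of the notebook's dataset.

```python
import pandas as pd

# Hypothetical messy inputs with extra spaces at the ends and in the middle
df = pd.DataFrame({"text": ["  The product   is AMAZING!  ", "Worst update,  ever."]})

df["clean_text"] = (
    df["text"]
    .str.lower()                               # normalise case
    .str.replace(r"\s+", " ", regex=True)      # collapse runs of whitespace into one space
    .str.strip()                               # trim leading/trailing spaces
)
print(df["clean_text"].tolist())
```

Punctuation is kept here on purpose: `TfidfVectorizer`'s default token pattern already ignores it when splitting text into words.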


## Train-Test Split
*The dataset is divided into:*
- ***Training data** (80%)*
- ***Testing data** (20%)*

*Training data is used to teach the model, while test data checks how well the model performs on unseen sentences.*

In [None]:
# Separating input text and labels
X = df['clean_text']
y = df['sentiment']

# Splitting data into training and testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)
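With only ten sentences, a 20% test set contains just two examples, and a purely random split could in principle put both in the same class. A minimal sketch of a stratified split, assuming the same 5/5 label balance as the notebook's dataset, uses `train_test_split`'s `stratify` parameter so each class is represented proportionally. The placeholder sentences are illustrative only.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Placeholder texts with the same 5 positive / 5 negative balance as the notebook
texts = [f"sentence {i}" for i in range(10)]
labels = ["positive", "negative"] * 5

# stratify=labels keeps the class proportions equal in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.20, random_state=42, stratify=labels
)
print(Counter(y_test))
```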

## Text Vectorization (TF-IDF)
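*The TF-IDF Vectorizer converts each sentence into a numerical vector with one entry per vocabulary word, because machine learning models operate on numbers rather than raw text.
TF-IDF (term frequency times inverse document frequency) gives higher weight to words that occur in a sentence but are rare across the dataset, and lower weight to words such as "the" that appear in many sentences.*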

In [None]:
# Learn the vocabulary and IDF weights from the training set only,
# then apply the same mapping to the test set
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

## Build the Model
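*Logistic Regression is used as the machine learning model.
It estimates the probability that a sentence is positive by applying the sigmoid function to a weighted sum of its TF-IDF features, which makes it a simple and effective baseline for binary classification such as positive vs. negative sentiment.*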

In [None]:
# Training a simple Logistic Regression model
model = LogisticRegression()
model.fit(X_train_vec, y_train)

## Model Evaluation
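*The trained model is evaluated on the test set using accuracy, precision, recall, and F1-score.
These metrics show how well the model predicts positive and negative sentiment.
Because the full dataset has only ten sentences, the test set contains just two examples, so these scores are a very rough estimate.*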
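The metrics in the classification report follow directly from counts of true positives (TP), false positives (FP), and false negatives (FN): precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. The sketch below computes them by hand for the "positive" class and compares the result with scikit-learn. The labels are hypothetical and are not this notebook's results.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels for illustration only
y_true = ["positive", "positive", "negative", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "positive", "positive"]

pos = "positive"
pairs = list(zip(y_true, y_pred))
tp = sum(t == pos and p == pos for t, p in pairs)  # correctly predicted positives
fp = sum(t != pos and p == pos for t, p in pairs)  # negatives predicted as positive
fn = sum(t == pos and p != pos for t, p in pairs)  # positives predicted as negative

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# String labels require pos_label to say which class counts as "positive"
print(precision, precision_score(y_true, y_pred, pos_label=pos))
print(recall, recall_score(y_true, y_pred, pos_label=pos))
print(f1, f1_score(y_true, y_pred, pos_label=pos))
```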

In [None]:
# Predicting on test data
y_pred = model.predict(X_test_vec)

print("Model Accuracy:", accuracy_score(y_test, y_pred))
print("\nDetailed Classification Report:\n")
print(classification_report(y_test, y_pred))

Model Accuracy: 1.0

Detailed Classification Report:

              precision    recall  f1-score   support

    negative       1.00      1.00      1.00         1
    positive       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2
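An accuracy of 1.0 on two test sentences is not strong evidence that the model generalises. One common alternative, sketched here as an assumption rather than part of the original assignment, is stratified k-fold cross-validation over all ten sentences: every sentence is used for testing exactly once. Wrapping the vectorizer and the model in a pipeline ensures TF-IDF is refit on the training portion of each fold, so no test vocabulary leaks into training. The choice of five folds is arbitrary.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# The notebook's ten sentences after lowercasing
texts = [
    "the product is amazing and works perfectly!",
    "terrible experience. completely disappointed.",
    "had a great time using this app.",
    "worst update ever, nothing works now.",
    "the service was fast and the staff was friendly.",
    "delivery was late and the package was damaged.",
    "excellent quality and worth the price.",
    "not useful at all, total waste of money.",
    "the new features are very helpful.",
    "the interface is confusing and slow.",
]
labels = ["positive", "negative"] * 5

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # 2 sentences per fold

scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy")
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```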



## Test Custom Predictions
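*In this final step, the model predicts the sentiment of new sample sentences.
This gives a rough idea of how the model might behave on text outside the dataset (note that the third sample, "Had a great time using this app.", is copied from the dataset itself).*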

In [None]:
sample_text = [
    "The update is really good and smooth.",
    "This is the worst thing ever made.",
    "Had a great time using this app.",
    "This Product works very good."
]

sample_vec = vectorizer.transform(sample_text)
predictions = model.predict(sample_vec)

for text, pred in zip(sample_text, predictions):
    print(f"Text: {text} --> {pred}")

Text: The update is really good and smooth. --> negative
Text: This is the worst thing ever made. --> negative
Text: Had a great time using this app. --> positive
Text: This Product works very good. --> positive
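The first sample, "The update is really good and smooth.", was predicted as negative. One way to investigate such mistakes is to check which of its words the vectorizer actually knows: `transform` silently ignores words that never appeared during fitting. The sketch below fits on all ten sentences for simplicity (the notebook fits on the eight training sentences, so its vocabulary may differ slightly) and uses `build_analyzer()` to apply the same lowercasing and tokenisation as `transform`. The helper `known_and_unknown` is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = [
    "The product is amazing and works perfectly!",
    "Terrible experience. Completely disappointed.",
    "Had a great time using this app.",
    "Worst update ever, nothing works now.",
    "The service was fast and the staff was friendly.",
    "Delivery was late and the package was damaged.",
    "Excellent quality and worth the price.",
    "Not useful at all, total waste of money.",
    "The new features are very helpful.",
    "The interface is confusing and slow.",
]

vectorizer = TfidfVectorizer().fit(train_texts)
analyzer = vectorizer.build_analyzer()  # same preprocessing/tokenisation used by transform


def known_and_unknown(sentence):
    """Hypothetical helper: split a sentence's tokens into in-vocabulary and ignored words."""
    tokens = analyzer(sentence)
    known = [t for t in tokens if t in vectorizer.vocabulary_]
    unknown = [t for t in tokens if t not in vectorizer.vocabulary_]
    return known, unknown


known, unknown = known_and_unknown("The update is really good and smooth.")
print("known:", known)
print("ignored:", unknown)
```

Only "the", "update", "is", and "and" contribute features, and "update" appears in the dataset solely in a negative sentence, which plausibly explains the wrong prediction.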


## Conclusion

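*This notebook cleaned a small set of sentences, converted them into TF-IDF features, and trained a Logistic Regression classifier. The model classified both test sentences correctly, but with only ten examples this score says little about real-world performance: it labelled "The update is really good and smooth." as negative, most likely because "really", "good", and "smooth" never appear in the dataset (so TF-IDF ignores them) while "update" appears only in a negative example. A larger and more varied dataset would be needed for reliable results.*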