<a href="https://colab.research.google.com/github/sunnyfarmana/NLP_Assignments/blob/main/Assignment2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 2 – Sentiment Analysis
**Step 1: Import Libraries**

In [1]:
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

**Step 2: Load a Real, Large Dataset (IMDB, 50,000 reviews)**

In [2]:

imdb = tf.keras.datasets.imdb
# Keep only the 20,000 most frequent words; rarer words are mapped to an out-of-vocabulary id.
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=20000)
print("Training Samples:", len(X_train))
print("Testing Samples:", len(X_test))

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step
Training Samples: 25000
Testing Samples: 25000


**Step 3: Convert Numbers → Words**

In [3]:
word_index = imdb.get_word_index()
# Invert the mapping (word -> rank) so that integer ranks can be looked up as words.
reverse_word_index = {value: key for key, value in word_index.items()}
def decode_review(encoded_review):
    # Ids are shifted by 3 because 0, 1 and 2 are reserved; reserved or unknown ids become "?".
    return " ".join([reverse_word_index.get(i - 3, "?") for i in encoded_review])
X_train_text = [decode_review(review) for review in X_train]
X_test_text = [decode_review(review) for review in X_test]
print("Sample review:\n", X_train_text[0])

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
[1m1641221/1641221[0m [32m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m[37m[0m [1m1s[0m 1us/step
Sample review:
 ? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ? is an amazing actor and now the same being director ? father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for retail and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also congratulations to the two little boy's that played the ? of norman a
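
The leading `?` in the sample review comes from how Keras encodes the data: by default, ids 0, 1 and 2 are reserved for padding, start-of-sequence and out-of-vocabulary words, so real word ranks are shifted by 3. The sketch below replays that logic on a made-up four-word vocabulary so it runs without downloading anything. The names `toy_word_index` and `decode_toy_review` are illustrative and not part of the notebook.

```python
# Toy vocabulary in the same shape as imdb.get_word_index(): word -> rank (1-based).
# This vocabulary is invented for illustration only.
toy_word_index = {"the": 1, "film": 2, "was": 3, "brilliant": 4}

# Keras shifts every word rank by 3 so that ids 0, 1 and 2 can act as
# padding, start-of-sequence and out-of-vocabulary markers.
INDEX_OFFSET = 3

# Invert the mapping: rank -> word.
toy_reverse_index = {rank: word for word, rank in toy_word_index.items()}

def decode_toy_review(encoded_review):
    """Map encoded ids back to words; reserved or unknown ids become '?'."""
    return " ".join(
        toy_reverse_index.get(i - INDEX_OFFSET, "?") for i in encoded_review
    )

# 1 is the start marker, 2 is the out-of-vocabulary marker,
# and the remaining ids are word ranks shifted by 3.
encoded = [1, 4, 5, 6, 2, 7]
decoded = decode_toy_review(encoded)
print(decoded)  # ? the film was ? brilliant
```

Every review starts with the start marker, which is why each decoded review begins with `?`. The other `?` tokens stand for words outside the 20,000 most frequent ones.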

**Step 4: TF-IDF Vectorization (Turns text into features)**

In [4]:

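# Keep the 5,000 highest-frequency terms and drop common English stop words.
vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')
# Learn the vocabulary and IDF weights from the training text only,
# then reuse them unchanged for the test text.
X_train_vec = vectorizer.fit_transform(X_train_text)
X_test_vec = vectorizer.transform(X_test_text)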

**Step 5: Train a Logistic Regression Classifier**

In [6]:

# max_iter=300 gives the solver more room to converge than the default of 100 iterations.
model = LogisticRegression(max_iter=300)
model.fit(X_train_vec, y_train)

**Step 6: Check Accuracy**

In [7]:
y_pred = model.predict(X_test_vec)
accuracy = accuracy_score(y_test, y_pred)
print("🔥 MODEL ACCURACY:", accuracy)

🔥 MODEL ACCURACY: 0.87812


**Step 7: Full Classification Report**

In [8]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.88      0.87      0.88     12500
           1       0.88      0.88      0.88     12500

    accuracy                           0.88     25000
   macro avg       0.88      0.88      0.88     25000
weighted avg       0.88      0.88      0.88     25000
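
The columns of the report can be reproduced by hand from counts of correct and incorrect predictions. The sketch below does this for class 1 on eight invented labels (`toy_true` and `toy_pred` are illustrative and not taken from the notebook's results).

```python
import numpy as np

# Invented ground-truth labels and predictions, for illustration only.
toy_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
toy_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0])

# Counts for the positive class (label 1).
tp = int(np.sum((toy_pred == 1) & (toy_true == 1)))  # predicted 1, actually 1
fp = int(np.sum((toy_pred == 1) & (toy_true == 0)))  # predicted 1, actually 0
fn = int(np.sum((toy_pred == 0) & (toy_true == 1)))  # predicted 0, actually 1

precision = tp / (tp + fp)  # of everything predicted positive, how much was right
recall = tp / (tp + fn)     # of everything actually positive, how much was found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

# Accuracy is the fraction of predictions that match the true labels.
accuracy = float(np.mean(toy_true == toy_pred))

print(tp, fp, fn)                       # 3 1 1
print(precision, recall, f1, accuracy)  # 0.75 0.75 0.75 0.75
```

The `support` column is the number of true samples per class, which is 12,500 for each class in the report above, so the macro and weighted averages coincide.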



**Step 8: Test With Your Own Sentence**

In [9]:
def predict_sentiment(text):
    # Reuse the fitted vectorizer so the new text maps onto the same 5,000 features.
    text_vec = vectorizer.transform([text])
    # predict returns an array with one label per input; take the single result.
    pred = model.predict(text_vec)[0]
    return "Positive 😄" if pred == 1 else "Negative 😡"
print(predict_sentiment("This movie was amazing!"))
print(predict_sentiment("Worst acting I have ever seen."))

Positive 😄
Negative 😡
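
A hard label hides how confident the model is. As a possible extension, `predict_proba` returns the estimated probability of each class. The sketch below wraps it in a hypothetical helper, `predict_with_confidence`, and uses a tiny invented model so that it runs on its own; with the notebook's objects one would pass `vectorizer` and `model` instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny invented training set so the example is self-contained.
toy_texts = ["amazing movie", "amazing acting", "worst movie", "worst acting"]
toy_labels = [1, 1, 0, 0]

toy_vectorizer = TfidfVectorizer()
toy_model = LogisticRegression(max_iter=300)
toy_model.fit(toy_vectorizer.fit_transform(toy_texts), toy_labels)

def predict_with_confidence(text, vectorizer, model):
    """Return (label, probability of the positive class) for one text."""
    text_vec = vectorizer.transform([text])
    probabilities = model.predict_proba(text_vec)[0]
    # Look up the column for class 1 instead of assuming its position.
    positive_index = list(model.classes_).index(1)
    p_positive = float(probabilities[positive_index])
    label = "Positive" if p_positive >= 0.5 else "Negative"
    return label, p_positive

label, p_positive = predict_with_confidence(
    "an amazing film", toy_vectorizer, toy_model
)
print(label, round(p_positive, 3))
```

If a sentence contains no word from the fitted vocabulary, its feature vector is all zeros and the prediction reflects only the model's intercept, so a probability close to 0.5 is a hint that the input gave the model little to work with.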
