
Step 1: Import libraries


In [None]:
import pandas as pd
import numpy as np
import re
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

Step 2: Load the dataset


In [None]:
df = pd.read_csv('/content/IMDB-Dataset.csv')

Step 3: Explore the dataset


In [None]:
print(df.head())
print(df['sentiment'].value_counts())

                                              review sentiment
0  One of the other reviewers has mentioned that ...  positive
1  A wonderful little production. <br /><br />The...  positive
2  I thought this was a wonderful way to spend ti...  positive
3  Basically there's a family where a little boy ...  negative
4  Petter Mattei's "Love in the Time of Money" is...  positive
sentiment
positive    25000
negative    25000
Name: count, dtype: int64


Step 4: Preprocess text


In [None]:
def clean_text(text):
    text = text.lower()
    text = re.sub(r"<.*?>", " ", text)        # Replace HTML tags with a space so adjacent words stay separate
    text = re.sub(r"[^a-zA-Z']", " ", text)   # Keep only letters and apostrophes
    text = re.sub(r"\s+", " ", text).strip()
    return text

df['review'] = df['review'].apply(clean_text)
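
One detail worth checking in the cleaning step is what an HTML tag gets replaced with. The sketch below is a minimal illustration rather than part of the original notebook: `strip_tags` is a hypothetical helper and the sample strings are made up. It compares replacing tags with an empty string against replacing them with a space.

```python
import re

def strip_tags(text, replacement):
    """Clean a review, substituting `replacement` for each HTML tag (hypothetical helper)."""
    text = re.sub(r"<.*?>", replacement, text.lower())
    text = re.sub(r"[^a-zA-Z']", " ", text)      # non-letters (except apostrophes) become spaces
    return re.sub(r"\s+", " ", text).strip()     # collapse runs of whitespace

# Made-up samples: one with whitespace next to the tags, one without.
spaced = "A wonderful production. <br /><br />The acting isn't bad!"
tight = "Great<br />movie"

print(strip_tags(spaced, ""))    # a wonderful production the acting isn't bad
print(strip_tags(tight, ""))     # greatmovie
print(strip_tags(tight, " "))    # great movie
```

When a tag sits directly between two words, an empty replacement merges them into one token, which the tokenizer would then treat as a single, probably rare, word.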

Step 5: Encode labels

In [None]:
le = LabelEncoder()
df['sentiment'] = le.fit_transform(df['sentiment'])  # positive=1, negative=0
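
`LabelEncoder` assigns integer codes to classes in sorted order, which is why `negative` maps to 0 and `positive` to 1. The following is a plain-Python sketch of that rule, not scikit-learn's implementation; `encode_labels` and the sample list are hypothetical.

```python
def encode_labels(labels):
    """Assign each distinct label an integer code in sorted order (illustrative only)."""
    classes = sorted(set(labels))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[label] for label in labels], mapping

codes, mapping = encode_labels(["positive", "negative", "positive"])
print(mapping)  # {'negative': 0, 'positive': 1}
print(codes)    # [1, 0, 1]
```

In the notebook itself, `le.classes_` holds the same ordering after fitting, so it can be printed to confirm the mapping.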

Step 6: Tokenize and pad sequences

In [None]:
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(df['review'])
sequences = tokenizer.texts_to_sequences(df['review'])
X = pad_sequences(sequences, maxlen=200)
y = df['sentiment'].values
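
To make the tokenize-and-pad step concrete, here is a simplified plain-Python sketch of the behavior. It is an approximation under stated assumptions: the real `Tokenizer` also lowercases and filters punctuation, and the helper names and toy sentences are hypothetical. It shows three things: words are ranked by frequency starting at 1, only ranks below `num_words` are kept, and `pad_sequences` by default pads and truncates at the front.

```python
from collections import Counter

def build_word_index(texts):
    """Map each word to a 1-based rank, most frequent first (0 is reserved for padding)."""
    counts = Counter(word for text in texts for word in text.split())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}

def to_sequences(texts, word_index, num_words):
    """Replace words by rank, dropping unknown words and ranks >= num_words."""
    return [[word_index[w] for w in text.split()
             if w in word_index and word_index[w] < num_words]
            for text in texts]

def pad_pre(seq, maxlen):
    """Keep the last `maxlen` tokens, then left-pad with zeros."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

toy_texts = ["the movie was great", "the plot was the problem"]
toy_index = build_word_index(toy_texts)
toy_sequences = to_sequences(toy_texts, toy_index, num_words=5)
toy_padded = [pad_pre(seq, maxlen=6) for seq in toy_sequences]

print(toy_sequences)               # [[1, 3, 2, 4], [1, 2, 1]]
print(toy_padded)                  # [[0, 0, 1, 3, 2, 4], [0, 0, 0, 1, 2, 1]]
print(pad_pre([1, 3, 2, 4], 3))    # [3, 2, 4]
```

With `maxlen=200` and the default settings, a review longer than 200 tokens keeps only its last 200 tokens, and `num_words=10000` keeps the 9,999 most frequent words.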

Step 7: Train-test split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Step 8: Build LSTM model

In [None]:
model = Sequential()
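model.add(tf.keras.Input(shape=(200,)))                # sequences are padded to length 200
model.add(Embedding(input_dim=10000, output_dim=128))  # `input_length` is deprecated in Keras 3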
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
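
As a sanity check on the architecture, the number of trainable parameters can be worked out by hand. The arithmetic below assumes the standard layouts for Keras `Embedding`, `LSTM` (four gates, each with input weights, recurrent weights, and a bias), and `Dense` layers; the variable names are only for illustration. The total should agree with what `model.summary()` reports once the model is built.

```python
vocab_size, embed_dim, lstm_units = 10_000, 128, 128

# Embedding: one embed_dim-sized vector per token id.
embedding_params = vocab_size * embed_dim

# LSTM: 4 gates, each with input weights, recurrent weights, and a bias vector.
lstm_params = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)

# Dense(1): one weight per LSTM unit plus a single bias.
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# 1280000 131584 129 1411713
```

Roughly 90% of the parameters sit in the embedding table, so vocabulary size and embedding width dominate the model's size.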


Step 9: Train model

In [None]:
history = model.fit(X_train, y_train, epochs=3, batch_size=64, validation_split=0.2)


Epoch 1/3
500/500 ━━━━━━━━━━━━━━━━━━━━ 325s 643ms/step - accuracy: 0.7150 - loss: 0.5410 - val_accuracy: 0.8460 - val_loss: 0.3657
Epoch 2/3
500/500 ━━━━━━━━━━━━━━━━━━━━ 319s 637ms/step - accuracy: 0.8602 - loss: 0.3412 - val_accuracy: 0.8619 - val_loss: 0.3466
Epoch 3/3
500/500 ━━━━━━━━━━━━━━━━━━━━ 315s 623ms/step - accuracy: 0.8880 - loss: 0.2846 - val_accuracy: 0.8661 - val_loss: 0.3341

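The `500/500` in the training log is the number of batches per epoch, and it follows from the two splits applied so far. The short calculation below reproduces it from the dataset size shown in Step 3; the variable names are only for illustration.

```python
import math

n_total = 50_000                      # 25,000 positive + 25,000 negative reviews
n_test = n_total * 20 // 100          # test_size=0.2 in train_test_split
n_train_full = n_total - n_test       # rows passed to model.fit
n_val = n_train_full * 20 // 100      # validation_split=0.2 inside model.fit
n_fit = n_train_full - n_val          # rows actually used for gradient updates
steps_per_epoch = math.ceil(n_fit / 64)   # batch_size=64

print(n_test, n_train_full, n_val, n_fit, steps_per_epoch)
# 10000 40000 8000 32000 500
```

So the model is fitted on 32,000 reviews, validated on 8,000, and the remaining 10,000 in `X_test` are untouched by training.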

Step 10: Evaluate model

In [None]:
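# Spot-check rows 1-9 of the full dataframe. These rows are not restricted to
# the test split, so this is a sanity check rather than a held-out evaluation.
for i in range(1, 10):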
    review_text = df['review'][i]
    true_sentiment = df['sentiment'][i]

    # Preprocess single review
    sequence = tokenizer.texts_to_sequences([clean_text(review_text)])
    padded = pad_sequences(sequence, maxlen=200)
    prediction = model.predict(padded)[0][0]

    # Convert prediction to label
    predicted_label = "positive" if prediction > 0.5 else "negative"
    true_label = "positive" if true_sentiment == 1 else "negative"
    correct = predicted_label == true_label

    # Print full output
    print(f"\n--- Review #{i+1} ---")
    print(f"Text:\n{review_text}")
    print(f"True Sentiment: {true_label}")
    print(f"Predicted Sentiment: {predicted_label} (P(positive): {prediction:.2f})")
    print(f"Correct: {correct}")
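
The loop above draws rows from the full dataframe, so many of them were likely part of the training split. A held-out number would instead come from `X_test` and `y_test`, for example via `model.evaluate(X_test, y_test)`. The sketch below shows the underlying computation in plain Python; `accuracy_at_threshold` is a hypothetical helper, and the probabilities are made-up placeholders standing in for `model.predict(X_test).ravel()`.

```python
def accuracy_at_threshold(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the 0/1 label."""
    predictions = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(int(pred == label) for pred, label in zip(predictions, labels))
    return correct / len(labels)

# Placeholder values for illustration only; not results from the notebook's model.
example_probs = [0.91, 0.12, 0.55, 0.40]
example_labels = [1, 0, 0, 1]
print(accuracy_at_threshold(example_probs, example_labels))  # 0.5
```

The same 0.5 threshold is what the loop uses to turn the sigmoid output into a label.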



1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 78ms/step

--- Review #2 ---
Text:
a wonderful little production the filming technique is very unassuming very old time bbc fashion and gives a comforting and sometimes discomforting sense of realism to the entire piece the actors are extremely well chosen michael sheen not only has got all the polari but he has all the voices down pat too you can truly see the seamless editing guided by the references to williams' diary entries not only is it well worth the watching but it is a terrificly written and performed piece a masterful production about one of the great master's of comedy and his life the realism really comes home with the little things the fantasy of the guard which rather than use the traditional 'dream' techniques remains solid then disappears it plays on our knowledge and our senses particularly with the scenes concerning orton and halliwell and the sets particularly of their flat with halliwell's murals decorat