**Sentiment Analysis:**
Sentiment analysis, also known as opinion mining, is the process of determining the sentiment or emotional tone conveyed in a piece of text. It involves analyzing and classifying text data to identify whether the expressed sentiment is positive, negative, or neutral. The goal is to understand the subjective information within the text and extract insights about the author's opinion or attitude.

**Experiment Design for Sentiment Analysis with Keras:**

**Objective:** Build a basic sentiment analysis model using Keras to classify movie reviews as positive or negative.

**Dataset:** Use the IMDB Movie Reviews dataset, available in the Keras datasets module. It contains 50,000 movie reviews labeled as positive (1) or negative (0), split evenly into 25,000 training and 25,000 test reviews.





In [23]:
# Load and preprocess the dataset

from keras.datasets import imdb
from keras.layers import Embedding, LSTM, Dense
from keras.models import Sequential
from keras.preprocessing import sequence

# Keep only the 10,000 most frequent words; rarer words are replaced
# by the out-of-vocabulary index (2)
max_words = 10000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_words)

# Each review is stored as a list of integer word indices
print(X_train[0])

# Pad or truncate every review to exactly maxlen tokens.
# By default pad_sequences pads and truncates at the front,
# so long reviews keep their last 100 tokens.
maxlen = 100
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)


[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
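
The integers printed above are word indices rather than raw text. A minimal sketch of mapping them back to words follows, assuming the default conventions of `imdb.load_data` (0 for padding, 1 for the start marker, 2 for out-of-vocabulary words, and real words shifted by 3). `decode_review` and the toy `toy_word_index` are hypothetical names used for illustration; in the notebook the full mapping would come from `imdb.get_word_index()`.

```python
def decode_review(encoded, word_index, index_from=3):
    """Map a list of integer indices back to a readable string."""
    # Invert the word -> rank mapping, applying the same shift load_data uses
    index_to_word = {rank + index_from: word for word, rank in word_index.items()}
    # Reserved indices (assumed load_data defaults)
    index_to_word.update({0: "<pad>", 1: "<start>", 2: "<unk>"})
    return " ".join(index_to_word.get(i, "<unk>") for i in encoded)


# Toy vocabulary standing in for imdb.get_word_index() (illustrative subset)
toy_word_index = {"i": 10, "loved": 444, "this": 11, "movie": 17}

print(decode_review([1, 13, 447, 14, 20, 2, 0], toy_word_index))
# <start> i loved this movie <unk> <pad>
```

Decoding a few training reviews this way is a quick check that the vocabulary cut-off and the index offset are being handled as expected.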

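Padding with `maxlen=100` does more than add zeros. With the default settings of `pad_sequences` (`padding='pre'`, `truncating='pre'`), shorter reviews are left-padded with 0 and longer reviews lose their first tokens, keeping only the last `maxlen`. The sketch below mimics that default behaviour in plain NumPy; `pad_pre` is a hypothetical helper written for illustration, not the Keras implementation.

```python
import numpy as np


def pad_pre(sequences, maxlen, value=0):
    """Left-pad and left-truncate integer sequences to a fixed length."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        if not seq:
            continue  # an empty sequence stays all padding
        kept = seq[-maxlen:]           # keep the LAST maxlen tokens
        out[row, -len(kept):] = kept   # right-align, padding on the left
    return out


print(pad_pre([[5, 6], [1, 2, 3, 4, 5, 6]], maxlen=4))
# [[0 0 5 6]
#  [3 4 5 6]]
```

Whether the end of a review is more informative than its beginning is a modelling choice; passing `truncating='post'` to `pad_sequences` would keep the first tokens instead.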

In [None]:
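# Build the model
model = Sequential()
# Map each of the max_words word indices to a 32-dimensional vector
model.add(Embedding(input_dim=max_words, output_dim=32))
# Summarize the sequence of word vectors with a 100-unit LSTM
model.add(LSTM(100))
# Single sigmoid unit: estimated probability that the review is positive
model.add(Dense(1, activation='sigmoid'))

# Compile the model for binary classification
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])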
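
As a sanity check on model size, the trainable parameter count of this architecture can be worked out by hand. The sketch below assumes the standard formulas: an embedding table of `input_dim * output_dim` weights, four LSTM gates each with input weights, recurrent weights and a bias, and a dense layer with one bias per unit. The helper names are hypothetical, and the total would be expected to match what `model.summary()` reports once the model has been built.

```python
def embedding_params(input_dim, output_dim):
    # One output_dim-sized vector per vocabulary entry
    return input_dim * output_dim


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit
    return input_dim * units + units


total = (
    embedding_params(10000, 32)   # 320,000
    + lstm_params(32, 100)        # 53,200
    + dense_params(100, 1)        # 101
)
print(total)  # 373301
```

Most of the parameters sit in the embedding table, so the vocabulary size (`max_words`) dominates the model's capacity.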

In [19]:
# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=64, validation_split=0.15)


Epoch 1/50
333/333 ━━━━━━━━━━━━━━━━━━━━ 13s 40ms/step - accuracy: 0.9346 - loss: 0.1818 - val_accuracy: 0.8421 - val_loss: 0.3813
Epoch 2/50
333/333 ━━━━━━━━━━━━━━━━━━━━ 14s 41ms/step - accuracy: 0.9548 - loss: 0.1343 - val_accuracy: 0.8229 - val_loss: 0.4095
Epoch 3/50
333/333 ━━━━━━━━━━━━━━━━━━━━ 14s 41ms/step - accuracy: 0.9524 - loss: 0.1309 - val_accuracy: 0.8344 - val_loss: 0.4843
Epoch 4/50
333/333 ━━━━━━━━━━━━━━━━━━━━ 14s 41ms/step - accuracy: 0.9697 - loss: 0.0912 - val_accuracy: 0.8285 - val_loss: 0.5500
Epoch 5/50
333/333 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.9787 - loss: 0.0645 - val_accuracy: 0.8243 - val_loss: 0.6760
Epoch 6/50
333/333 ━━━━━━━━━━━━━━━━━━━━ 14s 42ms/step - accuracy: 0.9839 - loss: 0.0480 - val_accuracy: 0.8283 - val_loss: 0.6762
Epoch 7/50
...

<keras.src.callbacks.history.History at 0x1a97ee6fa50>
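
The log above shows validation loss rising from the first epoch onward while training accuracy keeps climbing, which is a typical sign of overfitting. One common response is early stopping. The sketch below replays the six logged `val_loss` values through a simple patience rule to show where training would halt; `early_stop_epoch` and the `patience` value are illustrative assumptions, not part of the original notebook.

```python
def early_stop_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once val_loss has failed to improve for `patience`
    consecutive epochs; stop_epoch is None if that never happens.
    """
    best_loss = float("inf")
    best_epoch = None
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return None, best_epoch


# val_loss values copied from the training log above
logged_val_loss = [0.3813, 0.4095, 0.4843, 0.5500, 0.6760, 0.6762]
print(early_stop_epoch(logged_val_loss, patience=2))  # (3, 1)
```

In Keras itself, the built-in `keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)` callback, passed via `model.fit(..., callbacks=[...])`, plays this role and can also restore the weights from the best epoch.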

In [20]:
# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy * 100:.2f}%")


782/782 ━━━━━━━━━━━━━━━━━━━━ 7s 8ms/step - accuracy: 0.8147 - loss: 2.0118
Test Accuracy: 81.38%
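
The test loss (about 2.01) is far higher than the validation loss at the start of the training log, even though test accuracy stays above 80%. A plausible reading is that the overfit model makes some very confident wrong predictions, which binary cross-entropy penalizes heavily while accuracy counts each error only once. The NumPy sketch below illustrates the effect on made-up probabilities; the helper functions and the example values are assumptions for illustration, not outputs of the notebook's model.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    y_pred = np.asarray(y_prob) >= threshold
    return float(np.mean(y_pred == np.asarray(y_true, dtype=bool)))


# Three confident correct predictions and one confident mistake (made-up values)
y_true = [1, 0, 1, 0]
y_prob = [0.99, 0.01, 0.99, 0.999]

print(f"accuracy: {accuracy(y_true, y_prob):.2f}")
print(f"loss:     {binary_crossentropy(y_true, y_prob):.2f}")
```

A single prediction of 0.999 for a negative review contributes about 6.9 to the summed loss, which is why a handful of such errors can push the average loss well above 1 while accuracy barely moves.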


In [26]:
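# Classify new reviews with the trained model
import string

from keras.preprocessing import sequence

new_reviews = ["I loved this movie!", "The plot was confusing.", "Fuck this movie"]

# Fetch the word -> rank mapping once instead of once per word
word_index = imdb.get_word_index()
translator = str.maketrans('', '', string.punctuation)

# Lowercase, strip punctuation, and map each word to its integer index.
# load_data() shifts every rank by 3 (0 = padding, 1 = start, 2 = unknown),
# so known words get +3 and unknown words map directly to 2.
# Note: a shifted index >= max_words (e.g. 54485 in the output) lies outside
# the Embedding layer's input range (input_dim=max_words).
new_sequences = []
for review in new_reviews:
    words = review.lower().translate(translator).split()
    encoded = [word_index[word] + 3 if word in word_index else 2 for word in words]
    new_sequences.append(encoded)

print(new_sequences)

# Pad sequences to the length used during training
new_sequences = sequence.pad_sequences(new_sequences, maxlen=maxlen)

# Make predictions (one sigmoid probability per review)
predictions = model.predict(new_sequences)

for review, prediction in zip(new_reviews, predictions):
    sentiment = "Positive" if prediction[0] >= 0.5 else "Negative"
    print(f"Review: {review}\nSentiment: {sentiment}\n")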

[[13, 447, 14, 20], [4, 114, 16, 1499], [54485, 14, 20]]
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 23ms/step
Review: I loved this movie!
Sentiment: Positive

Review: The plot was confusing.
Sentiment: Positive

Review: Fuck this movie
Sentiment: Positive
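
All three reviews come out as Positive, including the clearly negative ones, so it is worth checking that new text is encoded the same way as the training data. `imdb.load_data(num_words=max_words)` prepends a start marker (1) and replaces any word whose shifted index is at or above `max_words` with the out-of-vocabulary index (2); the printed sequences above contain no start marker and include the index 54485. This mismatch is one possible contributor, alongside the overfitting visible in the training log. A minimal sketch of a matching encoder follows, with `encode_review` as a hypothetical helper and a toy word index (including the made-up entry `rareword`) standing in for `imdb.get_word_index()`.

```python
import string


def encode_review(text, word_index, max_words, index_from=3,
                  start_char=1, oov_char=2):
    """Encode raw text using the same conventions as imdb.load_data."""
    translator = str.maketrans("", "", string.punctuation)
    words = text.lower().translate(translator).split()
    encoded = [start_char]  # every training review begins with the start marker
    for word in words:
        rank = word_index.get(word)
        idx = rank + index_from if rank is not None else oov_char
        if idx >= max_words:
            idx = oov_char  # outside the vocabulary the model was trained on
        encoded.append(idx)
    return encoded


# Toy vocabulary standing in for imdb.get_word_index() (illustrative subset)
toy_word_index = {"i": 10, "loved": 444, "this": 11, "movie": 17, "rareword": 54482}

print(encode_review("I loved this movie!", toy_word_index, max_words=10000))
# [1, 13, 447, 14, 20]
print(encode_review("Rareword this movie", toy_word_index, max_words=10000))
# [1, 2, 14, 20]
```

Sequences encoded this way can be passed to `pad_sequences` and `model.predict` exactly as before, and every index is guaranteed to fall inside the embedding table.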

