# Sentiment Analysis using LSTM Model

This Jupyter notebook contains code for a sentiment analysis model using a Long Short-Term Memory (LSTM) network. The model is trained on the IMDB movie reviews dataset.

## Table of Contents

1. [Imports and Setup](#imports-and-setup)
2. [Data Preparation](#data-preparation)
3. [Model Architecture](#model-architecture)
4. [Training the Model](#training-the-model)

## Imports and Setup

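The notebook first sets `TF_CPP_MIN_LOG_LEVEL` to `"3"`, which silences TensorFlow's C++ log messages and must happen before TensorFlow is imported. It then imports Keras and its `layers` module and sets the batch size to 32.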

```python
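import os
# Silence TensorFlow's C++ logging; set this before importing TensorFlow.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

from tensorflow import keras
# Import layers from the same Keras that ships with TensorFlow.
from tensorflow.keras import layers

batch_size = 32  # number of reviews per batch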
```

## Data Preparation

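The training and test datasets are loaded from the "aclImdb/train" and "aclImdb/test" directories, respectively. `text_dataset_from_directory` infers each review's label from its subdirectory, shuffles the files by default, and yields batches of 32 raw strings with integer labels. The `text_vectorization` layer then converts each review to a sequence of integer token ids. It is adapted on the training text only, keeps a vocabulary of at most 20,000 tokens, and pads or truncates every review to 600 tokens.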

```python
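# Each dataset yields (raw review strings, integer labels) in batches.
train_ds = keras.utils.text_dataset_from_directory(
    "aclImdb/train", batch_size=batch_size
)

test_ds = keras.utils.text_dataset_from_directory(
    "aclImdb/test", batch_size=batch_size
)

# Drop the labels: the vocabulary is built from the training text alone.
text_only_train_ds = train_ds.map(lambda x, y: x)

max_length = 600    # tokens kept per review (longer reviews are truncated)
max_tokens = 20000  # vocabulary size limit
text_vectorization = layers.TextVectorization(
    max_tokens=max_tokens,
    output_mode="int",
    output_sequence_length=max_length,
)

# Build the vocabulary from the training reviews.
text_vectorization.adapt(text_only_train_ds)

# Replace each raw string with its fixed-length sequence of token ids.
int_train_ds = train_ds.map(
    lambda x, y: (text_vectorization(x), y),
    num_parallel_calls=8)

int_test_ds = test_ds.map(
    lambda x, y: (text_vectorization(x), y),
    num_parallel_calls=8)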
```
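To make the vectorization step concrete, the following is a simplified pure-Python sketch of what an integer-mode text vectorizer does: lowercase and strip punctuation, keep the most frequent tokens, reserve id 0 for padding and id 1 for out-of-vocabulary tokens, then pad or truncate to a fixed length. The helper names (`standardize`, `build_vocab`, `vectorize`) and the alphabetical tie-breaking are assumptions made for illustration, not the Keras implementation.

```python
from collections import Counter
import string


def standardize(text):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def build_vocab(texts, max_tokens):
    """Map the most frequent tokens to ids; 0 is padding, 1 is out-of-vocabulary."""
    counts = Counter(tok for text in texts for tok in standardize(text))
    # Two ids are reserved, so at most max_tokens - 2 real tokens are kept.
    # Ties are broken alphabetically here (an assumption of this sketch).
    kept = sorted(counts, key=lambda tok: (-counts[tok], tok))[: max_tokens - 2]
    return {tok: i + 2 for i, tok in enumerate(kept)}


def vectorize(text, vocab, max_length):
    """Convert text to exactly max_length ids, truncating or zero-padding."""
    ids = [vocab.get(tok, 1) for tok in standardize(text)][:max_length]
    return ids + [0] * (max_length - len(ids))


corpus = ["A great movie, great cast!", "A dull movie."]
vocab = build_vocab(corpus, max_tokens=5)
print(vocab)                                          # {'a': 2, 'great': 3, 'movie': 4}
print(vectorize("A great film", vocab, max_length=5))  # [2, 3, 1, 0, 0]
```

In the toy run, "film" is not in the vocabulary and maps to 1, and the two trailing zeros are padding. The notebook's layer does the same thing at scale with `max_tokens=20000` and `max_length=600`.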

## Model Architecture

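The `model` is built with the Keras functional API. It stacks an embedding layer that maps each token id to a 256-dimensional vector, an LSTM layer with 32 units that reads the sequence and returns its final state, and a dense layer with a single sigmoid unit for binary classification.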

```python
inputs = keras.Input(shape=(None,), dtype="int64")
embedded = layers.Embedding(input_dim=max_tokens, output_dim=256)(inputs)
x = layers.LSTM(32)(embedded)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
```
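As a rough check on the model's size, the trainable parameter counts can be worked out by hand from the layer dimensions above. This sketch assumes the default Keras settings (bias terms enabled, one bias vector per LSTM gate). The numbers are computed here, not copied from the notebook's `model.summary()` output, which the source does not include.

```python
max_tokens = 20000  # vocabulary size (Embedding input_dim)
embed_dim = 256     # Embedding output_dim
lstm_units = 32     # LSTM units

# One embed_dim-sized vector per vocabulary entry.
embedding_params = max_tokens * embed_dim

# An LSTM has 4 gates; each has an input kernel, a recurrent kernel, and a bias.
lstm_params = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)

# One output unit: a weight per LSTM unit plus a bias.
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params

print("embedding:", embedding_params)  # 5120000
print("lstm:     ", lstm_params)       # 36992
print("dense:    ", dense_params)      # 33
print("total:    ", total_params)      # 5157025
```

Almost all of the parameters sit in the embedding table, so the vocabulary size and embedding dimension dominate the model's memory footprint.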

## Training the Model

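The model is compiled with the RMSprop optimizer, binary cross-entropy loss, and accuracy as the reported metric. After printing a layer summary, it is trained for 10 epochs on the vectorized training set. Because the vectorized test set is passed as `validation_data`, the validation loss and accuracy reported after each epoch are measured on the test set.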

```python
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

model.summary()

model.fit(int_train_ds, validation_data=int_test_ds, epochs=10)
```
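For intuition about the loss and metric named above, here is a minimal pure-Python sketch of binary cross-entropy and of accuracy with a 0.5 decision threshold. The function names, the clipping constant `eps`, and the sample labels and predictions are illustrative assumptions. Keras computes the same quantities on tensors.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_pred))
    return hits / len(y_true)


labels = [1, 0, 1, 0]               # hypothetical sentiment labels
confident = [0.9, 0.1, 0.8, 0.2]    # hypothetical sigmoid outputs
hesitant = [0.6, 0.4, 0.55, 0.45]

# Both sets of predictions are 100% accurate, but the loss rewards confidence.
print(binary_accuracy(labels, confident), binary_crossentropy(labels, confident))
print(binary_accuracy(labels, hesitant), binary_crossentropy(labels, hesitant))
```

The two prediction sets have the same accuracy but different losses. This is why the loss can keep changing during training even after accuracy has levelled off.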

The notebook runs the code above in a single cell, which also prints the shape, dtype, and first element of one training batch:

```python
for inputs, targets in train_ds:
    print("inputs.shape:", inputs.shape)
    print("inputs.dtype:", inputs.dtype)
    print("targets.shape:", targets.shape)
    print("targets.dtype:", targets.dtype)
    print("inputs[0]:", inputs[0])
    print("targets[0]:", targets[0])
    break
```

The cell output follows, with the first review cut off as in the source:

```
Found 25000 files belonging to 2 classes.
Found 25000 files belonging to 2 classes.
inputs.shape: (32,)
inputs.dtype: <dtype: 'string'>
targets.shape: (32,)
targets.dtype: <dtype: 'int32'>
inputs[0]: tf.Tensor(b'I first saw Martin\'s Day when I was just 10 years old, at home, on The Movie Channel, and still remember the impact it made on my life. It touched me as no other film had touched me, and I remember balling my eyes out.<br /><br />After the first time I saw it, I couldn\'t find it anywhere else. I would ask around and no one had ever heard of the film! I guess it was one of those more rare films that not many people knew about, because no one, and I mean no one, knew what I was talking about. I searched and searched throughout the years, checking video stores shelves and scanning cable TV listings, but always came up short. Finally, in 1996 I found out I could special order it, I did, and have probably watched it at least 50 times since--and it still makes me cry, every time.<b

<keras.callbacks.History at 0x1e607786340>
```