# IMDB Movie Review Sentiment Analysis with SimpleRNN

This Jupyter notebook demonstrates a simple Recurrent Neural Network (RNN) for binary sentiment classification (positive vs. negative) on the IMDB movie review dataset. The model uses the Keras `SimpleRNN` layer to read each review as a sequence of word indices.

## Table of Contents
1. [Imports](#imports)
2. [Data Loading and Preprocessing](#data-loading-and-preprocessing)
3. [Model Building](#model-building)
4. [Model Training and Evaluation](#model-training-and-evaluation)

## Imports

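The necessary libraries and modules are imported at the beginning of the notebook. Setting the `TF_CPP_MIN_LOG_LEVEL` environment variable to `"3"` before Keras is imported hides TensorFlow's C++ log messages when TensorFlow is the backend.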

```python
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
import keras
from keras.utils import pad_sequences
from keras.models import Sequential
from keras.layers import Dense, Embedding, SimpleRNN
from keras.datasets import imdb
```

## Data Loading and Preprocessing

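The IMDB dataset is loaded with `imdb.load_data()`, which returns predefined training and test splits of 25,000 reviews each, with every review already encoded as a list of word indices. Passing `num_words=max_features` keeps only the 20,000 most frequent words; rarer words are replaced by an out-of-vocabulary index. Each sequence is then padded or truncated to exactly 80 tokens with `pad_sequences()`. By default both padding and truncation happen at the start of the sequence, so long reviews keep their last 80 words.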

```python
max_features = 20000  # vocabulary size: keep only the 20,000 most frequent words
maxlen = 80  # pad/truncate every review to exactly this many word indices
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
```
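To make the default padding behaviour concrete, here is a minimal pure-Python sketch that mimics what `pad_sequences` does with its defaults (`padding='pre'`, `truncating='pre'`, pad value 0). The helper name `pad_pre` is hypothetical and this is not the Keras implementation, which also returns a NumPy array rather than nested lists.

```python
def pad_pre(sequences, maxlen, value=0):
    """Hypothetical mimic of pad_sequences' defaults: pad and truncate at the front."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]               # truncating='pre': keep the last maxlen items
        padded.append([value] * (maxlen - len(seq)) + seq)  # padding='pre': fill zeros in front
    return padded

print(pad_pre([[5, 6, 7], [1, 2, 3, 4, 5, 6]], maxlen=4))
# [[0, 5, 6, 7], [3, 4, 5, 6]]
```

Short reviews get leading zeros and long reviews lose their opening words, which is why the model sees the end of each review.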

## Model Building

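A simple sequential model is built with the Keras `Sequential` class. It stacks three layers: an `Embedding` layer that maps each word index to a 128-dimensional vector, a `SimpleRNN` layer with 128 units that reads those vectors in order and outputs its final hidden state, and a `Dense` layer with a single sigmoid unit that outputs the probability that the review is positive.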

```python
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(SimpleRNN(128))
model.add(Dense(1, activation='sigmoid'))
```
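The following NumPy sketch shows, under stated assumptions, the computation these three layers perform for one padded review. It follows the `SimpleRNN` recurrence with the Keras defaults (tanh activation, zero initial state, only the last output returned), but the weights are arbitrary random values rather than trained ones or Keras's own initializers, and the names `forward`, `kernel`, `recurrent_kernel`, etc. are illustrative. It also derives the trainable parameter counts implied by the layer shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embed_dim, units = 20000, 128, 128  # max_features, Embedding width, SimpleRNN units
seq_len = 80                                    # maxlen

# Arbitrary random stand-ins for trained weights (assumption: values are illustrative only)
embedding = rng.normal(scale=0.05, size=(vocab_size, embed_dim))
kernel = rng.normal(scale=0.05, size=(embed_dim, units))        # input -> hidden
recurrent_kernel = rng.normal(scale=0.05, size=(units, units))  # hidden -> hidden
bias = np.zeros(units)
dense_w = rng.normal(scale=0.05, size=(units, 1))
dense_b = np.zeros(1)

def forward(token_ids):
    """Score one padded review (a length-seq_len array of word indices)."""
    x = embedding[token_ids]   # (seq_len, embed_dim) embedding lookup
    h = np.zeros(units)        # initial hidden state is all zeros
    # Padding zeros are processed like ordinary tokens, since the notebook's
    # Embedding layer does not set mask_zero=True.
    for x_t in x:              # one step per word, left to right
        h = np.tanh(x_t @ kernel + h @ recurrent_kernel + bias)
    logit = h @ dense_w + dense_b           # only the final state reaches Dense
    return 1.0 / (1.0 + np.exp(-logit[0]))  # sigmoid -> probability of "positive"

review = rng.integers(0, vocab_size, size=seq_len)
p = forward(review)
print(f"P(positive) = {p:.3f}")

# Trainable parameter counts implied by the same shapes
n_embed = embedding.size                                 # 20000 * 128
n_rnn = kernel.size + recurrent_kernel.size + bias.size  # 128*128 + 128*128 + 128
n_dense = dense_w.size + dense_b.size                    # 128 + 1
print(n_embed, n_rnn, n_dense, n_embed + n_rnn + n_dense)
# 2560000 32896 129 2593025
```

Almost all of the roughly 2.6 million parameters sit in the embedding table; the recurrent layer itself has only 32,896.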

The model is then compiled with the binary cross-entropy loss function, the Adam optimizer, and accuracy as the metric.

```python
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```
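As a rough illustration of what `loss='binary_crossentropy'` measures, here is a minimal pure-Python sketch of the mean binary cross-entropy over a batch, with labels 1 (positive) and 0 (negative). The helper name and the clipping constant `eps` are assumptions for the example, not Keras internals.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over a batch (hypothetical helper)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.7]
print(round(binary_crossentropy(labels, probs), 4))  # 0.5108
```

Confident correct predictions (0.9 for a positive) contribute little, while the confident mistake (0.7 for a negative) dominates the average.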

## Model Training and Evaluation

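The model is trained on the training data for 15 epochs with a batch size of 32. The test set is passed as `validation_data`, so its loss and accuracy are reported after every epoch. If those per-epoch results are used to choose the number of epochs or other settings, the final test score is no longer an unbiased estimate; holding out part of the training data instead (for example with `validation_split`) avoids this.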

```python
print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=15,
          validation_data=(x_test, y_test))
```
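A quick worked calculation shows how many weight updates this configuration performs, assuming the 25,000 training reviews reported in the run output below and Keras's usual behaviour of keeping the final, smaller batch of each epoch.

```python
import math

n_train = 25000  # training reviews reported in the notebook output
batch_size = 32
epochs = 15

steps_per_epoch = math.ceil(n_train / batch_size)  # 25000 = 781*32 + 8, so the last batch has 8 reviews
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 782 11730
```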

After training, the model is evaluated on the testing data, and the test loss and accuracy are printed.

```python
loss, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', loss)
print('Test accuracy:', acc)
```
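For a sigmoid output trained with binary cross-entropy, Keras evaluates the `'accuracy'` metric as binary accuracy, thresholding each predicted probability at 0.5. The pure-Python sketch below reproduces that idea on toy values; `binary_accuracy` here is a hypothetical helper, not the Keras function.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of reviews whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.7]
print(binary_accuracy(labels, probs))  # 0.75
```

Only the last review is misclassified (0.7 is above the threshold but the label is negative), giving 3 correct out of 4.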

This notebook provides a basic example of using a SimpleRNN for sentiment analysis on the IMDB movie review dataset.

A sample run of the full notebook printed the following (per-epoch metrics are not included in this log). "Test score" is the test loss; the model reached about 67% test accuracy.

```text
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
x_train shape: (25000, 80)
x_test shape: (25000, 80)
Build model...
Train...
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Test score: 0.7163980603218079
Test accuracy: 0.6698799729347229
```
