-----------------------
## Text classification - using RNNs
--------------------------

This notebook trains a recurrent neural network on the IMDB large movie review dataset for sentiment analysis.

#### Parameters of Embedding layer

- `Arguments`
    - `input_dim`: int > 0. Size of the vocabulary, i.e. 1 + maximum integer index occurring in the input data.
    - `output_dim`: int >= 0. Dimension of the dense embedding.
    - `init`: name of the initialization function for the weights of the layer (see: initializations), or alternatively, a Theano function to use for weights initialization. This parameter is only relevant if you don't pass a `weights` argument.
    - `weights`: list of Numpy arrays to set as initial weights. The list should have 1 element, of shape (input_dim, output_dim).
    - `W_regularizer`: instance of the regularizers module (e.g. L1 or L2 regularization), applied to the embedding matrix.
    - `W_constraint`: instance of the constraints module (e.g. maxnorm, nonneg), applied to the embedding matrix.
    - `mask_zero`: Whether or not the input value 0 is a special "padding" value that should be masked out. This is useful for recurrent layers, which may take variable-length input. If this is True, then all subsequent layers in the model need to support masking or an exception will be raised. As a consequence, index 0 cannot be used in the vocabulary (input_dim should equal |vocabulary| + 1).
    - `input_length`: Length of input sequences, when it is constant. This argument is required if you are going to connect Flatten then Dense layers after the embedding (without it, the shape of the dense outputs cannot be computed).
    - `dropout`: float between 0 and 1. Fraction of the embeddings to drop.
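To make the roles of `input_dim`, `output_dim` and `mask_zero` concrete, here is a minimal NumPy sketch of what an embedding lookup amounts to. It is not the Keras implementation; the sizes, the random matrix and the toy batch are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical toy sizes (not taken from the notebook).
num_words = 5                 # real words use indices 1..5
input_dim = num_words + 1     # one extra row because index 0 is reserved for padding
output_dim = 3                # dimension of each dense embedding vector

rng = np.random.default_rng(0)
# One row per integer index: shape (input_dim, output_dim).
embedding_matrix = rng.normal(size=(input_dim, output_dim))

# A batch of two sequences, left-padded with 0 to a common length of 4.
batch = np.array([[0, 0, 4, 2],
                  [0, 1, 5, 3]])

# The lookup is plain row indexing: (batch, timesteps) -> (batch, timesteps, output_dim).
embedded = embedding_matrix[batch]

# With mask_zero=True, positions holding index 0 are flagged so later layers can skip them.
mask = batch != 0

print(embedded.shape)  # (2, 4, 3)
print(mask)
```

Each integer is simply replaced by the matching row of the weight matrix, which is why the layer holds `input_dim * output_dim` parameters.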


In [21]:
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN

#### Parameters of SimpleRNN


`Arguments`:

    - `units`: Positive integer, dimensionality of the output space.
    - `activation`: Activation function to use. Default: hyperbolic tangent (`tanh`). If you pass `None`, no activation is applied (i.e. "linear" activation: `a(x) = x`).
    - `use_bias`: Boolean (default `True`), whether the layer uses a bias vector.
    - `kernel_initializer`: Initializer for the `kernel` weights matrix, used for the linear transformation of the inputs. Default: `glorot_uniform`.
    - `recurrent_initializer`: Initializer for the `recurrent_kernel` weights matrix, used for the linear transformation of the recurrent state. Default: `orthogonal`.
    - `bias_initializer`: Initializer for the bias vector. Default: `zeros`.
    - `kernel_regularizer`: Regularizer function applied to the `kernel` weights matrix. Default: `None`.
    - `recurrent_regularizer`: Regularizer function applied to the `recurrent_kernel` weights matrix. Default: `None`.
    - `bias_regularizer`: Regularizer function applied to the bias vector. Default: `None`.
    - `activity_regularizer`: Regularizer function applied to the output of the layer (its "activation"). Default: `None`.
    - `kernel_constraint`: Constraint function applied to the `kernel` weights matrix. Default: `None`.
    - `recurrent_constraint`: Constraint function applied to the `recurrent_kernel` weights matrix. Default: `None`.
    - `bias_constraint`: Constraint function applied to the bias vector. Default: `None`.
    - `dropout`: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs. Default: 0.
    - `recurrent_dropout`: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state. Default: 0.
    - `return_sequences`: Boolean. Whether to return the last output in the output sequence, or the full sequence. Default: `False`.
    - `return_state`: Boolean. Whether to return the last state in addition to the output. Default: `False`.
    - `go_backwards`: Boolean (default `False`). If `True`, process the input sequence backwards and return the reversed sequence.

`Call arguments`:

    - `inputs`: A 3D tensor, with shape `[batch, timesteps, feature]`.
    - `mask`: Binary tensor of shape `[batch, timesteps]` indicating whether a given timestep should be masked.
    - `training`: Python boolean indicating whether the layer should behave in training mode or in inference mode. This argument is passed to the cell when calling it. This is only relevant if `dropout` or `recurrent_dropout` is used.
    - `initial_state`: List of initial state tensors to be passed to the first call of the cell.

Examples:

```python
import numpy as np
import tensorflow as tf

inputs = np.random.random([32, 10, 8]).astype(np.float32)
simple_rnn = tf.keras.layers.SimpleRNN(4)

output = simple_rnn(inputs)  # The output has shape `[32, 4]`.

simple_rnn = tf.keras.layers.SimpleRNN(
    4, return_sequences=True, return_state=True)

# whole_sequence_output has shape `[32, 10, 4]`.
# final_state has shape `[32, 4]`.
whole_sequence_output, final_state = simple_rnn(inputs)
```

In [22]:
model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32))
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 32)          320000    
_________________________________________________________________
simple_rnn (SimpleRNN)       (None, 32)                2080      
Total params: 322,080
Trainable params: 322,080
Non-trainable params: 0
_________________________________________________________________


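By default, `SimpleRNN` returns only the output of the last timestep, a 2D tensor of shape (batch_size, output_features). With `return_sequences=True`, it instead returns the full sequence of successive outputs, one per timestep, as a 3D tensor of shape (batch_size, time_steps, output_features).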

In [23]:
model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32,return_sequences=True))
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, None, 32)          2080      
Total params: 322,080
Trainable params: 322,080
Non-trainable params: 0
_________________________________________________________________
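The difference between the two output shapes is easier to see in a hand-written forward pass. The sketch below is a simplified NumPy version of the recurrence `h_t = tanh(x_t · W + h_{t-1} · U + b)`, not the Keras implementation; the function name `simple_rnn_forward` and the random weights are assumptions made for illustration.

```python
import numpy as np

def simple_rnn_forward(inputs, kernel, recurrent_kernel, bias, return_sequences=False):
    """Run a tanh RNN over inputs of shape (batch, timesteps, features)."""
    batch, timesteps, _ = inputs.shape
    units = kernel.shape[1]
    state = np.zeros((batch, units))          # initial hidden state
    outputs = []
    for t in range(timesteps):
        # New state mixes the current input with the previous state.
        state = np.tanh(inputs[:, t, :] @ kernel + state @ recurrent_kernel + bias)
        outputs.append(state)
    if return_sequences:
        return np.stack(outputs, axis=1)      # (batch, timesteps, units)
    return state                              # (batch, units): last timestep only

rng = np.random.default_rng(0)
inputs = rng.random((32, 10, 8))              # same shape as the example above
kernel = rng.normal(size=(8, 4))              # input -> hidden weights
recurrent_kernel = rng.normal(size=(4, 4))    # hidden -> hidden weights
bias = np.zeros(4)

last_output = simple_rnn_forward(inputs, kernel, recurrent_kernel, bias)
full_sequence = simple_rnn_forward(inputs, kernel, recurrent_kernel, bias,
                                   return_sequences=True)

print(last_output.shape)    # (32, 4)
print(full_sequence.shape)  # (32, 10, 4)
```

The last slice of the full sequence equals the default output, and the weights are the same in both cases, which is why `return_sequences` changes the output shape in the summaries above but not the parameter count.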


How did we get the number of parameters?

- `g`: number of feed-forward networks (FFNNs) in a unit (a simple RNN has 1, a GRU has 3, an LSTM has 4)
- `h`: number of hidden units
- `i`: dimension of the input

Every FFNN has h(h+i) weights plus h biases, i.e. h(h+i) + h parameters, so we have
- num_params = g × [h(h+i) + h]

In [24]:
g = 1
h = 32
i = 32

g * (h*(h+i) + h)

2080

In [25]:
model = Sequential()
model.add(Embedding(10000, 32))                  # 10,000 * 32 = 320,000
model.add(SimpleRNN(64, return_sequences=True))  # (32 + 64 + 1) * 64 = 6208
model.add(SimpleRNN(32, return_sequences=True))  # (64 + 32 + 1) * 32 = 3104
model.add(SimpleRNN(32, return_sequences=True))  # (32 + 32 + 1) * 32 = 2080
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, None, 64)          6208      
_________________________________________________________________
simple_rnn_3 (SimpleRNN)     (None, None, 32)          3104      
_________________________________________________________________
simple_rnn_4 (SimpleRNN)     (None, None, 32)          2080      
Total params: 331,392
Trainable params: 331,392
Non-trainable params: 0
_________________________________________________________________
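The per-layer counts in this summary can be checked with the formula g × [h(h+i) + h] applied layer by layer, where each layer's input size is the previous layer's number of units. This is a small sketch; the helper names `recurrent_params` and `embedding_params` are made up for illustration and are not part of Keras.

```python
def recurrent_params(g, h, i):
    """Parameters of a recurrent layer: g * [h(h+i) + h]."""
    return g * (h * (h + i) + h)

def embedding_params(vocab_size, embed_dim):
    """One embed_dim-sized vector per vocabulary index."""
    return vocab_size * embed_dim

embed_dim = 32
layer_units = [64, 32, 32]         # the three stacked SimpleRNN layers

counts = [embedding_params(10000, embed_dim)]
input_size = embed_dim             # the first RNN layer reads the embedding vectors
for units in layer_units:
    counts.append(recurrent_params(g=1, h=units, i=input_size))
    input_size = units             # the next layer reads this layer's output

total = sum(counts)
print(counts)  # [320000, 6208, 3104, 2080]
print(total)   # 331392
```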


#### Load IMDB data

In [26]:
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.layers import Dense

In [27]:
max_features = 10000
maxlen = 500
batch_size = 32

In [28]:
print('Loading data...')
(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words=max_features)

Loading data...
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz




In [29]:
print(len(input_train), 'train sequences')
print(len(input_test), 'test sequences')
print('Pad sequences (samples x time)')

25000 train sequences
25000 test sequences
Pad sequences (samples x time)


In [30]:
input_train = sequence.pad_sequences(input_train, maxlen=maxlen)
input_test  = sequence.pad_sequences(input_test,  maxlen=maxlen)

print('input_train shape:', input_train.shape)
print('input_test shape:', input_test.shape)

input_train shape: (25000, 500)
input_test shape: (25000, 500)
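Every review ends up with exactly 500 integers because `pad_sequences` pads short sequences and truncates long ones. The pure-Python sketch below mimics that behaviour, assuming the default settings `padding='pre'` and `truncating='pre'` (zeros added at the front, extra tokens dropped from the front). The function name `pad_pre` is hypothetical and the code is not the Keras implementation.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad or left-truncate each sequence to exactly maxlen items."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        # Keep only the last maxlen tokens of long sequences.
        seq = seq[max(len(seq) - maxlen, 0):]
        # Fill the front of short sequences with the padding value.
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

toy_reviews = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]   # hypothetical word indices
print(pad_pre(toy_reviews, maxlen=4))
# [[0, 1, 2, 3], [6, 7, 8, 9]]
```

Padding at the front keeps the real words next to the final timestep, which is the one a `SimpleRNN` without `return_sequences` passes on to the next layer.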


#### Modeling

In [31]:
model = Sequential()
model.add(Embedding(max_features, 32))     # max_features = 10,000, so 10,000 * 32 = 320,000
model.add(SimpleRNN(32))                   # (32 + 32 + 1) * 32 = 2080
model.add(Dense(1, activation='sigmoid'))  # (32 + 1) * 1 = 33
model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_5 (SimpleRNN)     (None, 32)                2080      
_________________________________________________________________
dense (Dense)                (None, 1)                 33        
Total params: 322,113
Trainable params: 322,113
Non-trainable params: 0
_________________________________________________________________
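The final `Dense(1, activation='sigmoid')` layer turns the 32 RNN outputs into a single probability that the review is positive. A minimal pure-Python sketch of that computation follows; the weights and features are made-up values, and `dense_sigmoid` is a hypothetical helper, not a Keras function.

```python
import math

def dense_sigmoid(features, weights, bias):
    """Weighted sum of the features plus a bias, squashed into (0, 1)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values standing in for one review's RNN output and the learned weights.
features = [0.0] * 32
weights = [0.1] * 32
bias = 0.0

num_params = len(weights) + 1      # 32 weights + 1 bias
probability = dense_sigmoid(features, weights, bias)

print(num_params)   # 33
print(probability)  # 0.5, since the weighted sum is 0 here
```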


In [32]:
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

history = model.fit(input_train, y_train, epochs=10, batch_size=128, validation_split=0.2)

# 25,000 * 0.8 = 20,000 samples are used for training; the remaining 5,000 are held out for validation

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
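The 20,000 / 5,000 split comes from `validation_split=0.2`. Keras takes the validation samples from the end of the arrays it is given, before any shuffling. The arithmetic can be sketched as follows; `split_for_validation` is a hypothetical helper written for illustration, not a Keras function.

```python
def split_for_validation(num_samples, validation_split):
    """Return (train_indices, val_indices) with the validation part taken from the end."""
    num_val = round(num_samples * validation_split)
    split_at = num_samples - num_val
    return range(0, split_at), range(split_at, num_samples)

train_idx, val_idx = split_for_validation(25000, 0.2)
num_train, num_val = len(train_idx), len(val_idx)

print(num_train, num_val)   # 20000 5000
print(val_idx[0])           # 20000: validation starts right after the last training sample
```

Because the held-out block is always the tail of the data, it is worth checking that the tail is not ordered by label before relying on the validation metrics.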


In [None]:
import matplotlib.pyplot as plt
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()