#### Highlights
* Variant of the LSTM
* Retains the LSTM's resistance to the vanishing gradient problem
* Simpler internal structure
* Trains faster
* Fewer computations are required to update the hidden state
* In theory, LSTMs can remember longer sequences than GRUs and outperform them on tasks that require modeling long-distance relations

* Gates: an LSTM has input, forget, and output gates; a GRU has only an update gate and a reset gate

<img src="https://www.safaribooksonline.com/library/view/deep-learning-with/9781787128422/assets/gru-cell.png">


<img src="https://www.safaribooksonline.com/library/view/deep-learning-with/9781787128422/assets/gru-eq1.png">
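To make the two gates concrete, here is a minimal NumPy sketch of a single GRU step. It is an illustration, not Keras's implementation: the function name `gru_step`, the parameter names, and the toy sizes are assumptions. Conventions differ on whether `z` or `1 - z` weights the previous state; this sketch lets `z` weight the previous state, so the figure above may label the terms the other way round.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time step (illustrative sketch, hypothetical parameter names)."""
    W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h = params
    # Update gate: how much of the previous state to carry forward.
    z = sigmoid(x_t @ W_z + h_prev @ U_z + b_z)
    # Reset gate: how much of the previous state feeds the candidate.
    r = sigmoid(x_t @ W_r + h_prev @ U_r + b_r)
    # Candidate state, computed from the input and the reset-scaled previous state.
    h_tilde = np.tanh(x_t @ W_h + (r * h_prev) @ U_h + b_h)
    # Blend the previous state and the candidate (convention assumed here).
    return z * h_prev + (1.0 - z) * h_tilde

# Toy usage: input size 3, hidden size 4, random weights.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
params = []
for _ in range(3):  # one (W, U, b) triple each for z, r and the candidate
    params += [rng.normal(size=(n_in, n_hid)),
               rng.normal(size=(n_hid, n_hid)),
               np.zeros(n_hid)]
h_next = gru_step(rng.normal(size=n_in), np.zeros(n_hid), params)
print(h_next.shape)
```

There are three weight groups here against four in an LSTM, which is where the smaller parameter count and cheaper update come from.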

In [1]:
import numpy as np
import pandas as pd

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import GRU
from keras.callbacks import EarlyStopping

from keras.datasets import imdb

Using TensorFlow backend.


In [2]:
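# Keep only the 1,000 most frequent words; rarer words are replaced
# by the out-of-vocabulary index (2 by default).
n_words = 1000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=n_words)
print('Train seq: {}'.format(len(X_train)))
print('Test seq: {}'.format(len(X_test)))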

Train seq: 25000
Test seq: 25000
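The effect of `num_words` can be sketched in plain Python. This is an illustration only, assuming the default out-of-vocabulary index of 2; the helper name `cap_vocabulary` and the example indices are hypothetical.

```python
def cap_vocabulary(seq, num_words, oov_index=2):
    """Replace every word index >= num_words with the OOV index (sketch)."""
    return [w if w < num_words else oov_index for w in seq]

# Hypothetical review encoded as word indices.
print(cap_vocabulary([1, 14, 999, 1000, 4821], num_words=1000))
```

With a vocabulary of only 1,000 words, a large share of the tokens in each review collapse to index 2.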


In [3]:
X_train.shape

(25000,)

In [9]:
len(X_train[4])

147

In [10]:
X_test.shape

(25000,)

#### Pad or truncate every sequence to max_len

In [11]:
max_len = 200
X_train = sequence.pad_sequences(X_train, maxlen=max_len)
X_test = sequence.pad_sequences(X_test, maxlen=max_len)

In [12]:
X_train.shape

(25000, 200)
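By default `pad_sequences` pads at the front with zeros and also truncates from the front, so long reviews keep their last `maxlen` tokens. A pure-Python sketch of that default behaviour, assuming `maxlen > 0` (the helper name `pad_pre` is hypothetical):

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic the default 'pre' padding and 'pre' truncation (sketch)."""
    seq = list(seq)[-maxlen:]                     # keep the last maxlen items
    return [value] * (maxlen - len(seq)) + seq    # left-pad short sequences

print(pad_pre([1, 2, 3], maxlen=5))
print(pad_pre([1, 2, 3, 4, 5, 6], maxlen=4))
```

Padding at the front places the real tokens at the end of the sequence, closest to the final hidden state that the classifier reads.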

In [16]:
X_train[71]

array([  4, 807,   9,   2,   2,  19,   2,   2, 411,   5,   2,  34,   2,
       156,  37, 481,  40,  68, 886,   6, 229,  18,   4,  86,  58,   4,
         2,   2,  22, 405,   9,   6, 706,   2,   7,   4,   4,   2, 405,
         2, 302,   4, 105,  81,  24,  60, 511,  40,   6,   2, 415,  62,
        18, 463,   4, 109,  37,   9, 267,  18,  41,   2, 799,  33,  41,
       344,   2,  41,  96, 143,   4,   2,   2,   2, 187,   4, 313,  32,
         2,   5,   2,   5,  59, 152,  60, 280, 683,  46,  41,   2, 403,
         8,  67,  48,  59,   9, 344,  51,  25,  62, 104,  59,  69,  43,
         2,  41, 799, 305,   7,   2,  18,  41,  96,  99, 111,   2,   8,
        41,   2,  99, 111,   2,   2,   9,   6, 801,   2,   7,  35,   2,
       167,  12,   9, 165, 163, 149,   4,   2, 665,   7,   4, 255,   2,
        41, 519, 180,   4, 890,  56,   4,   2, 187,  14,   2, 120, 133,
       120,  50, 449,   6, 499, 650, 150,   6,   2, 650, 195, 460,  25,
        62, 104,  25,  26, 149,   6, 248,   2,  18,   4,   2, 39

In [17]:
# Define network architecture and compile
model = Sequential()
model.add(Embedding(n_words, 50, input_length=max_len))
model.add(Dropout(0.2))
model.add(GRU(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(250, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',  optimizer='adam', metrics=['accuracy'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 200, 50)           50000     
_________________________________________________________________
dropout_1 (Dropout)          (None, 200, 50)           0         
_________________________________________________________________
gru_1 (GRU)                  (None, 100)               45300     
_________________________________________________________________
dense_1 (Dense)              (None, 250)               25250     
_________________________________________________________________
dropout_2 (Dropout)          (None, 250)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 251       
Total params: 120,801
Trainable params: 120,801
Non-trainable params: 0
_________________________________________________________________
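The parameter counts in the summary can be checked by hand. The sketch below assumes the GRU formulation that produced this output, with one bias vector per gate. Newer Keras versions default to `reset_after=True`, which adds a second bias vector per gate and would report 45,600 for the same layer.

```python
n_words, embed_dim, units, dense_units = 1000, 50, 100, 250

embedding = n_words * embed_dim                       # one vector per word
# Three weight groups (update, reset, candidate), each with an input
# kernel, a recurrent kernel and a bias.
gru = 3 * (units * (embed_dim + units) + units)
dense_1 = units * dense_units + dense_units           # weights + biases
dense_2 = dense_units * 1 + 1

total = embedding + gru + dense_1 + dense_2
print(embedding, gru, dense_1, dense_2, total)
```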


In [18]:
# Stop training once validation accuracy has not improved for 3 consecutive epochs.
callbacks = [EarlyStopping(monitor='val_acc', patience=3)]
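The patience rule can be sketched in a few lines. This is a simplified model of the callback for a metric that should increase, not the Keras source; the function name `early_stopping_epoch` and the validation scores are made up for illustration.

```python
def early_stopping_epoch(val_scores, patience=3):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("-inf")
    wait = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:        # strict improvement resets the counter
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation accuracies: the best value is reached at epoch 2.
print(early_stopping_epoch([0.70, 0.80, 0.79, 0.80, 0.78], patience=3))
```

Because of this rule, `n_epochs = 100` acts as an upper bound rather than the number of epochs actually run.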

In [None]:
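batch_size = 512
n_epochs = 100  # upper bound; early stopping usually ends training sooner

# Hold out the last 20% of the training data for validation.
model.fit(X_train, y_train, batch_size=batch_size, epochs=n_epochs, validation_split=0.2, callbacks=callbacks)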

Train on 20000 samples, validate on 5000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
 2560/20000 [==>...........................] - ETA: 5:46 - loss: 0.4341 - acc: 0.8172