First, set the environment variables and initialize the Spark context:

In [None]:
%env SPARK_DRIVER_MEMORY=8g
%env PYSPARK_PYTHON=/usr/bin/python3.5
%env PYSPARK_DRIVER_PYTHON=/usr/bin/python3.5

from zoo.common.nncontext import *
sc = init_nncontext(init_spark_conf().setMaster("local[4]"))

# RNN


In [None]:
from zoo.pipeline.api.keras.models import Sequential
from zoo.pipeline.api.keras.layers import Embedding, SimpleRNN

#### Specify input shape
In Keras, we could add an embedding layer as the first layer as follows:

    model = Sequential()
    model.add(Embedding(10000, 32))

In Analytics Zoo, you need to specify the input shape of the first layer. In this example the sequence length is 500, so we build the model as follows:

In [None]:
model = Sequential()
model.add(Embedding(10000, 32, input_shape=(500,)))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32))  # The last layer returns only the output of the final timestep.
model.summary()
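
To make the effect of `return_sequences` concrete, here is a minimal pure-Python sketch of the recurrence a simple RNN layer is generally understood to compute, `h_t = tanh(W x_t + U h_{t-1} + b)`. The helper name `simple_rnn_forward`, the tiny dimensions, and the random weights are illustrative assumptions; this is not the Analytics Zoo implementation.

```python
import math
import random

def simple_rnn_forward(inputs, W, U, b, return_sequences=False):
    """Apply h_t = tanh(W x_t + U h_{t-1} + b) over a non-empty list of timesteps.

    inputs: list of timesteps, each a list of floats of length input_dim
    W: hidden_dim x input_dim, U: hidden_dim x hidden_dim, b: hidden_dim
    """
    hidden_dim = len(b)
    h = [0.0] * hidden_dim  # assumed initial state: all zeros
    outputs = []
    for x in inputs:
        # The comprehension reads the previous h before h is rebound.
        h = [
            math.tanh(
                sum(W[i][j] * x[j] for j in range(len(x)))
                + sum(U[i][k] * h[k] for k in range(hidden_dim))
                + b[i]
            )
            for i in range(hidden_dim)
        ]
        outputs.append(h)
    # Either every hidden state, or only the one from the final timestep.
    return outputs if return_sequences else outputs[-1]

random.seed(0)
timesteps, input_dim, hidden_dim = 5, 3, 4
W = [[random.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
U = [[random.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
b = [0.0] * hidden_dim
inputs = [[random.uniform(-1, 1) for _ in range(input_dim)] for _ in range(timesteps)]

full = simple_rnn_forward(inputs, W, U, b, return_sequences=True)
last = simple_rnn_forward(inputs, W, U, b, return_sequences=False)

print(len(full), len(full[0]))  # 5 4  (timesteps, hidden_dim)
print(len(last))                # 4    (hidden_dim only)
```

Stacked recurrent layers need `return_sequences=True` on every layer except the last, because each following recurrent layer expects one input vector per timestep.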

Now let's try to use such a model on the IMDB movie review classification problem. First, let's preprocess the data:

In [None]:
from keras.datasets import imdb
from keras.preprocessing import sequence

max_features = 10000  # number of words to consider as features
maxlen = 500  # pad or truncate each review to this many words (among top max_features most common words)
batch_size = 32

(input_train, y_train), (input_test, y_test) = imdb.load_data(nb_words=max_features)
input_train = sequence.pad_sequences(input_train, maxlen=maxlen)
input_test = sequence.pad_sequences(input_test, maxlen=maxlen)
print('input_train shape:', input_train.shape)
print('input_test shape:', input_test.shape)
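
As a rough illustration of what `pad_sequences` does with its default arguments in Keras (`padding='pre'`, `truncating='pre'`, `value=0`), the pure-Python sketch below pads short sequences on the left and drops words from the front of long ones. The helper `pad_pre` and the sample reviews are hypothetical and only mimic the default behavior for a single sequence and a positive `maxlen`.

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad or left-truncate a list of word indices to exactly maxlen items.

    Assumes maxlen is a positive integer.
    """
    if len(seq) >= maxlen:
        return seq[-maxlen:]  # keep the last maxlen items
    return [value] * (maxlen - len(seq)) + seq

short_review = [11, 12, 13]
long_review = [1, 2, 3, 4, 5, 6, 7]

print(pad_pre(short_review, 5))  # [0, 0, 11, 12, 13]
print(pad_pre(long_review, 5))   # [3, 4, 5, 6, 7]
```

With these defaults, a review longer than `maxlen` keeps its last `maxlen` words, so the padded arrays have shape `(number_of_reviews, maxlen)`.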

Let's train a simple recurrent network using an `Embedding` layer and a `SimpleRNN` layer:

In [None]:
from zoo.pipeline.api.keras.layers import Dense

model = Sequential()
model.add(Embedding(max_features, 32, input_shape=(maxlen,)))
model.add(SimpleRNN(32))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history = model.fit(input_train, y_train,
                    nb_epoch=10,
                    batch_size=128,
                    #validation_split=0.2
                    )
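
For reference, the quantities named in `compile` can be sketched in a few lines of plain Python: binary cross-entropy averages `-(y*log(p) + (1-y)*log(1-p))` over the samples, and accuracy counts predictions that fall on the correct side of a threshold. The clipping constant `eps`, the 0.5 threshold, and the sample values are assumptions made for illustration; the library's internal choices may differ.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if (p > threshold) == (t == 1))
    return hits / len(y_true)

labels = [1, 0, 1, 0]            # hypothetical sentiment labels
probs = [0.9, 0.2, 0.4, 0.1]     # hypothetical sigmoid outputs

loss = binary_crossentropy(labels, probs)
acc = accuracy(labels, probs)
print(round(loss, 4))
print(acc)  # 0.75
```

The third sample is misclassified (probability 0.4 for a positive label), which is why it contributes the largest term to the loss.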

Now let's switch to more practical concerns: we will set up a model using an LSTM layer and train it on the IMDB data.

In [None]:
from zoo.pipeline.api.keras.layers import LSTM

model = Sequential()
model.add(Embedding(max_features, 32, input_shape=(maxlen,)))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
# Write training summaries for TensorBoard to log directory './' under the app name '6-2_summary'.
model.set_tensorboard('./', '6-2_summary')
history = model.fit(input_train, y_train,
                    nb_epoch=10,
                    batch_size=128,
                    #validation_split=0.2
                    )
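
One way to see what swapping `SimpleRNN(32)` for `LSTM(32)` costs is to count trainable parameters. The sketch below uses the textbook formulation, in which an LSTM holds four gate-sized copies of the simple RNN weights; the helper names are hypothetical, and the figures reported by `model.summary()` may differ if the backend lays out its weights or biases differently.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per word index.
    return vocab_size * embed_dim

def simple_rnn_params(input_dim, units):
    # Input weights + recurrent weights + bias.
    return units * (input_dim + units + 1)

def lstm_params(input_dim, units):
    # Input, forget, and output gates plus the cell candidate.
    return 4 * simple_rnn_params(input_dim, units)

def dense_params(input_dim, units):
    # Weight matrix + bias.
    return units * (input_dim + 1)

print(embedding_params(10000, 32))  # 320000
print(simple_rnn_params(32, 32))    # 2080
print(lstm_params(32, 32))          # 8320
print(dense_params(32, 1))          # 33
```

Under these assumptions the embedding table dominates the parameter count in both models, while the recurrent layer grows fourfold when moving to an LSTM.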