At its core, an **LSTM** preserves information from inputs that have already passed through it by means of its hidden state.

A **unidirectional LSTM** preserves information only from the past, because the only inputs it has seen are from the past.

A **bidirectional LSTM** runs your inputs in two directions, one from past to future and one from future to past. What distinguishes it from the unidirectional approach is that the LSTM running backwards preserves information from the future. By combining the two hidden states, you have access at any point in time to information from both the past and the future.

Let's say we try to predict the next word in a sentence. At a high level, what a unidirectional LSTM will see is:

The boys went to ....

It will try to predict the next word from this context alone. With a bidirectional LSTM you also see information further down the road, for example:

Forward LSTM:

The boys went to ...

Backward LSTM:

... and then they got out of the pool
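To make the two views concrete, the sketch below lists the context each direction has accumulated when it reaches the word being predicted. It is only an illustration in plain Python: the placeholder token `<?>` and the helper name `contexts` are assumptions made for this example, not part of any library.

```python
def contexts(tokens, t):
    """Return what each direction has read by the time it reaches position t."""
    forward = tokens[:t]        # forward LSTM: every token before position t
    backward = tokens[t + 1:]   # backward LSTM: every token after position t
                                # (it reads them last-to-first)
    return forward, backward


# "<?>" is a hypothetical placeholder for the word we want to predict.
sentence = "The boys went to <?> and then they got out of the pool".split()
t = sentence.index("<?>")

forward, backward = contexts(sentence, t)
print("forward :", " ".join(forward))
print("backward:", " ".join(backward))
```

A unidirectional model only ever has the `forward` list available; the bidirectional model can use both.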

## How you would implement a bidirectional LSTM without calling the BiLSTM module

This might better illustrate the difference between unidirectional and bidirectional LSTMs. As you can see, we merge two LSTMs to create a bidirectional LSTM.

You can merge the outputs of the forward and backward LSTMs in one of four ways: 'sum', 'mul', 'concat' or 'ave' (average).
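As a rough illustration of what the four merge modes do to the two output tensors, here is a small NumPy sketch. The function name `merge` and the array shapes are assumptions chosen for this example; in Keras the merging is done by the framework, not by code like this.

```python
import numpy as np


def merge(forward, backward, mode="concat"):
    """Combine forward and backward outputs of shape (timesteps, units)."""
    if mode == "sum":
        return forward + backward
    if mode == "mul":
        return forward * backward
    if mode == "ave":
        return (forward + backward) / 2.0
    if mode == "concat":
        # Concatenation doubles the feature dimension.
        return np.concatenate([forward, backward], axis=-1)
    raise ValueError(f"unknown merge mode: {mode}")


fwd = np.ones((99, 8))        # stand-in for the forward LSTM outputs
bwd = np.full((99, 8), 3.0)   # stand-in for the backward LSTM outputs

for mode in ("sum", "mul", "ave", "concat"):
    print(mode, merge(fwd, bwd, mode).shape)
```

Only 'concat' changes the output width; the other three modes keep the number of units unchanged, which matters when sizing the layer that follows.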

In [None]:
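# Written against the Keras 2 functional API. hidden_units, nb_classes, nb_epoches
# and the X_*/Y_* arrays are assumed to be defined earlier in the notebook.
from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed, Lambda, add
from keras.optimizers import SGD
from keras import backend as K

lstm_args = dict(return_sequences=True, activation='tanh', recurrent_activation='sigmoid',
                 kernel_initializer='random_uniform', recurrent_initializer='random_uniform',
                 unit_forget_bias=True)

inputs = Input(shape=(99, 13))
# Forward LSTM: reads the sequence from the first to the last timestep.
left = LSTM(hidden_units, **lstm_args)(inputs)
# Backward LSTM: reads the sequence from the last to the first timestep.
right = LSTM(hidden_units, go_backwards=True, **lstm_args)(inputs)
# go_backwards returns the outputs in reversed time order; flip them back so that
# timestep t of both LSTMs lines up before merging.
right = Lambda(lambda x: K.reverse(x, axes=1))(right)

merged = add([left, right])  # merge mode 'sum'
outputs = TimeDistributed(Dense(nb_classes, activation='softmax'))(merged)
model = Model(inputs, outputs)

sgd = SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
print("Train...")
model.fit(X_train, Y_train, batch_size=1, epochs=nb_epoches,
          validation_data=(X_test, Y_test), verbose=1)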
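One detail to watch when combining the two directions by hand: in Keras, an LSTM with `go_backwards=True` and `return_sequences=True` returns its outputs in reversed time order, so they have to be flipped back before an element-wise merge such as 'sum'. The NumPy sketch below illustrates the alignment with a simple tanh recurrence standing in for the LSTM (an assumption made to keep the example short); all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, input_dim, units = 5, 3, 4
x = rng.normal(size=(timesteps, input_dim))


def run_rnn(inputs, w_in, w_rec):
    """Simple tanh recurrence (stand-in for an LSTM); returns one state per step."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(x_t @ w_in + h @ w_rec)
        states.append(h)
    return np.stack(states)


# Each direction has its own, independent weights.
w_in_f, w_rec_f = rng.normal(size=(input_dim, units)), rng.normal(size=(units, units))
w_in_b, w_rec_b = rng.normal(size=(input_dim, units)), rng.normal(size=(units, units))

# Forward pass: row t has seen x[0], ..., x[t].
forward = run_rnn(x, w_in_f, w_rec_f)

# Backward pass: feed the reversed sequence, then reverse the outputs again
# so that row t has seen x[t], ..., x[-1] and lines up with forward[t].
backward = run_rnn(x[::-1], w_in_b, w_rec_b)[::-1]

merged = forward + backward   # merge mode 'sum', one vector per timestep
print(merged.shape)
```

Without the second `[::-1]`, the sum would pair the forward state at the first timestep with the backward state computed for the last one.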

## BiLSTM or BLSTM

In comparison to an LSTM, a BLSTM or BiLSTM has two networks: one accesses past information in the forward direction and the other accesses future information in the reverse direction.

In [None]:
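from keras.models import Sequential
from keras.layers import LSTM, Bidirectional

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 10)))

# The recurrent (gate) activation can be set like this
# (num_channels, input_length and input_dim are assumed to be defined elsewhere):
model = Sequential()
model.add(Bidirectional(LSTM(num_channels, implementation=2, recurrent_activation='sigmoid'),
                        input_shape=(input_length, input_dim)))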
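Because the wrapper holds two separate LSTMs, it roughly doubles the number of trainable weights, and with the default 'concat' merge mode it also doubles the output width. The sketch below works this out by hand for a standard LSTM with biases and no peephole connections; the helper names are hypothetical and the code does not call Keras.

```python
def lstm_params(units, input_dim):
    """Trainable weights of one standard LSTM (4 gates, with biases)."""
    # Each gate has an input kernel, a recurrent kernel and a bias vector.
    return 4 * (units * input_dim + units * units + units)


def bidirectional_summary(units, timesteps, input_dim, merge_mode="concat"):
    """Parameter count and per-sample output shape with return_sequences=True."""
    params = 2 * lstm_params(units, input_dim)          # forward + backward copy
    out_units = 2 * units if merge_mode == "concat" else units
    return params, (timesteps, out_units)


# Same sizes as Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 10))
params, out_shape = bidirectional_summary(units=10, timesteps=5, input_dim=10)
print(params, out_shape)   # 1680 (5, 20)
```

Comparing these numbers with `model.summary()` is a quick way to confirm that a layer is wired the way you expect.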


Complete model using a bidirectional LSTM on the IMDB dataset.

In [None]:
import numpy as np
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM, Bidirectional
from keras.datasets import imdb


n_unique_words = 10000  # keep only the 10,000 most frequent words
maxlen = 200  # cut (or pad) each review to this many words
batch_size = 128

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=n_unique_words)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
y_train = np.array(y_train)
y_test = np.array(y_test)

model = Sequential()
model.add(Embedding(n_unique_words, 128, input_length=maxlen))
model.add(Bidirectional(LSTM(64)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=4,
          validation_data=(x_test, y_test))
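The `pad_sequences` call in the cell above is what turns reviews of varying length into fixed-length rows. With its default settings it pads and truncates at the front of each sequence; the pure-Python sketch below mimics that behaviour for a single sequence. The helper `pad_pre` is a hypothetical name used only for illustration and assumes `maxlen` is at least 1.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic default 'pre' padding and 'pre' truncation for one sequence."""
    seq = list(seq)[-maxlen:]                    # keep the last maxlen tokens
    return [value] * (maxlen - len(seq)) + seq   # left-pad with the fill value


print(pad_pre([1, 2, 3], 5))            # [0, 0, 1, 2, 3]
print(pad_pre([1, 2, 3, 4, 5, 6], 4))   # [3, 4, 5, 6]
```

Long reviews therefore lose their beginning rather than their end, and short reviews start with a run of zeros.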