How to implement a deep bidirectional LSTM? #1629

Closed
udani969 opened this issue Feb 3, 2016 · 18 comments

@udani969

udani969 commented Feb 3, 2016

I am trying to implement an LSTM-based speech recognizer. So far I have been able to set up a bidirectional LSTM (I think it is working as a bidirectional LSTM) by following the example in the Merge layer. Now I want to add another bidirectional LSTM layer, which would make it a deep bidirectional LSTM. But I am unable to figure out how to connect the output of the previously merged two layers into a second set of LSTM layers. I don't know whether this is possible with Keras. I hope someone can help me with this.

The code for my single-layer bidirectional LSTM is as follows:

from keras.models import Sequential
from keras.layers import Activation, LSTM, Merge, TimeDistributedDense
from keras.optimizers import SGD

left = Sequential()
left.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', return_sequences=True, activation='tanh',
               inner_activation='sigmoid', input_shape=(99, 13)))
right = Sequential()
right.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', return_sequences=True, activation='tanh',
               inner_activation='sigmoid', input_shape=(99, 13), go_backwards=True))

model = Sequential()
model.add(Merge([left, right], mode='sum'))

model.add(TimeDistributedDense(nb_classes))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
print("Train...")
model.fit([X_train, X_train], Y_train, batch_size=1, nb_epoch=nb_epoches, validation_data=([X_test, X_test], Y_test), verbose=1, show_accuracy=True)

Dimensions of my x and y values are as follows.

(100, 'train sequences')
(20, 'test sequences')
('X_train shape:', (100, 99, 13))
('X_test shape:', (20, 99, 13))
('y_train shape:', (100, 99, 11))
('y_test shape:', (20, 99, 11))

@farizrahman4u
Contributor

#1282 will help. It works only for Theano, though.

@farizrahman4u
Contributor

Or you could simply use the following fork function to make 2 copies of your merged layer:

def fork(model, n=2):
    # Return n Sequential models that all contain `model` as their first layer,
    # so the same merged output can feed several downstream branches.
    forks = []
    for i in range(n):
        f = Sequential()
        f.add(model)
        forks.append(f)
    return forks

left = Sequential()
left.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', return_sequences=True, activation='tanh',
               inner_activation='sigmoid', input_shape=(99, 13)))
right = Sequential()
right.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', return_sequences=True, activation='tanh',
               inner_activation='sigmoid', input_shape=(99, 13), go_backwards=True))

model = Sequential()
model.add(Merge([left, right], mode='sum'))

# Add second bidirectional LSTM layer

left, right = fork(model)

left.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', return_sequences=True, activation='tanh',
               inner_activation='sigmoid'))

right.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', return_sequences=True, activation='tanh',
               inner_activation='sigmoid',  go_backwards=True))

# Rest of the stuff as it is

model = Sequential()
model.add(Merge([left, right], mode='sum'))

model.add(TimeDistributedDense(nb_classes))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
print("Train...")
model.fit([X_train, X_train], Y_train, batch_size=1, nb_epoch=nb_epoches, validation_data=([X_test, X_test], Y_test), verbose=1, show_accuracy=True)

It would be better to use the Bidirectional wrapper or the Graph for this sort of stuff.

@udani969
Author

udani969 commented Feb 3, 2016

Wow, it worked. I used the fork method because some checks were not successful with the wrapper approach. I only just got it to work. Thanks a lot for the support.

@udani969 udani969 closed this as completed Feb 3, 2016
@talentlei

talentlei commented May 3, 2016

@farizrahman4u I used your code above and got a model, but when I load the model and test it, I get the following error:

File "BLSTM_NER.py", line 1058, in
test()
File "BLSTM_NER.py", line 1038, in test
ner.rnn_test(resfile,model_file,weights)
File "BLSTM_NER.py", line 943, in rnn_test
out = model.predict([self.X_test,self.X_test],batch_size=batch_size)
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 693, in predict
return self._predict_loop(self._predict, X, batch_size, verbose)[0]
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 356, in _predict_loop
batch_outs = f(ins_batch)
File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 448, in call
return self.function(*inputs)
File "/home/cl/download/Theano/theano/compile/function_module.py", line 845, in call
self.inv_finder[c]))
TypeError: Missing required input: <TensorType(float32, 3D)>

My test code is as follows:

    print "load model"
       model = model_from_json(open(my_model).read())
       model.load_weights(weights)
       print "load model finish" 
       out = model.predict([self.X_test,self.X_test],batch_size=batch_size)

Why am I getting this error? Can you help me? Thanks!

@Windy-Ground

https://github.com/fchollet/keras/blob/master/examples/imdb_bidirectional_lstm.py

@vinayakumarr

I was trying @farizrahman4u's example of a deep bidirectional LSTM on my dataset, which has 50000 rows and 20 columns (19 features and 1 class label), and

X_train = sequence.pad_sequences(X_train, maxlen=100)
X_test = sequence.pad_sequences(X_test, maxlen=100)

I am getting the following error. I know it is because of the dimensions/shape passed to the model.fit function, but I don't know how to resolve it.
[screenshot of the error]

@farizrahman4u
Contributor

The problem is with the shape of your input data. The error message is pretty clear: the LSTM needs 3D data, but you are providing 2D. The example I provided above is obsolete; use the functional API instead.
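
For reference, a minimal sketch (not code from this issue) of a two-layer bidirectional LSTM written with the Keras 1.x functional API, using the shapes from the original post (99 timesteps, 13 features, 11 classes); hidden_units is an illustrative placeholder:

from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed, merge

hidden_units = 128  # illustrative size, not from the original post
nb_classes = 11     # per-timestep classes, matching the y_train shape (100, 99, 11)

inputs = Input(shape=(99, 13))

# First bidirectional layer: one LSTM forwards, one backwards, outputs summed.
fwd_1 = LSTM(hidden_units, return_sequences=True)(inputs)
bwd_1 = LSTM(hidden_units, return_sequences=True, go_backwards=True)(inputs)
layer_1 = merge([fwd_1, bwd_1], mode='sum')

# Second bidirectional layer stacked on the merged output.
fwd_2 = LSTM(hidden_units, return_sequences=True)(layer_1)
bwd_2 = LSTM(hidden_units, return_sequences=True, go_backwards=True)(layer_1)
layer_2 = merge([fwd_2, bwd_2], mode='sum')

# Per-timestep softmax for many-to-many labelling.
outputs = TimeDistributed(Dense(nb_classes, activation='softmax'))(layer_2)

model = Model(input=inputs, output=outputs)
model.compile(loss='categorical_crossentropy', optimizer='sgd')

Note that with a raw go_backwards=True LSTM, the backward output may still be in reversed time order depending on the Keras version (a point raised further down this thread); the built-in Bidirectional wrapper takes care of that detail.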

@9thDimension

9thDimension commented Aug 7, 2016

@farizrahman4u When you say "functional API", what do you mean exactly?

I saw this syntax here:
model.add(Bidirectional(LSTM(10, input_shape=(5, 10), return_sequences=True)))
But I don't know which package to import the Bidirectional() class from

and this syntax here:
backwards = LSTM(64, go_backwards=True)(embedded)
But then I'm not exactly sure how to make a multi-layer bidirectional LSTM (use the forking approach you described above on Feb 3rd?)

P.S. I want many-to-many sequence labelling, so where do I need to put the return_sequences=True flags?

@farizrahman4u
Contributor

Google for the Keras functional API. The Bidirectional wrapper is from my seq2seq library.

@9thDimension

9thDimension commented Aug 7, 2016

@farizrahman4u Oh it's part of the seq2seq library I see.

Is this the correct usage to make a 2-layer bidirectional LSTM to output a category prediction for every input character?

Input chars are 43-dimensional, and there are 5 possible output categories.

from keras.models import Sequential
from keras.layers import Activation, LSTM, Merge, TimeDistributedDense
from keras.optimizers import SGD

def fork(model, n=2):
    forks = []
    for i in range(n):
        f = Sequential()
        f.add(model)
        forks.append(f)
    return forks

# First bidirectional LSTM layer

forward = Sequential()
forward.add(LSTM(output_dim=512, input_shape=(50, 43), return_sequences=True))
backward = Sequential()
backward.add(LSTM(output_dim=512, input_shape=(50, 43), return_sequences=True, go_backwards=True))

model = Sequential()
model.add(Merge([forward, backward], mode='concat'))


# Second bidirectional LSTM layer

forward_2, backward_2 = fork(model)

forward_2.add(LSTM(output_dim=512, input_shape=(50, 512), return_sequences=True))
backward_2.add(LSTM(output_dim=512, input_shape=(50, 512), return_sequences=True, go_backwards=True))

model = Sequential()
model.add(Merge([forward_2, backward_2], mode='concat'))


# Softmax decision layer

model.add(TimeDistributedDense(output_dim=5))
model.add(Activation('softmax'))


# Optimizer function

sgd = SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)

print("Train...")
model.fit([X_train, X_train], Y_train, batch_size=1, nb_epoch=nb_epoches, validation_data=([X_test, X_test], Y_test), verbose=1, show_accuracy=True)

Also, for this type of architecture, do the inputs have to "overlap" like so:

x_0 = [0, 1, 2, 3, 4], y_0 = [A, B, C, D, E]
x_1 = [1, 2, 3, 4, 5], y_1 = [B, C, D, E, F]
x_2 = [2, 3, 4, 5, 6], y_2 = [C, D, E, F, G]

or not overlap like so:

x_0 = [0, 1, 2, 3, 4],      y_0 = [A, B, C, D, E]
x_1 = [5, 6, 7, 8, 9],      y_1 = [F, G, H, I, J]
x_2 = [10, 11, 12, 13, 14], y_2 = [K, L, M, N, O] 

@vinayakumarr

@farizrahman4u Before posting this I already knew the error I am getting is because of a dimension problem. I have a training dataset of size 390321 with 23 classes and a test dataset of 20000 (I also have the correct labels, which have 40). I am loading the train, test and correct-label datasets and trying to apply a deep bidirectional stateful LSTM.

train dataset size is 390321 × 41 (40 features plus 1 class label)
test dataset size is 20000 × 40
corrected label size is 20000 × 1

How do I reshape the dimensions and apply a deep bidirectional stateful LSTM?

@strin
Contributor

strin commented Sep 9, 2016

@farizrahman4u @9thDimension when running an LSTM in the reverse direction, shouldn't the output correspond to input_n, input_{n-1}, input_{n-2}, ..., input_1? In that case, when concatenating with the output from the forward direction, shouldn't we reverse it?

@farizrahman4u
Contributor

@strin I have added the Bidirectional wrapper to Keras; see the bidirectional LSTM example.

@williamjqk

The official manual can be referenced here: https://keras.io/layers/wrappers/#bidirectional
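
For later readers, a minimal sketch of a stacked bidirectional model using that built-in Bidirectional wrapper (layer sizes are illustrative; the input shape and class count follow the original post):

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed, Bidirectional

nb_classes = 11  # per-timestep classes, as in the original post

model = Sequential()
# First bidirectional layer; return_sequences=True so the next layer receives the full sequence.
model.add(Bidirectional(LSTM(64, return_sequences=True), input_shape=(99, 13)))
# Second bidirectional layer stacked directly on top.
model.add(Bidirectional(LSTM(64, return_sequences=True)))
# Per-timestep softmax for many-to-many labelling.
model.add(TimeDistributed(Dense(nb_classes, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='sgd')

The wrapper runs the wrapped layer forwards and backwards and merges the two outputs (concatenation by default), so there is no need to build the forward and backward branches by hand.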

@grafael

grafael commented Aug 31, 2017

I'm afraid the Bidirectional wrapper will not work with the Keras functional API.
Any help with this sort of thing:

main_input = Input(shape=(100,), dtype='int32', name='main_input')
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
lstm = LSTM(32)(x)
bidirectional = Bidirectional()(lstm)  # how should Bidirectional be instantiated?

@jojonki

jojonki commented Oct 28, 2017

@grafael

How about this? Bidirectional takes a layer as its first argument:
bidirectional = Bidirectional(LSTM(32))(x)
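
A hedged sketch completing the functional-API example above (the Dense output layer, loss, and optimizer are illustrative assumptions, not part of the original comment):

from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dense, Bidirectional

main_input = Input(shape=(100,), dtype='int32', name='main_input')
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
# The wrapper takes the recurrent layer as its first argument and is then
# called on the incoming tensor like any other layer.
bidirectional = Bidirectional(LSTM(32))(x)
output = Dense(1, activation='sigmoid')(bidirectional)  # illustrative output head

model = Model(inputs=main_input, outputs=output)
model.compile(loss='binary_crossentropy', optimizer='adam')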

@ylmeng

ylmeng commented Oct 28, 2017

Doesn't the 'go_backwards' option reverse the output order too? So model.add(Merge([left, right], mode='sum')) does not make sense (you would have to flip one of them before adding)?

@Ap1075

Ap1075 commented May 15, 2018

@ylmeng
Yes, it is handled automatically. You don't have to flip it before merging, as far as I know.
