
Recurrent Models with sequences of mixed length #40

Closed
makroiss opened this Issue Apr 8, 2015 · 15 comments

@makroiss

makroiss commented Apr 8, 2015

The training process for LSTM only supports tensor3 inputs. If the sequences are of different lengths, then X must be a list; however, models.py:90 does not support lists as input. I think a quick fix would be to cast X_batch to tensor3 when batch_size=1, and to fix y_batch accordingly.

@patyork

Contributor

patyork commented Apr 8, 2015

Sequences should be grouped together by length, and segmented manually into batches by that length before being sent to Keras.

Alternatively (or in addition to the above, to get more sequences of the same length), if it does not break the logic in the cost function, the sequences can be padded with 0s (or the equivalent non-entity).

The reason that lists are not supported is that Theano builds everything as tensors, or matrices of matrices, so everything must have the same dimensionality (Theano does not assume it should pad with 0s where lengths differ).

@fchollet

Collaborator

fchollet commented Apr 8, 2015

In addition, here are a few quick examples of solutions to your problem:

Zero-padding

X = keras.preprocessing.sequence.pad_sequences(sequences, maxlen=100)
model.fit(X, y, batch_size=32, nb_epoch=10)

Batches of size 1

for seq, label in zip(sequences, y):
    model.train(np.array([seq]), [label])

@hqsiswiliam

hqsiswiliam commented Sep 3, 2016

@fchollet Sorry to bother you, but I can't find model.train(np.array([seq]), [label]) in the Keras documentation for batch size 1.

@shamitlal

shamitlal commented Oct 25, 2016

@fchollet In the case of the batch-size-1 method, what should be assigned to the input_length parameter in the model? Or should it be set to None in this case?

@visionscaper

visionscaper commented Nov 10, 2016

@fchollet, just for my understanding: when you use pad_sequences, the padded zeros are fed through the sequence network (e.g. a recurrent NN), correct?

What I was looking for is a method where this doesn't happen; I only want to input the real sequence, with each sequence having a different length, and subsequently use the output for further processing.

It seems to me that padding the sequences will make it harder to learn the task at hand, since the zeros don't provide information but get encoded by the network anyway.

@brortao

brortao commented Nov 17, 2016

@visionscaper yes, the padding still goes through the network. If you don't want this, you might want to look into sequence-to-sequence learning, e.g. with farizrahman4u/seq2seq. This paper (https://arxiv.org/abs/1409.3215) explains the idea.

@VanushVaswani

VanushVaswani commented Nov 17, 2016

Or use the Masking layer; see #3086.
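
For reference, here is a minimal sketch of the masking approach (the feature dimension, layer sizes, and manual zero-padding below are illustrative assumptions, using the Keras 1.x API seen elsewhere in this thread):

import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

# sequences: a list of arrays, each of shape (timesteps_i, n_features); y: one label per sequence
n_features = 8
maxlen = max(len(s) for s in sequences)

# zero-pad every sequence up to a common length so they fit into one 3D tensor
X = np.zeros((len(sequences), maxlen, n_features), dtype='float32')
for i, s in enumerate(sequences):
    X[i, :len(s), :] = s

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(maxlen, n_features)))  # all-zero timesteps are masked out
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, np.array(y), batch_size=32, nb_epoch=10)

With this setup the padded timesteps still pass through the input tensor, but the mask prevents them from affecting the recurrent state updates.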

@LopezGG

LopezGG commented Nov 28, 2016

@patyork: You mentioned

Sequences should be grouped together by length, and segmented manually into batches by that length before being sent to Keras.

I initialize my model as:
model_LSTM = Sequential()
model_LSTM.add(LSTM(lstm_hidden_units, input_shape=(X.shape[1], X.shape[2])))

and I plan on calling model.train_on_batch(X, y) on every batch. The problem is: how can I initialize input_shape in the LSTM when it varies across batches?

@patyork

Contributor

patyork commented Nov 28, 2016

@LopezGG The shape of a single temporal (or any other kind of) "frame" of the input sequence must be the same across samples; the varying dimension is the length, i.e. the number of "frames" each sample has.

For example, excluding anything to do with batches and batch size: a set of video clips all have a resolution of 1920x1080 pixels but can vary in duration. In this case, the input shape is 1920 by 1080, which is the "frame" size, and the varying dimension is the duration/length of the video, such as 120 frames / 4 seconds of video. The sequence for this example video would be 120 frames of 1920x1080 pixels. Any length of video can be fed through this network, so long as it is a 1920x1080 feed.

Going one step further: if you want to use batches of videos to train concurrently, the sequences in each batch must be the same length. One way to accomplish this is to predefine a few "buckets" of temporal length, for example "up to 2 seconds, up to 4 seconds, etc". You can then bucket your video clips, padding when necessary (with all black frames) to get all of the clips to the cutoff/bucket duration.
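
As a rough sketch of that bucketing idea (the cutoffs, helper name, and zero padding below are illustrative assumptions, not Keras API):

import numpy as np
from collections import defaultdict

def bucket_and_pad(sequences, labels, cutoffs, n_features):
    # group samples by the smallest cutoff they fit into, padding the tail of each sample;
    # cutoffs are assumed sorted ascending and no sequence exceeds the largest cutoff
    buckets = defaultdict(list)
    for seq, label in zip(sequences, labels):
        cutoff = next(c for c in cutoffs if len(seq) <= c)
        padded = np.zeros((cutoff, n_features), dtype='float32')
        padded[:len(seq)] = seq  # zero padding; substitute a neutral frame where that makes more sense
        buckets[cutoff].append((padded, label))
    return buckets

# each bucket then yields fixed-length batches, e.g. for training with train_on_batch:
# for cutoff, samples in bucket_and_pad(sequences, labels, [50, 100, 200], n_features).items():
#     X = np.array([s for s, l in samples])
#     y = np.array([l for s, l in samples])
#     model.train_on_batch(X, y)

Training buckets of different lengths against a single model requires the temporal dimension to be declared as variable, e.g. input_shape=(None, n_features) on the first recurrent layer.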

@patyork

Contributor

patyork commented Nov 28, 2016

@visionscaper to follow up: "padding" your input is necessary, but it should be done in a way that makes sense. For example:

  • with video: pad the input with the equivalent of black frames of video
  • with audio: pad the input with the equivalent of silence

Padding with just straight zeros will, as you guessed, more than likely encode some unnecessary - if not incorrect - information into the network. Padding with a "neutral" frame of data is the correct approach.
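
A small sketch of that idea (the helper name and shapes are assumptions; the "neutral" frame is whatever represents a black frame, silence, etc. in your feature space):

import numpy as np

def pad_with_neutral(sequences, maxlen, neutral_frame):
    # pad each (timesteps_i, n_features) sequence up to maxlen with copies of a domain-specific neutral frame
    X = np.tile(neutral_frame, (len(sequences), maxlen, 1)).astype('float32')
    for i, seq in enumerate(sequences):
        X[i, :len(seq), :] = seq[:maxlen]  # anything longer than maxlen is truncated
    return X

# e.g. for audio features, neutral_frame would be the feature vector of a silent frame:
# X = pad_with_neutral(sequences, maxlen=200, neutral_frame=silence_frame)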

@carlthome

Contributor

carlthome commented Nov 29, 2016

@patyork, sorry, but wouldn't Masking() take care of this? Across the minibatches in an epoch, some sequences end earlier than the longest one and should thus not be given any weight in the forward pass; so we set those timesteps to zero and also reduce the loss accordingly before updating the weights in the backward pass. I guess it would be extremely important to normalize the data if masking is being used.

@patyork

Contributor

patyork commented Nov 29, 2016

@carlthome Yes, the Masking layer appears to be exactly what is needed. This thread predates the addition of that layer, and I was unaware of its intended use.

The Masking layer looks great for most applications, but I would think the "pad with neutral data" approach should be kept in mind for some. Specifically, for speech recognition it would be good to embed the idea of "silence" as a valid, and indeed expected, input sequence.

@vsoto

vsoto commented Dec 8, 2016

What's the difference between using a Masking layer for sequences of different lengths and setting the mask_zero field to True?

@louisabraham

louisabraham commented Dec 18, 2016

I had the same problem. You basically have two solutions (let's say the dimension of your input is n):

Either you use the parameters:

batch_input_shape=(1, 1, n), stateful=True

Then you train with:

for _ in range(nb_epoch):
    model.fit(X, Y, batch_size=1, nb_epoch=1, shuffle=False)
    model.reset_states()

or with:

for _ in range(nb_epoch):
    for x, y in zip(X, Y):
        model.fit(x, y, batch_size=1, nb_epoch=1, shuffle=False)
    model.reset_states()

and with X of shape (length, 1, n).

I don't know if the two methods are equivalent though… Maybe there are more gradient updates with the second…

Or you define the model with

input_shape=(None, n)

(and stateful=False by default) and you train with:

model.fit(X, Y, nb_epoch=nb_epoch)

and X has shape (1, length, n).
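
A minimal end-to-end sketch along the lines of this second option, training one variable-length sample at a time (the feature dimension, layer sizes, and the X_list/y_list names are illustrative assumptions):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

n = 16  # feature dimension per timestep (illustrative)

model = Sequential()
model.add(LSTM(32, input_shape=(None, n)))  # None = variable number of timesteps
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

# one update per sample; each X_i has shape (1, length_i, n), each y_i has shape (1,)
for X_i, y_i in zip(X_list, y_list):
    model.train_on_batch(X_i, y_i)

Because the batch dimension stays at 1, samples of different lengths never have to share a tensor, at the cost of slower training than true mini-batches.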

howard0su pushed a commit to howard0su/keras that referenced this issue Jan 31, 2017

@wangqianwen0418

wangqianwen0418 commented Jul 29, 2017

@hqsiswiliam

but I can't find model.train(np.array([seq]), [label]) in keras document for batch size 1.

It should be model.train_on_batch.
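
In other words, the earlier batch-size-1 loop would look roughly like this with train_on_batch (reusing the variable names from that example):

import numpy as np

for seq, label in zip(sequences, y):
    model.train_on_batch(np.array([seq]), np.array([label]))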
