
LSTM - Sequences with different num of time steps #85

Closed

benjaminklein opened this issue Apr 26, 2015 · 40 comments

@benjaminklein

Hi,

Could you explain how this library handles sequences with different numbers of time steps? Specifically, can we have sequences with different numbers of time steps, and if so, where does one supply the length of each sequence?

Thank you!

@fchollet
Member

You can, but you would have to pad the shorter sequences with zeros, since all inputs to Keras models must be tensors. Here's an example of how to do it: https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py#L46

Another solution would be to feed sequences to your model one sequence at a time (batch_size=1). Then differences in sequence lengths would be irrelevant.
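
For reference, a minimal sketch of the padding approach with keras.preprocessing.sequence.pad_sequences (the toy sequences here are made up):

from keras.preprocessing.sequence import pad_sequences

# Three sequences of different lengths.
sequences = [[1, 2, 3], [4, 5], [6]]

# Pad every sequence to the same length so the batch forms a tensor.
# By default, pad_sequences pads at the front ('pre') with zeros.
padded = pad_sequences(sequences, maxlen=3)
# array([[1, 2, 3],
#        [0, 4, 5],
#        [0, 0, 6]], dtype=int32)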

@benjaminklein
Author

Thank you for the quick answer. So after padding, how does the library know to ignore the padded values (and not use them in training)?

@fchollet
Member

It doesn't know, but it learns to ignore them: in practice, sequence padding won't noticeably impact training. If you're worried about it, you can always use batches of size 1.

@lemuriandezapada

I have been experimenting in my own implementations with output masks that manually set the error gradients to 0 for the datapoints you don't want to train on, so they don't receive any updates. It would be a nice feature to have.
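
In Keras terms, something close to this can be approximated with per-timestep sample weights: compile with sample_weight_mode='temporal' and give padded steps a weight of 0 so they contribute nothing to the loss. A rough sketch, with made-up shapes and data:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(None, 8)))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mse', optimizer='adam', sample_weight_mode='temporal')

x = np.zeros((4, 10, 8))   # batch of 4 sequences, padded to 10 timesteps
y = np.zeros((4, 10, 1))
w = np.ones((4, 10))       # one weight per (sample, timestep)
w[:, 7:] = 0.0             # zero weight => padded steps produce no gradient
model.fit(x, y, sample_weight=w, epochs=1)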

@patyork
Contributor

patyork commented Apr 28, 2015

There are two simple and most often implemented ways of handling this:

  1. Bucketing and Padding (see the sketch after this list)
    1. Separate input samples into buckets of similar length, ideally such that each bucket has a number of samples that is a multiple of the mini-batch size.
    2. For each bucket, pad the samples to the length of the longest sample in that bucket with a neutral value. Zeros are frequent, but for something like speech data a representation of silence is used, which is often not zeros (e.g. the FFT of a silent portion of audio is used as the neutral padding).
  2. Bucketing
    1. Separate input samples into buckets of exactly the same length.
      • Removes the need for determining a neutral padding value.
      • However, the size of the buckets in this case will frequently not be a multiple of the mini-batch size, so in each epoch several updates will not be based on a full mini-batch.
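
A rough sketch of option 1 (the bucket boundaries are arbitrary; pad_sequences pads each bucket to its own longest sample when maxlen is left unset):

from keras.preprocessing.sequence import pad_sequences

def bucket_and_pad(sequences, boundaries=(10, 25, 50, 100)):
    """Group sequences into buckets of similar length, then pad each
    bucket to the length of its longest member."""
    buckets = {b: [] for b in boundaries}
    for seq in sequences:
        # Smallest bucket that fits; overflow goes into the last bucket.
        target = next((b for b in boundaries if len(seq) <= b), boundaries[-1])
        buckets[target].append(seq)
    # Each non-empty bucket becomes one rectangular array, padded with zeros.
    return [pad_sequences(bucket, padding='post')
            for bucket in buckets.values() if bucket]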

@fchollet fchollet closed this as completed Jun 8, 2015
@pengpaiSH

@fchollet Actually I have a similar (possibly stupid) question: when I check the imdb data, e.g. X_train[1] after applying pad_sequences(maxlen=100), I notice that the result only preserves the very last 100 words to make the sequence length 100. My question is: why not the first 100 words?

@haoqi

haoqi commented Dec 14, 2015

I faced the same problem. If I want to train with a batch size of 1, which function should I use? Thanks.

@philipperemy

@haoqi Set batch_size=1 in model.fit.

@philipperemy

@paipai880429 It's a matter of point of view. Usually the stronger signal is at the end of the text rather than at the beginning. Either way, you have to make a choice.
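
For reference, pad_sequences controls which end gets dropped via its truncating argument; the default 'pre' keeps the last maxlen elements:

from keras.preprocessing.sequence import pad_sequences

seq = [[1, 2, 3, 4, 5]]
pad_sequences(seq, maxlen=3)                     # [[3, 4, 5]] -- keeps the end
pad_sequences(seq, maxlen=3, truncating='post')  # [[1, 2, 3]] -- keeps the start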

@farizrahman4u
Contributor

@fchollet Instead of using batch_size = 1, you could presort your data by sequence length, group sequences of the same length into batches, and call train_on_batch on each batch, right?
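
A hedged sketch of that idea (it assumes a model built with a None timestep dimension, e.g. input_shape=(None, n_features); the helper name is illustrative, not a Keras API):

from collections import defaultdict
import numpy as np

def train_grouped_by_length(model, sequences, targets, epochs=10):
    # Group sample indices by sequence length so every batch is
    # rectangular without any padding.
    by_length = defaultdict(list)
    for i, seq in enumerate(sequences):
        by_length[len(seq)].append(i)

    for _ in range(epochs):
        for indices in by_length.values():
            x = np.array([sequences[i] for i in indices])
            y = np.array([targets[i] for i in indices])
            model.train_on_batch(x, y)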

@aabversteeg

Hi, I'm planning to use the approach of "batch_size = 1" to allow for arbitrary input lengths. However, what dimensions should I use for the input_shape argument? For example:
model.add(LSTM(512, return_sequences=True, input_shape=(maxlen, len(chars))))
What should I replace "maxlen" with?

@wangpichao

@aabversteeg Have you figured out what to do in your case? I am also facing this problem.

@aabversteeg

I was not able to get anything to work, but I believe that to allow for arbitrary sequence lengths you must supply None for the dimension that should be arbitrary. So in the above case you would replace maxlen with None.
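
A minimal sketch of that (n_features stands in for len(chars) from the earlier snippet):

from keras.models import Sequential
from keras.layers import LSTM

n_features = 57  # e.g. len(chars) from the snippet above

model = Sequential()
# None in the timestep slot accepts any sequence length; within a single
# batch all sequences must still match (hence batch_size=1 for mixed data).
model.add(LSTM(512, return_sequences=True, input_shape=(None, n_features)))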

@FernandoLpz

Hi everyone.

I am starting to learn about LSTMs and I have a small doubt: what is a "time step"? Is a time step the length of the sequence? I would really appreciate your answer.

@patyork
Contributor

patyork commented Jan 18, 2017

A timestep is one step/element of a sequence. For example, each frame in a video is a timestep; the data for that timestep is the RGB picture at that frame.

@anirudhgupta22

Hi everyone,

I am working with video frames; my training data consists of video files of variable length, due to which the number of timesteps (i.e. the number of frames) varies across files. Can you please help me use an LSTM in such a scenario?

Thanks

@FernandoLpz

FernandoLpz commented Apr 6, 2017 via email

@QuantumLiu

@fchollet
In the imdb_lstm example, the use_bias argument of the layers is True.
So even though we have padded the sequence with zeros, the bias is still active at the padded timesteps.
Would it cause any problems?

@Binteislam

I have images of different widths and a fixed height of 46px, and 1000 samples in total. How should I define the input shape of my data for an LSTM layer, using the functional API?

@philipperemy

philipperemy commented May 4, 2017

@Binteislam Convert them to the same width and height; that will be the simplest for you.
Or you can resize all the images to max(widths) x height and pad the rest with blanks.

@Binteislam

@philipperemy Is there no other solution? If I convert them to a fixed width it destroys my data, as I am working on OCR, whereas if I pad them with black it wastes a huge amount of memory.

@philipperemy

@Binteislam I'm not aware of a better solution. Or maybe you can feed them one by one, but that increases the computational time (batch size = 1).

@Binteislam

@philipperemy If I keep batch_size=1, how do I define the input shape?
@patyork Could you please suggest a solution?

@Kevinpsk

Kevinpsk commented Jul 6, 2017

@Binteislam Hi, I think you just need to set the timestep dimension to None. But I am not sure how you should formulate the training data in this case. Add each image to a list?

@philipperemy

@Kevinpsk You cannot batch sequences of different lengths together.
@Binteislam It would look like batch_input_shape = (batch_size, time_length, input_dim) = (1, None, input_dim).
Then you can give any sequence to your model, as long as you feed them one by one.

Other possibilities are:

  • pad them with zeros
  • group sequences of the same length together inside a batch
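
A sketch of the feed-one-by-one route under those shapes (random data, illustrative sizes):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

input_dim = 16
model = Sequential()
model.add(LSTM(32, input_shape=(None, input_dim)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

# Batches of one are always rectangular, so each sample keeps its length.
for length in (250, 300, 167):
    x = np.random.rand(1, length, input_dim)
    y = np.zeros((1, 1))
    model.train_on_batch(x, y)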

@Kevinpsk

Kevinpsk commented Jul 6, 2017

@philipperemy Hi, yeah, I understand that after reading other people's posts. But in the case of video processing, the image at each timestep is 2D, so how shall I specify the input shape? Something like (batch_size, timestep, input_dim) = (1, None, (height, width))? In this case, would I be training video files of variable length one by one?

@philipperemy

philipperemy commented Jul 7, 2017

@Kevinpsk In your case I would advise you to have a look at Conv3D

https://keras.io/layers/convolutional/

It's specifically done for handling videos the same way a regular Conv Net handles images.

If you still want to stick with your LSTM, then input_dim = height * width. Just flatten the last two dimensions; you will have (batch_size, timestep, input_dim) = (1, None, height * width).
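
A small sketch of that flattening (frame size and count are made up):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

height, width, n_frames = 48, 64, 30

model = Sequential()
model.add(LSTM(64, input_shape=(None, height * width)))

video = np.random.rand(n_frames, height, width)
# Flatten each frame to a vector and add the batch dimension:
# (batch_size, timestep, input_dim) = (1, n_frames, height * width)
x = video.reshape(1, n_frames, height * width)
features = model.predict(x)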

@habdullah

@philipperemy
Is it possible to feed batches with a varying number of timesteps?
(The sequence length within each batch is kept the same.)
For example, for a batch size of 100:
(100, 250, 78)
(100, 300, 78)
(100, 167, 78)
If yes, what would be the input shape? Setting it to (100, None, 78) gives an error.

@philipperemy

@habdullah Yes, it's possible and should work. The number of parameters of a recurrent network does not depend on the sequence length; it's only batching that prevents mixing different lengths within a batch. In your case it should work well.
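
A sketch of such a generator (random data; each batch is rectangular but lengths vary across batches):

import numpy as np

def batch_generator(batch_size=100, n_features=78):
    while True:
        # Every batch has one length, but the length differs across
        # batches; a model built with input_shape=(None, 78) accepts all.
        timesteps = np.random.choice([167, 250, 300])
        x = np.random.rand(batch_size, timesteps, n_features)
        y = np.random.rand(batch_size, 1)
        yield x, y

# model.fit_generator(batch_generator(), steps_per_epoch=50, epochs=5)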

@habdullah

@philipperemy
It worked. The problem was in the batch generator;
input_shape = (None, 78) worked for me.

@philipperemy

@habdullah ok cool.

@machanic

@patyork Is it because the batch_size you pass as an argument is not the same as each bucket's size? How should one deal with batch_size? E.g. you process the articles inside one bucket, then switch to another bunch of articles in the next bucket? Wouldn't that make the programming quite complex?

@LeZhengThu

@lemuriandezapada Hi, I'm facing the same problem and think your idea is the way to solve this. Can you kindly show how to code your idea? Thanks.

@tbennun

tbennun commented May 30, 2018

@LeZhengThu I'm not sure if this is still helpful or relevant, but I had the same problem, so I wrote a generic bucketed Sequence class for exactly this purpose. It is used as a generator and sped up training for me by orders of magnitude (~100x faster, since some sequences were very long, which did not reflect the median sequence length).

https://github.com/tbennun/keras-bucketed-sequence

@AshimaChawla

Hi @fchollet, using batch_size=1 has some performance issues on GPU: it takes really long to train on sequences of variable length.

Could you please advise?

/Ashima

@dbsousa01

Sorry to bring this up again @fchollet, but I am having a problem with how to present the training data. I also want to analyse video. Let's say I have 100 videos with 5 frames each, so 500 frames in total. How do I build the training data so I can feed a 5D tensor to my neural network? I suppose the input shape should be (nb of frames, nb of sequence, rows, cols, channels), where nb of frames is 500 (?) and nb of sequence is between 1 and 5, depending on the order of the frame in each video. Am I thinking about this correctly?
Thank you

@NookLook2014

@habdullah I'm doing an LSTM encoder-decoder. If I set input_shape = (None, 78), do you have any idea how to use RepeatVector(n) so that n matches the real shape[0] of the input dynamically?

@Deltaidiots

Deltaidiots commented Jul 1, 2019

How can we use a variable number of time steps per sample? The number of features for each timestep remains the same.

e.g.

x_train = [

  [[0.41668948],  # 1
   [0.38072783],  # 2
   [0.70242528]], # 3

  [[0.65911036],  # 1
   [0.01740353],  # 2
   [0.03617037],  # 3
   [0.04617037]]  # 4

]

People are saying to use padding, but I don't know how padding would solve this. What would be the shape of the input array then?

I have tried using None in the input shape but it doesn't work.


@rzilleruelo

rzilleruelo commented Jul 19, 2019

@Deltaidiots, following your example, padding means adding a value that is not present in your data as a marker of "no data". For example, add a zero at the end of your first sequence:

x_train = [
  [[0.41668948], [0.38072783], [0.70242528], [0.0]],
  [[0.65911036], [0.01740353], [0.03617037], [0.04617037]]
]

This obviously works if you can assume 0.0 is not a real value in your data. For example, if your values cannot be negative, you could use a negative number as the padding value, or add one to all numbers and use zero as padding. If there is no simple transformation that guarantees a value you can use as padding, you can increase the dimensionality of your data and use the extra dimension to create a value that is not in your data domain. For example:

x_train = [
  [[1.0, 0.41668948], [1.0, 0.38072783], [1.0, 0.70242528], [0.0, 0.0]],
  [[1.0, 0.65911036], [1.0, 0.01740353], [1.0, 0.03617037], [1.0, 0.04617037]]
]

Then, Keras has a layer to tell the network this explicitly:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Masking
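
A minimal sketch wiring that up (mask_value must match the padding marker; layer sizes are made up):

from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

model = Sequential()
# Timesteps whose feature vector is entirely equal to mask_value are
# skipped by downstream mask-aware layers such as LSTM.
model.add(Masking(mask_value=0.0, input_shape=(None, 1)))
model.add(LSTM(16))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')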

@MasterHansCoding

Hello, if you are using batch_size = 1 and return_sequences = True, I think I read somewhere that the cell state is reset at every batch.
