Why RepeatVector in the Sequence to Sequence Autoencoder? #5203

Closed
dhrushilbadani opened this Issue Jan 27, 2017 · 16 comments

dhrushilbadani commented Jan 27, 2017

In the code examples here, in the section titled "Sequence-to-sequence autoencoder," it reads:

[...] first use a LSTM encoder to turn your input sequences into a single vector that contains information about the entire sequence, then repeat this vector n times (where n is the number of timesteps in the output sequence), and run a LSTM decoder to turn this constant sequence into the target sequence.

The code is:

from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model
inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)

My question is, why are we doing the RepeatVector operation? In the literature regarding sequence to sequence autoencoders (for example in this often cited paper by Dai & Le), there's no repetition as such. Instead, they have the following diagram:
[Screenshot: encoder-decoder diagram from Dai & Le]

What am I missing here? What exactly is the input sequence to the Decoder portion of the autoencoder?

Thanks!


bstriner commented Jan 27, 2017

Not sure about interpreting the image but the paper says:

A slightly better method is to use a sequence autoencoder, which uses a RNN to read a long input sequence into a single vector. This vector will then be used to reconstruct the original sequence.

So the example reads everything into a single vector, then uses that vector to reconstruct the original sequence. If you want to iteratively generate something but you only have one input, you can repeat the vector. That means each time step will get the same input but a different hidden state.
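As a concrete illustration, here is what RepeatVector does, sketched in plain NumPy (the function name and shapes are illustrative, not Keras internals):

```python
import numpy as np

# RepeatVector(n) tiles the encoder's summary vector n times, turning a
# (batch, latent_dim) tensor into (batch, n, latent_dim), so the decoder
# LSTM receives the same input at every timestep.
def repeat_vector(x, n):
    return np.repeat(x[:, np.newaxis, :], n, axis=1)

encoded = np.array([[0.1, 0.2, 0.3]])      # one sample, latent_dim = 3
decoder_input = repeat_vector(encoded, 4)  # 4 decoder timesteps
print(decoder_input.shape)                 # (1, 4, 3)
```

Every slice along the time axis is identical; only the decoder's hidden state evolves from step to step.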

Cheers


dhrushilbadani commented Jan 28, 2017

@bstriner Thanks! Do you have a link to some literature where they've used such an architecture? Almost all frequently cited papers that I found use a different architecture.

Similar to the picture in the post, in another popular paper by Srivastava et al. ('Unsupervised Learning of Video Representations using LSTMs'), they have the following diagram:

[Screenshot: composite encoder-decoder model from Srivastava et al.]

It seems they're using the reversed input from the encoder as input here. There's a section as follows:

The decoder can be of two kinds - conditional or unconditioned. A conditional decoder receives the last generated output frame as input, i.e., the dotted input in Fig. 2 is present. An unconditioned decoder does not receive that input.
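The distinction the paper draws can be sketched as a toy unrolling loop (plain Python with a stand-in step function, not the paper's actual model):

```python
def decode(step_fn, init_state, n_steps, conditional=True):
    """Unroll a decoder for n_steps.

    step_fn(state, x) -> (new_state, output) stands in for an LSTM cell.
    A conditional decoder feeds its previous output back in as input;
    an unconditioned decoder feeds a constant (zero here) every step.
    """
    state, prev_out = init_state, 0.0
    outputs = []
    for _ in range(n_steps):
        x = prev_out if conditional else 0.0
        state, out = step_fn(state, x)
        outputs.append(out)
        prev_out = out
    return outputs

# Toy cell: state counts up, output is state plus input.
toy_cell = lambda s, x: (s + 1.0, s + x)
print(decode(toy_cell, 0.0, 3, conditional=True))   # [0.0, 1.0, 3.0]
print(decode(toy_cell, 0.0, 3, conditional=False))  # [0.0, 1.0, 2.0]
```

The two variants diverge as soon as an output is fed back in, which is exactly the dotted-input difference in the figure.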


bstriner commented Jan 28, 2017

You can build an autoencoder either way in Keras. Theoretically it will train faster with a conditional decoder, but I haven't really compared the two.


dhrushilbadani commented Jan 29, 2017

@bstriner Thanks again! Can you help me implement the one in the figure above? Specifically, I'm looking for two things:

  1. A way to feed the hidden state at the end of the encoder as the initial state for the decoder. How do I do that?

  2. To use the output from cell (t-1) as input to cell (t) in an LSTM.

Thanks!


bstriner commented Jan 29, 2017

Easiest way is probably to start with the example with repeat vector. Instead of the input just being the repeated final encoder state, concatenate it with the reversed sequence shifted once. Then your input to each LSTM decoder cell is the encoder state and the previous character.

During training, the input is the encoder state and the actual previous character, but during testing the input is the encoder state and the predicted previous character. Using the output as input during testing is slightly trickier. To do it all on the GPU you would probably have to build a custom call to K.rnn. You could also just loop on the CPU.
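The training-time input construction described here could be sketched like this in NumPy (shapes, the function name, and the zero start-of-sequence vector are assumptions, not Keras code):

```python
import numpy as np

def make_decoder_inputs(encoded, targets):
    """Build decoder inputs: repeated encoder vector + previous character.

    encoded: (batch, latent_dim) final encoder state
    targets: (batch, timesteps, features) sequence to reconstruct
    Returns (batch, timesteps, latent_dim + features).
    """
    batch, timesteps, features = targets.shape
    repeated = np.repeat(encoded[:, np.newaxis, :], timesteps, axis=1)
    # Shift the target right one step so step t sees character t-1;
    # a zero vector stands in for the start-of-sequence token.
    shifted = np.concatenate(
        [np.zeros((batch, 1, features)), targets[:, :-1, :]], axis=1)
    return np.concatenate([repeated, shifted], axis=-1)

enc = np.ones((2, 5))                   # batch of 2, latent_dim = 5
tgt = np.arange(24.).reshape(2, 4, 3)   # 4 timesteps, 3 features
print(make_decoder_inputs(enc, tgt).shape)  # (2, 4, 8)
```

Each decoder timestep then sees the encoder summary plus the ground-truth previous character (teacher forcing); at test time the previous prediction would be substituted for the ground truth.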


dhrushilbadani commented Jan 31, 2017

Alright, thanks!


rafaelpossas commented Feb 9, 2017

Hi @dhrushilbadani, I had the same questions as you, and I am also interested in the implementation of the seq2seq autoencoder. I wonder whether you've made any progress! Cheers


dhrushilbadani commented Feb 10, 2017

@rafaelpossas I used the seq2seq library (built on top of Keras + RecurrentShop) as it offers greater flexibility in deciding how the cells in a particular layer interact with each other, thanks to RecurrentShop. Hope this helps!

@stale stale bot added the stale label May 23, 2017

@stale stale bot closed this Jun 22, 2017


newbiesitl commented Oct 13, 2017

Hi @bstriner, I'm a little bit confused about the concatenation of the hidden states and the encoder's final state.
I want to implement the RNN encoder-decoder described in this paper:
Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).

The input to the decoder is g(h_t, y_{t-1}, c). I understand that once we add the RepeatVector, it will pass the final state of the encoder (which is c in this case) to the decoder, but how can I combine c and y_{t-1} (the previous output) and pass them to the LSTM cell?

My point is: if I use RepeatVector, does the LSTM still pass the output of the current step to the next step, or is the decoder's input just the constant encoder final state at every decoding step? If I want to pass both the encoder's final state and the previous decoder output to the decoder, how can I combine or concatenate them? Could you give me an example?

Thanks!


newbiesitl commented Oct 14, 2017

I just read about the TimeDistributed layer. In my previous example, if I want to pass y_{t-1} and c to the decoder LSTM, should I add a TimeDistributed layer after the LSTM layer?


kmsravindra commented Jan 21, 2018

@colpain, I think this might help you...
https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
As mentioned in this blog, if you want to use the encoded state along with the previous predicted output while inferring, you need to capture the final encoder states and define the decoder inference model as a stand-alone model that takes three inputs: the two final encoder states (h and c) and the previous predicted value. You will not need a RepeatVector or TimeDistributed layer. You can refer to this article that I wrote using the same example as in the Keras example - https://towardsdatascience.com/neural-machine-translation-using-seq2seq-with-keras-c23540453c74
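The inference loop being described - carry the (h, c) states forward and feed each prediction back in - has roughly this shape (a schematic Python sketch with a stub step function standing in for the stand-alone decoder model; names are illustrative, not the blog's code):

```python
def greedy_decode(step_fn, h, c, start_token, stop_token, max_len):
    """Greedy seq2seq inference.

    step_fn(prev_token, h, c) -> (next_token, h, c) wraps one call to
    the stand-alone decoder model; h and c start as the encoder's
    final states.
    """
    prev, decoded = start_token, []
    for _ in range(max_len):
        prev, h, c = step_fn(prev, h, c)
        if prev == stop_token:
            break
        decoded.append(prev)
    return decoded

# Stub "model": emits tokens 1, 2, 3, then token 4, which we treat as stop.
stub = lambda prev, h, c: (h + 1, h + 1, c)
print(greedy_decode(stub, 0, 0, start_token=0, stop_token=4, max_len=10))  # [1, 2, 3]
```

In a real Keras setup, step_fn would be a decoder model predicting one timestep and returning its updated LSTM states, as in the lstm_seq2seq example.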


HitLuca commented Feb 7, 2018

I am doing the exact same thing, and used more or less the same code:

import keras
from keras import backend as K
from keras.layers import Input, Lambda, LSTM, RepeatVector
from keras.models import Model

model_inputs = Input(shape=(timesteps,))
inputs = Lambda(lambda x: K.expand_dims(x, -1))(model_inputs)
encoded = LSTM(latent_dim, return_sequences=False)(inputs)

decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(1, return_sequences=True)(decoded)
decoded = Lambda(lambda x: K.squeeze(x, -1))(decoded)

sequence_autoencoder = Model(model_inputs, decoded)
sequence_autoencoder.compile(loss='mse', optimizer='adam')

earlyStopping = keras.callbacks.EarlyStopping(monitor='loss', patience=5, verbose=0, mode='auto')

# sparse_balances, datapoints, batch_num and result_plotter are defined elsewhere
sequence_autoencoder.fit(sparse_balances[:datapoints], sparse_balances[:datapoints],
                         batch_size=batch_num, epochs=100,
                         callbacks=[earlyStopping, result_plotter])

The model seems correct in theory, but the decoder LSTM always gets stuck predicting a single value for the whole time series, no matter how long the training runs. I think the example given in the tutorial is not conceptually correct, or something is missing.


JuntingGuo commented May 23, 2018

@HitLuca Encountered exactly the same error here. After the repetition, model.predict() gives exactly the same output for different inputs. I'm guessing they have changed how the encoder and decoder are used, because in the official example at https://github.com/keras-team/keras/blob/master/examples/lstm_seq2seq.py they didn't use the encoder's output vector as decoder input but discarded it and used the hidden states as inputs instead.


HitLuca commented May 23, 2018

@JuntingGuo Based on my experiments the model is indeed working as intended; I had just used a particularly bad dataset to train it with (spiking, sparse, unidimensional time series). With enough time and training, the decoder LSTM actually learns to decode the latent vector, albeit with more difficulty, as the initial state is always the same.
If instead the decoder is fed a repeated constant vector and its initial state is set from the encoder, the results are better. This has the downside of not being as quick to implement, because the latent dimensionality is no longer given by the LSTM output size and you need some architectural arrangements to get the same latent dimension.

TL;DR: both implementations work, but passing the hidden LSTM state seems to work better.
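To make the difference concrete, here is a toy one-unit RNN in plain NumPy (not Keras; weights and names are made up) contrasting a decoder whose state starts at zero with one whose state is seeded from the encoder:

```python
import numpy as np

def toy_decoder(n_steps, x_const, h0, w_x=0.5, w_h=0.5):
    """One-unit RNN unrolled n_steps; every step sees the same
    constant input x_const. h0 is the initial hidden state."""
    h, outs = h0, []
    for _ in range(n_steps):
        h = np.tanh(w_x * x_const + w_h * h)
        outs.append(h)
    return outs

code = 0.8  # pretend this is the encoder's final state / latent code
# Variant 1 (RepeatVector style): the code arrives only through the input.
v1 = toy_decoder(3, x_const=code, h0=0.0)
# Variant 2: the code also seeds the decoder's initial hidden state.
v2 = toy_decoder(3, x_const=code, h0=code)
print(v1[0], v2[0])  # the very first outputs already differ
```

In variant 1 the decoder must recover everything from a constant input; in variant 2 the state carries the code from step one, which matches the observation that state initialization helps.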


JuntingGuo commented May 24, 2018

@HitLuca Thanks for your reply. I'll look into that!


mikejhuang commented Dec 13, 2018

@HitLuca

I have the same issue. I ended up initializing the decoder state with the encoder state and everything worked very well.
