
LSTM Autoencoder #1401

Closed
dpappas opened this issue Jan 4, 2016 · 18 comments

dpappas commented Jan 4, 2016

Hello everyone, and happy new year!

I am trying to create an LSTM autoencoder, as shown in the image below.

[Image: the_cat_sat_autoencoder]

The encoder consumes the input "the cat sat",
and creates a vector depicted as the big red arrow.

The decoder takes this vector and tries to reconstruct the sequence
given the position in the sentence.

I would like to save this vector (big red arrow) to use it on another model.

The code I have written so far is the following:

from keras.layers import containers
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, AutoEncoder
import numpy as np
from keras.layers.recurrent import LSTM

train_x = [
    [[1, 3], [1, 3]],
    [[2, 4], [2, 4]],
    [[3, 5], [3, 5]]
]
train_x = np.array(train_x)

encoder = containers.Sequential([LSTM(output_dim=5, input_dim=2, activation='tanh', return_sequences=True)])
decoder = containers.Sequential([LSTM(output_dim=2, input_dim=5, activation='tanh', return_sequences=True)])
autoencoder = Sequential()
autoencoder.add(AutoEncoder(encoder=encoder, decoder=decoder, output_reconstruction=False))
autoencoder.compile(loss='mean_squared_error', optimizer='sgd')
autoencoder.fit(train_x,train_x, nb_epoch=10)

It is not clear to me whether the code above does what I am asking for.
If I do not use return_sequences=True, it yields an error.

Should I use a graph model to do exactly what I am asking for?

Thank you in advance for your help.

fchollet commented Jan 4, 2016

What you posted does do what your figure describes.

You don't need an AutoEncoder layer to achieve this. You could simply do:

m = Sequential()
m.add(LSTM(5, input_dim=2, return_sequences=True))
m.add(LSTM(5, return_sequences=True))

Also don't train RNNs with SGD. Use RMSprop instead.
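
For reference, a complete runnable version of that suggestion might look like the sketch below (Keras 1.x-era Sequential API, matching the rest of this thread). Note that, unlike the snippet above, the second LSTM is given 2 units here so the reconstruction matches the 2-dim input when fitting on (train_x, train_x), and RMSprop is used as advised:

from keras.models import Sequential
from keras.layers.recurrent import LSTM
import numpy as np

train_x = np.array([
    [[1, 3], [1, 3]],
    [[2, 4], [2, 4]],
    [[3, 5], [3, 5]],
])

m = Sequential()
m.add(LSTM(5, input_dim=2, return_sequences=True))  # encoder: 2-dim inputs -> 5-dim hidden states
m.add(LSTM(2, return_sequences=True))                # decoder: back to the 2-dim input space
m.compile(loss='mean_squared_error', optimizer='rmsprop')
m.fit(train_x, train_x, nb_epoch=10)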

dpappas commented Jan 5, 2016

Thank you very much.

I assume I can then save the trained weights of the first LSTM using
m.layers[0].get_weights()

lemuriandezapada commented:

In my experience Adam does better than RMSprop

dpappas commented Jan 15, 2016

Hello again

I would like to get the outputs of the first layer of the following model:


from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import containers
from keras.layers.core import Dense, AutoEncoder, TimeDistributedDense, Activation
from keras.optimizers import RMSprop
from keras.utils import np_utils
from keras.layers.recurrent import LSTM

import numpy as np

LSTM_size_1 = 5

data = [
    [[1, 3]],
    [[2, 4]],
    [[3, 5]],
    [[4, 6]],
    [[5, 7]],
    [[6, 8]],
    [[7, 9]],
    [[8, 10]],
    [[9, 11]]
]

data = np.array(data)

in_dim = data.shape[-1]
m = Sequential()
m.add(LSTM(LSTM_size_1, input_dim=in_dim, return_sequences=True))
m.add(LSTM(in_dim, return_sequences=True))
m.add(Activation('linear'))
m.compile(loss='mse', optimizer='RMSprop')
m.fit(data,data, nb_epoch=2)

When I type

m.layers[0].get_output()

I get

DimShuffle{1,0,2}.0

Is there a way to get the final output as a numpy array, instead of the weights?

I believe the output is something like this:

O_t = σ(W_o · X_t + U_o · h_{t-1} + V_o · C_t + b_o)
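
(For reference, that expression is the output gate; in the standard LSTM formulation the layer's actual output additionally combines the gate with the cell state, h_t = O_t ⊙ tanh(C_t).)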

Thank you in advance

jgc128 commented Jan 15, 2016

Hi,

You can save the weights of the first LSTM, create a separate model with only one LSTM layer, and set the weights of that LSTM to your saved weights. After that you can use the predict method to get the output of the first LSTM.
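
For the model posted earlier in this thread, that might look roughly like the sketch below (old Sequential API; embedder and first_layer_output are illustrative names, and m, LSTM_size_1, in_dim and data refer to the snippet above):

encoder_weights = m.layers[0].get_weights()  # trained weights of the first LSTM

embedder = Sequential()
embedder.add(LSTM(LSTM_size_1, input_dim=in_dim, return_sequences=True))
embedder.compile(loss='mse', optimizer='rmsprop')
embedder.layers[0].set_weights(encoder_weights)  # copy the trained weights into the new layer

# predict() returns a numpy array of shape (samples, timesteps, LSTM_size_1)
first_layer_output = embedder.predict(data)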

dpappas commented Jan 16, 2016

Truthfully, this is not what I want.

I do not want to use the trained LSTM
as an input to another neural net.

I want to use the output of the LSTM as an embedding.

So I do not want 4 matrices (the trained weights of the LSTM),
but 1 matrix with dimensionality n×1, where n is the number of nodes in the LSTM.

This matrix is the output of the 1st LSTM, which was used as input to the
second LSTM, as shown with the red arrow in the picture.

jgc128 commented Jan 16, 2016

If you feed the same data to this new network with one LSTM layer, you will get exactly what you want as the result of the predictions. You can save these results and use them anywhere you want.

dpappas commented Jan 18, 2016

If there are 100 instances, there will be 100 autoencoders.

I want an autoencoder to overtrain on a specific instance and extract an embedding.

Think of it as compressing all the information of a text into a vector of size 10.

I want to use these 100 embeddings as input to another network (size: 100 × 10).

I cannot connect all LSTMs at the same time and feed the original data once more.

Nor could I connect one LSTM at a time.

I just want the output of the 1st layer as a numpy array.

How can I get it?

lemuriandezapada commented:

http://keras.io/faq/#how-can-i-visualize-the-output-of-an-intermediate-layer
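
For the model above, the approach from that FAQ entry boils down to compiling a backend function from the model input to the first layer's output. A sketch (the exact incantation differs between Keras versions; get_first_layer_output is an illustrative name):

from keras import backend as K

# Backend function mapping the model input to the first LSTM's output.
# (In older Keras versions the output tensor is m.layers[0].get_output(train=False) instead.)
get_first_layer_output = K.function([m.layers[0].input], [m.layers[0].output])

# Numpy array of shape (samples, timesteps, LSTM_size_1).
first_layer_output = get_first_layer_output([data])[0]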

dpappas commented Jan 18, 2016

Thank you.

This is what I was searching for!

dpappas closed this as completed Jan 18, 2016
MdAsifKhan commented:

@dpappas, I am also facing the same issue. I tried the above link and I get the attribute error
'LSTM' object has no attribute 'initial weights'
Could you please post the exact snippet of what you did?

GUR9000 commented Dec 24, 2016

@fchollet, using "return_sequences=True" does NOT produce what is described in the figure!
This will cause the "decoder" LSTM layer to use the output vector of the encoder at each time step instead of only the single final vector after processing the whole sequence (as shown in the figure). Or am I mistaken here?

ypxie commented May 3, 2017

@GUR9000 I think you are right. At every time step, the decoder needs to take the output from the previous step of the decoder, rather than the output of the encoder.

ScientiaEtVeritas commented:

I want to build an LSTM autoencoder.
My data looks like this, with shape (1200, 10, 5), which is (training_size, timesteps, input_dim):

array([[[0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1],
        ...
        [0, 1, 0, 0, 0]],
        [[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        ...
        [0, 0, 0, 0, 0]]])

Code:

from keras.layers.recurrent import LSTM
from keras.models import Sequential

model = Sequential()

encoded = LSTM(3, input_shape=(10,5))
model.add(encoded)

decoded = LSTM((10,5), return_sequences=True)
model.add(decoded)

But for the decoded step it returns ValueError: Input 0 is incompatible with layer lstm_5: expected ndim=3, found ndim=2.

Thank you in advance.

dpappas commented Jun 19, 2017

@ScientiaEtVeritas

Your encoded LSTM returns only the last output of the LSTM;
you need to change the encoded line to

encoded = LSTM(3, input_shape=(10, 5), return_sequences=True)

Finally, your decoded LSTM needs a proper number of units for the LSTM.
You are giving a tuple as the size; you need to change it to

decoded = LSTM(10, return_sequences=True)

or change the number 10 to the size you want.

I suggest you read the documentation and some explanation of LSTMs,
for example Understanding LSTM Networks.

ScientiaEtVeritas commented Jun 19, 2017

@dpappas: Thank you for your answer.
But for clarification, and as mentioned before, using return_sequences=True is not what is shown in the picture. My goal is actually a single, final vector that represents the whole sequence as well as possible. I don't think that is the case when using return_sequences=True.
There is another issue I found that explains why, and why return_sequences=False for the encoder is the actual way to go: #5138 (at the bottom).

dpappas commented Jun 19, 2017

@ScientiaEtVeritas

You are right.
I have not managed to do that with Keras.
You could do it with TensorFlow using seq2seq.

Maybe in Keras you could do it with the step function of the LSTM, or with a callback, to use the output of the decoder from the previous timestep as input at the next timestep.

NeilYager commented Aug 23, 2017

@dpappas @ScientiaEtVeritas

Perhaps this has the desired effect:

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector

timesteps, input_dim = 10, 5

model = Sequential()
model.add(LSTM(3, input_shape=(timesteps, input_dim)))  # encode the whole sequence into a single 3-dim vector
model.add(RepeatVector(timesteps))                      # feed that vector to the decoder at every timestep
model.add(LSTM(input_dim, return_sequences=True))       # the last layer needs input_dim units to fit on (data, data)
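
Once a model along those lines is trained, the single encoding vector (the big red arrow in the original figure) can be read out by wrapping the encoder layer in a second model. This is only a sketch, assuming the Keras 2 functional API; encoder_model, embeddings and data (a float array of shape (samples, timesteps, input_dim)) are illustrative names, not from this thread:

from keras.models import Model

model.compile(loss='mse', optimizer='rmsprop')
model.fit(data, data, epochs=50)  # reconstruct the input sequences from the 3-dim code

# Read-out model that shares the trained encoder layer and stops at its output.
encoder_model = Model(inputs=model.inputs, outputs=model.layers[0].output)
embeddings = encoder_model.predict(data)  # shape (samples, 3): one vector per sequence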
