Easy way to combine CNN + LSTM? (e.g. LRCN network) #401

Closed
anayebi opened this issue Jul 16, 2015 · 27 comments

Comments

@anayebi
Contributor

anayebi commented Jul 16, 2015

Is there a straightforward way in Keras to combine a convolutional network that extracts features with an LSTM (or GRU, MUT1, etc.) network that consumes them (similar to Figure 1 of this paper: http://arxiv.org/pdf/1411.4389v3.pdf), or would I have to write my own layer?

Specifically, I want the input i_t to the convolutional network at timestep t to consist of n frames (n = 1 in Figure 1), so i_t has dimension (num_rows, num_cols, n). The features extracted from i_t are fed into an LSTM network, which produces a prediction y_t and a hidden state h_t. The next input i_{t+1}, also of dimension (num_rows, num_cols, n), is fed into the same convolutional network, whose features go to the LSTM layer at timestep t+1 together with h_t from timestep t. The LSTM then produces the prediction y_{t+1} and hidden state h_{t+1}, and so on.
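
The data flow described above can be sketched in plain NumPy to make the shapes concrete. This is a minimal sketch under stated assumptions, not Keras code: `extract_features` is a hypothetical stand-in for the convolutional network (a 2x2 average pool followed by flattening), and the recurrent update is a plain tanh RNN step rather than a real LSTM.

```python
import numpy as np

def extract_features(frame_stack):
    """Hypothetical stand-in for the CNN: 2x2 average pooling, then flatten.

    frame_stack has shape (num_rows, num_cols, n), with even num_rows and num_cols.
    """
    rows, cols, n = frame_stack.shape
    pooled = frame_stack.reshape(rows // 2, 2, cols // 2, 2, n).mean(axis=(1, 3))
    return pooled.reshape(-1)

def run_sequence(inputs, hidden_size=8, seed=0):
    """Apply the same extractor at every timestep and carry a hidden state forward.

    inputs has shape (timesteps, num_rows, num_cols, n).
    Returns the hidden states h_1..h_T with shape (timesteps, hidden_size).
    """
    rng = np.random.default_rng(seed)
    feature_size = extract_features(inputs[0]).size
    W_x = rng.normal(scale=0.1, size=(feature_size, hidden_size))
    W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    h = np.zeros(hidden_size)
    states = []
    for i_t in inputs:                      # i_t: (num_rows, num_cols, n)
        x_t = extract_features(i_t)         # same extractor at every timestep
        h = np.tanh(x_t @ W_x + h @ W_h)    # simple RNN step, not an LSTM
        states.append(h)
    return np.stack(states)

frames = np.random.default_rng(1).random((5, 28, 28, 1))  # 5 timesteps, n = 1
hidden_states = run_sequence(frames)
print(hidden_states.shape)
```

The point of the sketch is only that the feature extractor is shared across timesteps while the hidden state is threaded through the loop.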

I'm aware of #129; however, in that case I believe the original poster wanted a convolutional layer that does not accept new inputs across timesteps (something like Figure 3 on page 4 of this paper: http://arxiv.org/pdf/1411.4555v2.pdf), which is not what I want.

Thanks in advance!

anayebi changed the title from "Easy way to combine CNN + LSTM?" to "Easy way to combine CNN + LSTM? (e.g. LRCN network)" on Jul 16, 2015

@simonhughes22

@anayebi Read the documentation on the built-in Reshape layer. That should give you everything you need. Don't flatten the CNN outputs; use Reshape instead.

@loyeamen

Add these two methods to the Sequential object:

def Conv2LSTM(self, num_filters, input_width):
    # Reorder the conv output axes, then collapse to (input_width, num_filters)
    self.add(Permute((0, 3, 2, 1)))
    self.add(Reshape(input_width, num_filters))

def LSTM2ConvLayer(self):
    # 'x' inserts a broadcastable axis (Theano dimshuffle notation)
    self.add(Permute((0, 2, 'x', 1)))

Add this class to layers:

class Permute(Layer):
    '''
    Permute the dimensions of the data according to the given tuple
    '''
    def __init__(self, dims):
        super(Permute, self).__init__()
        self.dims = dims

    def get_output(self, train):
        X = self.get_input(train)
        return X.dimshuffle(self.dims)

    def get_config(self):
        return {"name": self.__class__.__name__,
                "dims": self.dims}

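To see what these two helpers do to tensor shapes, here is a minimal NumPy sketch. It assumes the convolutional output is laid out as (batch, num_filters, rows, cols) with rows == 1, which the Reshape call above appears to rely on. `np.transpose` and `np.newaxis` stand in for Theano's `dimshuffle`, and the function names are illustrative only.

```python
import numpy as np

def conv_to_lstm(conv_out):
    """(batch, num_filters, 1, width) -> (batch, width, num_filters)."""
    batch, num_filters, rows, width = conv_out.shape
    assert rows == 1, "this sketch assumes a single row"
    permuted = np.transpose(conv_out, (0, 3, 2, 1))   # (batch, width, 1, num_filters)
    return permuted.reshape(batch, width, num_filters)

def lstm_to_conv(seq):
    """(batch, timesteps, features) -> (batch, features, 1, timesteps)."""
    # Equivalent of dimshuffle((0, 2, 'x', 1)): swap the last two axes,
    # then insert a length-1 axis in position 2.
    return np.transpose(seq, (0, 2, 1))[:, :, np.newaxis, :]

conv_out = np.arange(2 * 4 * 1 * 6, dtype=float).reshape(2, 4, 1, 6)
seq = conv_to_lstm(conv_out)
print(seq.shape)                    # (2, 6, 4)
print(lstm_to_conv(seq).shape)      # (2, 4, 1, 6)
```

Under the single-row assumption the two functions are inverses of each other, so a sequence can be handed back to a convolutional layer without losing information.
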
@afsanehghasemi

Hi,
I am trying to train a CNN + LSTM, but I have been unable to determine exactly what the input to the LSTM should be.
I would appreciate any help.
I have a sequence of frames that I want to map to a sequence of predefined labels (sequence-to-sequence mapping).
The input to my network is 100 sequences of 16 consecutive frames, and each frame is 28 x 28.
My problem is how to define Permute and Reshape to connect the output of the convolution layer to the LSTM.

n_hidden = 256
n_samples = 100
n_timesteps = 16

model = Sequential()
model.add(Convolution2D(32, 5, 5, border_mode='same', input_shape=(1, 28, 28)))
model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
model.add(Activation('relu'))
model.add(Permute((0, 3, 2, 1)))
model.add(Reshape(?))
model.add(LSTM(256))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
rmsprop = RMSprop(lr=learning_rate)
model.compile(loss='categorical_crossentropy', optimizer=rmsprop)

@anayebi
Contributor Author

anayebi commented Dec 18, 2015

Hi Afsaneh,

For the CNN layers to interact with the LSTM layer, they need to be distributed across time. I have made time-distributed versions of Convolution2D, MaxPooling2D, and Flatten so that they can work with the LSTM layer. They can be found in my GitHub repo here: https://github.com/anayebi/keras-extra

So, as an example, you could do what you propose above as follows (untested code):

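from keras.models import Sequential
from keras.layers.core import Activation, TimeDistributedDense
from keras.layers.recurrent import LSTM
from keras.layers.extra import TimeDistributedConvolution2D, TimeDistributedMaxPooling2D, TimeDistributedFlatten

n_hidden = 256
n_samples = 100
n_timesteps = 16
# nb_pool and nb_classes are assumed to be defined elsewhere.

model = Sequential()
# Per-sample input is (timesteps, channels, rows, cols), so the data fed to the model is 5D.
model.add(TimeDistributedConvolution2D(32, 5, 5, border_mode='same', input_shape=(n_timesteps, 1, 28, 28)))
model.add(TimeDistributedMaxPooling2D(pool_size=(nb_pool, nb_pool)))
model.add(Activation('relu'))
model.add(TimeDistributedFlatten())
# return_sequences=True keeps one output per timestep for the per-frame labels.
model.add(LSTM(n_hidden, return_sequences=True))
model.add(TimeDistributedDense(nb_classes))
model.add(Activation('softmax'))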

@OnlySang

@anayebi some bug here. In theano_backend.p

@afsanehghasemi

Hi,
Here is an error I got in theano_backend.py:
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 55, in placeholder
    raise Exception('ndim too large: ' + str(ndim))

@AntreasAntoniou

Hello there, I get this error:
Exception: Invalid input shape - Layer expects input ndim=5, was provided with input shape (None, 30, 256, 256)

when I use this architecture:

model = Sequential()
model.add(TimeDistributedConvolution2D(64, 3, 3, border_mode='same', input_shape=(30, 256, 256)))
model.add(Activation('relu'))
model.add(TimeDistributedConvolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(TimeDistributedMaxPooling2D(pool_size=(2, 2)))
model.add(Activation('relu'))
model.add(TimeDistributedConvolution2D(128, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(TimeDistributedConvolution2D(128, 3, 3))
model.add(Activation('relu'))
model.add(TimeDistributedMaxPooling2D(pool_size=(2, 2)))
model.add(Activation('relu'))
model.add(TimeDistributedConvolution2D(128, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(TimeDistributedConvolution2D(128, 3, 3))
model.add(Activation('relu'))
model.add(TimeDistributedMaxPooling2D(pool_size=(2, 2)))
model.add(Activation('relu'))
model.add(TimeDistributedConvolution2D(256, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(TimeDistributedConvolution2D(256, 3, 3))
model.add(Activation('relu'))
model.add(TimeDistributedMaxPooling2D(pool_size=(2, 2)))
model.add(Activation('relu'))
model.add(TimeDistributedFlatten())
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(512, return_sequences=True))
model.add(TimeDistributedDense(600))
model.add(Activation('softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

Am I using the package incorrectly, or is there something I need to implement somewhere? Thanks for taking the time to read this.

@anayebi
Contributor Author

anayebi commented Dec 28, 2015

@AntreasAntoniou Yes, the package expects an input that is 5D: (num_samples, num_timesteps, channels, rows, cols). Your input is missing the extra num_timesteps dimension. If you experience any more issues, feel free to post to the issues page on my repo: https://github.com/anayebi/keras-extra
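
As a concrete illustration of that shape requirement, the sketch below adds a missing axis to single-channel data with NumPy. It assumes grayscale clips stored as (num_samples, num_timesteps, rows, cols); the array sizes are made up for the example. Because the data itself carries the leading samples axis, the per-sample `input_shape` passed to the first layer would be (num_timesteps, channels, rows, cols).

```python
import numpy as np

num_samples, num_timesteps, rows, cols = 4, 30, 32, 32

# Grayscale clips without an explicit channel axis: a 4D array.
clips = np.zeros((num_samples, num_timesteps, rows, cols), dtype=np.float32)

# Insert a length-1 channel axis after the timestep axis to get the 5D layout
# (num_samples, num_timesteps, channels, rows, cols).
clips_5d = np.expand_dims(clips, axis=2)

print(clips_5d.shape)        # (4, 30, 1, 32, 32)
print(clips_5d.shape[1:])    # per-sample shape: (30, 1, 32, 32)
```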

@OnlySang @afsanehghasemi At the moment, these layers don't work with the current version of Keras (which uses multiple backends). The errors you point out have to do with theano_backend.py, which does not allow input dimensions larger than 4. If you use a version of Keras from September or October (before the new update), then the layers as they currently are should work. On the other hand, if you don't want to use a slightly earlier Keras version, I am planning on releasing an updated version of the layers soon that should work with the newest Keras version. I will update this thread when I do release the update.

@AntreasAntoniou

Yes, I realised that and changed the input to (30, 1, 256, 256). Thanks for your prompt support. I will wait for your update.

@anayebi
Contributor Author

anayebi commented Jan 4, 2016

I have pushed an updated version of my code that works with the newest
version of Keras (this includes modifying theano_backend.py to include
support for 5D tensors). For any issues/bugs, feel free to let me know
(preferably on the issues page on my repo,
https://github.com/anayebi/keras-extra, rather than on this thread :)

@fchollet On a related note, could these layers be considered for inclusion
in Keras?

It's somewhat impractical to keep up with the changes in Keras as a
separate repo as it is constantly changing. I know @fchollet had plans for
a general time distributed layer, though I think it's been on the backlog
for a while. However, if making a general time distributed layer is too
much work or is taking too much time, and if TimeDistributedConvolution2D,
TimeDistributedPooling2D, and TimeDistributedFlatten seem to be something
that could be useful to Keras users (especially those training CNN-RNN
nets), then they (or a subset thereof) may be worth considering for
inclusion (in fact, TimeDistributedDense and TimeDistributedMerge
are already specific time distributed layers).

It could be best to put all the time distributed layers in one place, to be
used in conjunction with RNNs. Or, even better yet, for Flatten, Convolution2D, and Pooling2D, we can have a flag (say, td) such that if td is set to True, the layers do the appropriate operations to be TimeDistributed. But I'll leave the API decisions to others.


@AntreasAntoniou

Thanks so much man. I will go try it out now.

@Shadi94

Shadi94 commented Feb 9, 2016

Hello,

I am using "keras-extra" so that the CNN layers can interact with the LSTM layer, but when I run the code I get this error: "TypeError: rnn() got an unexpected keyword argument 'mask'". Could you please help me with this issue?
Thank you very much in advance.

@AntreasAntoniou

@Shadi94 Could you build a github gist with your code?

@XuesongYang

#1623 provides a Convolutional3D layer that treats time as an additional dimension. I guess the CNN + LSTM combination can then be implemented directly by stacking an LSTM layer on top of the CNN.

@dakshvar22

Hi @afsanehghasemi, were you able to solve the issue? If so, can you explain how?

@gtyopal

gtyopal commented Jun 26, 2016

@Shadi94 I get the same error: "TypeError: rnn() got an unexpected keyword argument 'mask'". Could you please share your fix? Thanks very much.

@mangolzy

I have a similar task, and I think a layer called Permute or Reshape was developed to help with this. Is there any sample code? @afsanehghasemi have you figured it out? Thanks.

@DanlanChen

Is it possible to use the stateful option in the LSTM here?
I am having trouble using it.

@anirudhgupta22

Hi,

I am trying to work with CNN + LSTM and am having trouble using an LSTM after the CNN. The last conv layer of the network gives me the shape 32x8x26. How can I use an LSTM after this?
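
One common way to bridge a feature map like this to a recurrent layer is to treat one spatial axis as time and fold the remaining axes into the feature vector. The NumPy sketch below assumes that 32x8x26 means (channels, height, width) and that width is the sequence axis; both are assumptions, since the comment does not say.

```python
import numpy as np

channels, height, width = 32, 8, 26
feature_map = np.random.default_rng(0).random((channels, height, width))

# Put the assumed time axis (width) first, then flatten channels x height
# into one feature vector per timestep.
sequence = np.transpose(feature_map, (2, 0, 1)).reshape(width, channels * height)

print(sequence.shape)   # (26, 256): 26 timesteps, 256 features each
```

In Keras the same transformation is typically expressed with a Permute layer followed by a Reshape layer, after which the LSTM receives (timesteps, features) per sample.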

Thanks.

@rolai

rolai commented Apr 19, 2017

@anirudhgupta22
I have the same problem. Did you find an answer?

@kgruhler

kgruhler commented Nov 6, 2017

I recently opened a new issue about this (see #8268) but have not received an answer. Is it the same thing you want?

@usmanatif

usmanatif commented Dec 6, 2017

@anayebi Hello, I followed #4172 and #421 and tried every possible way to integrate a CNN (VGG16) with an LSTM in Keras, but I keep getting errors.

Below is my code:

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
from keras.models import Sequential
from keras.layers import Flatten, LSTM, Dense, TimeDistributed, InputLayer
from keras.layers.core import Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D
import numpy as np

#%%     Image Generator for datasets
datagen = ImageDataGenerator(
        rotation_range = 0,
        width_shift_range = 0,
        height_shift_range = 0,
        shear_range = 0,
        zoom_range = 0,
        rescale = None,
        horizontal_flip = False,
        fill_mode = 'nearest')

train_generator = datagen.flow_from_directory(
        r'D:\SpyderDeepLearning\PersonReidentification\Dataset_Light\Train',  # raw string so backslashes are not treated as escapes
        target_size = (224,224),
        batch_size = 1, 
        class_mode = 'categorical')

test_generator = datagen.flow_from_directory(
        r'D:\SpyderDeepLearning\PersonReidentification\Dataset_Light\Test',  # raw string so backslashes are not treated as escapes
        target_size = (224,224),
        batch_size = 1, 
        class_mode = 'categorical')


#%%     Convolutional Model (Basically Its VGG) + LSTM model (TimeDistributed)

model = Sequential()

model.add(InputLayer(input_shape=(5, 224, 224, 3)))
model.add(TimeDistributed(Convolution2D(64, (3, 3))))
model.add(TimeDistributed(MaxPooling2D((2,2), strides=(2,2))))
model.add(LSTM(10))
model.add(Dense(3))

model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])
model.fit_generator(train_generator, epochs=1, steps_per_epoch=len(train_generator.filenames))

I am getting this error (here 5 is the number of timesteps, 224 x 224 is the image dimension, and 3 is the number of channels):
Error when checking input: expected input_2 to have 5 dimensions, but got array with shape (5, 224, 224, 3)

But when I also supply the samples dimension:
model.add(InputLayer(input_shape=(1, 5, 224, 224, 3)))
I get this error:
number of input channels does not match corresponding dimension of filter, 224 != 3
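
Both errors appear consistent with the generator yielding 4D image batches (batch, height, width, channels) while a model whose `input_shape` is (5, 224, 224, 3) expects 5D batches with a leading samples axis; `input_shape` itself should not include that samples axis. A minimal sketch of a wrapper that groups consecutive frames into fixed-length clips follows. `sequence_batches` is a hypothetical helper, not part of Keras, and it assumes the frames arrive in temporal order and that one label per clip (taken from the last frame) is acceptable.

```python
import numpy as np

def sequence_batches(frame_iter, timesteps, batch_size=1):
    """Group single frames into 5D batches of clips.

    frame_iter yields (frame, label) pairs, where frame has shape
    (height, width, channels). Yields (clips, labels) with clips of shape
    (batch_size, timesteps, height, width, channels).
    """
    clips, labels, current = [], [], []
    for frame, label in frame_iter:
        current.append(frame)
        if len(current) == timesteps:
            clips.append(np.stack(current))
            labels.append(label)          # label of the last frame in the clip
            current = []
            if len(clips) == batch_size:
                yield np.stack(clips), np.stack(labels)
                clips, labels = [], []

# Toy data standing in for images read from disk: 20 frames, 3 classes.
frames = ((np.zeros((224, 224, 3), dtype=np.float32), np.eye(3)[i % 3])
          for i in range(20))
x, y = next(sequence_batches(frames, timesteps=5, batch_size=2))
print(x.shape, y.shape)   # (2, 5, 224, 224, 3) (2, 3)
```

Note that `fit_generator` expects a generator that loops indefinitely, so in practice the frame source would be wrapped in an endless loop rather than a finite range.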

@HTLife

HTLife commented Feb 8, 2018

@usmanatif I also ran into this error, and I can't figure out how to feed data into the CNN + LSTM with a generator. Did you solve this problem?

@dksakkos

dksakkos commented Jun 5, 2018

@usmanatif @HTLife Did you try to permute the input dimensions to (1, 3, 224, 224, 5)?

@anish9

anish9 commented Feb 6, 2019


@imairish

imairish commented Oct 4, 2019

@usmanatif Have you solved the issue of supplying the input using the Keras image generator?
My model compiles fine, but there is a dimension issue in fit_generator.

How did you solve your problem?

Any help would be appreciated
Thank you!

@Cloud-Dark

Cloud-Dark commented Jan 3, 2024

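This is my method, though I don't know whether it is the best one:

  1. Build a model with LSTM
  2. Build a model with CNN
  3. Combine the LSTM and CNN predictions using an MLP regressor

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor


# Combine the LSTM and CNN predictions (2D arrays assumed to be computed earlier)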
all_predictions = np.concatenate((lstm_predictions, cnn_predictions), axis=1)

mlp_regressor = MLPRegressor()

mlp_regressor.fit(all_predictions, y_test)

mlp_predictions = mlp_regressor.predict(all_predictions)

print(mlp_predictions)

mse = mean_squared_error(y_test_cnn, mlp_predictions)
print(mse)
