
How to implement a Residual block via Keras? #1321

Closed
meanmee opened this issue Dec 21, 2015 · 29 comments

@meanmee commented Dec 21, 2015

from keras.layers.convolutional import Convolution2D
from keras.layers.core import Dropout
from keras.models import Graph

def build_residual_block_conv(num_filters, name, input_shape, input_name='x'):
    """
    Rough sketch of building blocks of layers for residual learning.
    See http://arxiv.org/abs/1512.03385 for motivation.
    """
    block = Graph()

    block.add_input(input_name, input_shape=input_shape)

    h1 = Convolution2D(num_filters, 3, 3, activation='relu', border_mode='same')
    block.add_node(h1, name=name + 'h1', input=input_name)
    block.add_node(Dropout(0.25), name=name + 'd1', input=name + 'h1')

    h2 = Convolution2D(num_filters, 3, 3, activation='linear', border_mode='same')
    block.add_node(h2, name=name + 'h2', input=name + 'd1')

    block.add_output(name=name + 'output', inputs=[name + 'h1', name + 'h2'], merge_mode='sum')

    return block
@Sebubu (Contributor) commented Dec 22, 2015

I believe you have to connect the input to the output (see the sketch below). Anyway, I've already tried to implement it and failed dramatically:

  • Graph has a curious issue with adding several graphs in another graph (#1275)
  • After adding more than 50 layers to a graph, Keras gets really slow (not just the compilation). I guess add_node calculates the output_shape recursively, and that kills the performance.

My code:
https://github.com/Sebubu/mushroom_crawler/blob/master/mushroom/ResidualNet.py
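For illustration, a minimal sketch of what that change might look like against the add_output call in the original snippet (using the same old Graph API as above): sum the block's input with the output of the second convolution instead of summing the two convolutions.

# sketch only: merge the block's input with the last convolution so the
# block computes x + F(x) rather than h1 + h2
block.add_output(name=name + 'output',
                 inputs=[input_name, name + 'h2'],
                 merge_mode='sum')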

@meanmee (Author) commented Dec 22, 2015

Thanks very much! Did you test it on CIFAR-10?

@Sebubu (Contributor) commented Dec 22, 2015

No, as I said above, it hasn't worked so far. Feel free to experiment with the code.

@meanmee (Author) commented Dec 22, 2015

I think the shortcut is not a convolutional layer but a linear one. In your code you made it convolutional. Now I want to replace it with a linear layer; what code should I modify? Sorry, I am not familiar with Keras.

Oh, I think I know how to do it.

@Sebubu (Contributor) commented Dec 22, 2015

The paper proposes two options, either a linear projection or a convolution, and believe me, the convolution is easier. Otherwise you have to deal with reshaping and so on...
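For illustration only, a rough sketch of the two shortcut options being discussed: an identity shortcut when shapes already match, or a 1x1 convolution projection when the channel count changes. This uses the later functional API for brevity, with hypothetical sizes.

import keras
from keras.layers import Conv2D, Input

x = Input(shape=(32, 32, 16))
y = Conv2D(32, (3, 3), padding='same', activation='relu')(x)
y = Conv2D(32, (3, 3), padding='same')(y)

# An identity shortcut only works when x and y have the same shape.
# Here the channel count changed from 16 to 32, so project x with a
# 1x1 convolution (the "convolution" option) to match dimensions.
shortcut = Conv2D(32, (1, 1), padding='same')(x)
out = keras.layers.add([shortcut, y])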

@meanmee (Author) commented Dec 22, 2015

@Sebubu @fchollet It seems that ZeroPadding2D, AveragePooling2D, and BatchNormalization cannot be added to a Graph as nodes. I modified your code; the Graph can be generated but does not work when applied to real CIFAR-10 data.

The code I modified: cifar10.txt

My test code is:
input_shapes = (3, 32, 32)
print '32-layers'

model = Sequential()
model.add(create_31_layer(input_shapes))
model.add(AveragePooling2D(pool_size=(8, 8)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

print 'Not using data augmentation or normalization'
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch, validation_data=(X_test, Y_test), show_accuracy=True)
score = model.evaluate(X_test, Y_test, batch_size=batch_size)
print 'Test score:', score
This raises the error:

Traceback (most recent call last):
File "/home/dell/DLTest/cifar_test/Residul/residul_32layers.py", line 43, in
model.compile(loss='categorical_crossentropy', optimizer='adam')
File "build/bdist.linux-x86_64/egg/keras/models.py", line 406, in compile
File "build/bdist.linux-x86_64/egg/keras/layers/containers.py", line 128, in get_output
File "build/bdist.linux-x86_64/egg/keras/layers/core.py", line 896, in get_output
File "build/bdist.linux-x86_64/egg/keras/layers/core.py", line 159, in get_input
.........................(many like above)
AssertionError

@Sebubu (Contributor) commented Dec 22, 2015

Issue #1275 describes your exception. I have not found a solution so far.

@keunwoochoi (Contributor) commented Jan 23, 2016

Is it like building a wrapper that would enable implementing a residual net by adding several residual blocks, i.e.

model.add(residual())
model.add(residual())

? Then it would be so cool. I think the problem with BN and Graph has been resolved.

@courageon commented Feb 17, 2016

I implemented a residual class based on the Regularizer class. I'm not sure whether this is the correct way to do it, but it seems to work. It also slows training down to a crawl.

For the life of me this thing will not format right, so sorry for the awful formatting...

_CODE START_

from keras.layers.core import Layer
from keras.regularizers import Regularizer

class ResidualRegularizer(Regularizer):
    def __init__(self):
        pass

    def set_layer(self, layer):
        self.layer = layer

    # When asked for the loss, just return 0 to prevent back-prop to the previous layers
    def __call__(self, loss):
        return 0

    def get_config(self):
        return {"name": self.__class__.__name__}


class Residual(Layer):
    """
    Layer that passes through its input unchanged, and applies no back propagation.
    It is simply a forward-propagated link intended for residual linking.
    """
    def __init__(self, **kwargs):
        super(Residual, self).__init__(**kwargs)
        residual_regularizer = ResidualRegularizer()
        residual_regularizer.set_layer(self)
        self.regularizers = [residual_regularizer]

    def get_output(self, train=False):
        return self.get_input(train)

    def get_config(self):
        config = {"name": self.__class__.__name__}
        base_config = super(Residual, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

_CODE END_

Then you implement it like...
model.add_node(Dense(1536, activation='relu'), merge_mode='concat', concat_axis=-1, name='class_dense1', inputs=['flatten_embed','flatten'])
model.add_node(Dense(1536, activation='relu'), name='class_dense2', input='class_dense1')
model.add_node(Dense(1536, activation='relu'), name='class_dense3', input='class_dense2')
model.add_node(Residual(), name='class_residual', input='class_dense1')
model.add_node(Dense(vocab_size, activation='softmax'), name='class_softmax', merge_mode='sum', inputs=['class_residual','class_dense3'])

Again sorry for the terrible formatting...

It's not so much a residual layer as a way to grab a previous layer's outputs without back-propagating. Then you can take that output and merge it, creating your 'residual'.

Also note this method only works if all the layers are the same size.

@sergeyf commented Feb 24, 2016

@courageon commented Feb 25, 2016

I've seen that code, but I don't understand how it's supposed to be used. It looks like it builds the entire network, so it may be more of a factory? But maybe we can use the Identity class in that code as a replacement for the Residual class in my example. Could be worth a shot if it's any faster. I'll give it a try and report back.
Thanks, @sergeyf!

@sergeyf commented Feb 25, 2016

I think you can just use the blocks it returns in other models. So:

s = Sequential()
s.add( build_residual_block('resblock1', (100,), 2, n_skip=2) )
s.add( build_residual_block('resblock2', (100,), 2, n_skip=2) )

etc.

@courageon commented Feb 25, 2016

Ah, ok, I see it now. Thanks for the example! :)

@keunwoochoi (Contributor) commented Mar 4, 2016

It's worth checking out this comment; I think the code @sergeyf mentioned has the same mistake. ReLU() should be applied after the merge, I guess, according to that comment. What do you think?
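For illustration only, a minimal sketch of the ordering being suggested, i.e. summing the shortcut first and applying ReLU after the merge. It is written with the later functional API, and the shapes, filter counts, and single BatchNormalization placement are hypothetical simplifications.

import keras
from keras.layers import Activation, BatchNormalization, Conv2D, Input

x = Input(shape=(32, 32, 16))
y = Conv2D(16, (3, 3), padding='same', activation='relu')(x)
y = Conv2D(16, (3, 3), padding='same')(y)   # linear output from the last conv
y = BatchNormalization()(y)
out = keras.layers.add([x, y])              # merge the shortcut first...
out = Activation('relu')(out)               # ...then apply ReLU to the sum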

@keunwoochoi (Contributor) commented Mar 5, 2016

I opened a repo for my residual block implemented in Keras: https://github.com/keunwoochoi/residual_block_keras
I put many comments in the files, so it should be easy to understand. However, I'm not sure if it's correct.

@cmishra commented Mar 8, 2016

@keunwoochoi I think your observation is correct. Additionally, I walked myself through the logic of your code and it seems correct (albeit I haven't used it; that's for tomorrow). I did have a couple of questions:

  • Your current implementation would only work if kernel_sizes = (3, 3), correct? I say this because of line 63.
  • Any reason you used max pooling for subsampling instead of increasing the stride? (A sketch of the two options is below this list.) Admittedly it makes more sense to me than setting stride=2 on a (1, 1) filter, but I'm curious whether you have a source on this. Aside: I'd also be curious to see whether average pooling would work better than max pooling and stride=2, since intuition suggests it would better resemble the original x.
  • Any reason you implement the main path in the residual block internally instead of accepting it as a parameter? That would simplify the implementation and make it more robust. I see two potential thorns: automatically detecting necessary subsampling/dimension modifications, and deep copying a Keras layer. I'm fairly confident the first can be easily resolved; not sure about the latter. Do Keras layers work with deepcopy?
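As a sketch of the two subsampling options discussed above (hypothetical shapes and filter counts; written with the later functional API for brevity):

from keras.layers import AveragePooling2D, Conv2D, Input

x = Input(shape=(32, 32, 16))

# Option A: subsample the shortcut with a strided 1x1 convolution
shortcut_a = Conv2D(32, (1, 1), strides=(2, 2), padding='same')(x)

# Option B: pool first (max or average), then project with a 1x1 convolution
shortcut_b = AveragePooling2D(pool_size=(2, 2))(x)
shortcut_b = Conv2D(32, (1, 1), padding='same')(shortcut_b)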
@keunwoochoi (Contributor) commented Mar 8, 2016

Thanks, @cmishra.

  • Oh, yes, there's a mistake. It used to assume (3,3), but now it should be (1,1) rather than (kernel_row-1)/2. I'll update it.
  • It seems like a different story, but does (1,1) convolution with stride > 1 really work? I'm not sure it's okay when the stride is larger than the kernel size. I chose max pooling over increasing the stride because in my case stride > kernel size for some layers.
  • Average pooling: true, I implemented it once but haven't tested it yet. Your point supports the idea as well; I'll try it soon, once the network seems to work.
  • Main path: I think the current way is natural and convenient, but I haven't thought about separating the main path and the shortcut into different functions. Wouldn't it be more complex to use?

I'll also update lines 58 to 81 to make the shortcut-convolution part clearer. (EDIT: I updated it.)

(PS: I opened #1910 for this, but people can get notifications here.)

@keunwoochoi (Contributor) commented Mar 9, 2016

Now my network seems to be starting to learn meaningful features, though I'm not sure yet. I originally added the usual classifier head (Flatten() - Maxout() - BN - Maxout() ...), but I have now changed it to an almost fully convolutional architecture as in the original ResNet paper, removed all dropouts, and changed MaxPooling to AveragePooling in the shortcut path. I'm still not sure which of these changes is critical and which is not, but it's worth noting. I'll update with more.

@cmishra commented Mar 9, 2016

@Sebubu, with #1387, does your code work? I'd try to run it myself, but I'm traveling and my personal laptop doesn't have the processing power.

@keunwoochoi, let's take the discussion regarding your code to the other issue you posted.

@sergeyf commented Mar 17, 2016

There's a more recent, apparently improved, version of the residual block:

http://arxiv.org/pdf/1603.05027v1.pdf
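For reference, a minimal sketch of the "pre-activation" ordering proposed in that paper: BN and ReLU before each convolution, and no ReLU after the addition. Functional API, hypothetical shapes and filter counts.

import keras
from keras.layers import Activation, BatchNormalization, Conv2D, Input

x = Input(shape=(32, 32, 16))
y = BatchNormalization()(x)
y = Activation('relu')(y)
y = Conv2D(16, (3, 3), padding='same')(y)
y = BatchNormalization()(y)
y = Activation('relu')(y)
y = Conv2D(16, (3, 3), padding='same')(y)
out = keras.layers.add([x, y])   # identity shortcut; no activation after the add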

@raghakot commented Apr 18, 2016

I recently implemented it in Keras using the new functional API: https://github.com/raghakot/keras-resnet

@keunwoochoi (Contributor) commented Apr 18, 2016

Great. I also updated my residual network implementation with the Keras 1.0 API and the authors' new paper that @sergeyf mentioned: https://github.com/keunwoochoi/residual_block_keras

@codingneo commented Jun 13, 2016

@keunwoochoi, I tried to use your residual_block_keras (https://github.com/keunwoochoi/residual_block_keras), but I encountered the following error:

Traceback (most recent call last):
  File "example.py", line 151, in <module>
    model = get_residual_model()
  File "example.py", line 120, in get_residual_model
    residual_blocks = design_for_residual_blocks(num_channel_input=128)
  File "example.py", line 100, in design_for_residual_blocks
    subsample=pool_sizes[conv_idx]
  File "/usr/local/lib/python2.7/site-packages/keras/models.py", line 145, in add
    output_tensor = layer(self.outputs[0])
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 485, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 543, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 148, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 1922, in call
    output_tensors, output_masks, output_shapes = self.run_internal_graph(inputs, masks)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2064, in run_internal_graph
    output_tensors = to_list(layer.call(computed_tensor, computed_mask))
  File "/usr/local/lib/python2.7/site-packages/keras/layers/normalization.py", line 116, in call
    raise Exception('You are attempting to share a '
Exception: You are attempting to share a same `BatchNormalization` layer across different data flows. This is not possible. You should use `mode=2` in `BatchNormalization`, which has a similar behavior but is shareable (see docs for a description of the behaviour).

Do you have any suggestions?

@keunwoochoi (Contributor) commented Jun 13, 2016

Hi @codingneo, it was fixed yesterday, keunwoochoi/residual_block_keras@a35fe6f. Pull the repo again and give it a try!

@codingneo commented Jun 13, 2016

Hi @keunwoochoi, I actually made a similar fix to your new changes by using mode=2 for the BatchNormalization layer. But with mode=2 in the BatchNormalization layer, training produces a NaN loss, as follows:

Train on 60000 samples, validate on 10000 samples
Epoch 1/20
1088/60000 [..............................] - ETA: 13519s - loss: nan - acc: 0.1158
1152/60000 [..............................] - ETA: 13522s - loss: nan - acc: 0.1137

Is this something to be cautious about?

@keunwoochoi (Contributor) commented Jun 13, 2016

Is this the result of running example.py?
With the Theano backend it's working well. I had an error with TensorFlow, though.

Epoch 1/20
 5760/60000 [=>............................] - ETA: 349s - loss: 0.5634 - acc: 0.8210


@codingneo commented Jun 13, 2016

@keunwoochoi Yes, the result is from running example.py.
I am on a Mac using the Theano backend.

The stale bot added the stale label on May 23, 2017.
The stale bot closed this issue on Jun 22, 2017.
@Darthholi commented Aug 28, 2017

For other Googlers like me: there IS now a way to do a residual connection in Keras, via the functional API:
https://keras.io/getting-started/functional-api-guide/

import keras
from keras.layers import Conv2D, Input

# input tensor for a 3-channel 256x256 image
x = Input(shape=(256, 256, 3))
# 3x3 conv with 3 output channels (same as input channels)
y = Conv2D(3, (3, 3), padding='same')(x)
# this returns x + y
z = keras.layers.add([x, y])

Here x and y are two consecutive tensors: the input and the output of the convolution applied to it.
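As a follow-up sketch, here is one way such a residual connection is typically turned into a trainable model: wrap the input and output tensors in a Model. The Flatten/Dense head and the 10-class softmax are hypothetical placeholders, not part of the original example.

import keras
from keras.layers import Conv2D, Dense, Flatten, Input
from keras.models import Model

inputs = Input(shape=(256, 256, 3))
y = Conv2D(3, (3, 3), padding='same', activation='relu')(inputs)
z = keras.layers.add([inputs, y])            # residual connection: x + F(x)

# hypothetical classification head
z = Flatten()(z)
outputs = Dense(10, activation='softmax')(z)

model = Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')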

@MarviB16 commented Dec 8, 2018

But how would I add it to my model? (Sorry, I'm pretty new to Keras.)
model.add(z)
doesn't work, obviously.
