
Merging two ResNet50 models causes fit_generator to crash after more than 5 minutes, without starting the training #5408

Closed
oak-tree opened this issue Feb 15, 2017 · 4 comments

Comments

@oak-tree

oak-tree commented Feb 15, 2017

Hello,
I'm trying to play with Keras and ResNet50. I was trying to do the following:


from keras.applications.resnet50 import ResNet50
from keras.layers import Input, merge
from keras.models import Model

# Theano dim ordering: (channels, height, width)
input_dim = (3, 224, 224)
input_a = Input(shape=input_dim)
input_b = Input(shape=input_dim)

base_model = ResNet50(weights='imagenet', include_top=False,
                      input_tensor=None, input_shape=input_dim)

# share the same ResNet50 base across both inputs
out_a_base = base_model(input_a)
out_b_base = base_model(input_b)
merged = merge([out_a_base, out_b_base], mode='sum')
model = Model(input=[input_a, input_b], output=merged)
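For context, here is roughly how I drive training (a minimal sketch: random arrays stand in for my real image pairs, and the generator name, optimizer, loss, batch size, and sample counts are just placeholders):

import numpy as np

def pair_generator(batch_size=8):
    # dummy image pairs plus targets shaped like the model's output
    out_shape = model.output_shape[1:]
    while True:
        x_a = np.random.rand(batch_size, 3, 224, 224).astype('float32')
        x_b = np.random.rand(batch_size, 3, 224, 224).astype('float32')
        y = np.zeros((batch_size,) + out_shape, dtype='float32')
        yield [x_a, x_b], y

model.compile(optimizer='adam', loss='mse')
model.fit_generator(pair_generator(), samples_per_epoch=32, nb_epoch=1)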


Building the model works, and model.compile works as well. But when I call

model.fit_generator(...)

it hangs for a very long time and then, before training even starts, it produces a long error message that ends with:

Exception: ('The following error happened while compiling the node', GpuElemwise{RoundHalfToEven,no_inplace}(GpuElemwise{Composite{sqrt(clip(i0, i1, i2))},no_inplace}.0), '\n', 'nvcc return status', 2, 'for cmd', 'nvcc -shared -O3 --maxrregcount=32 -arch=sm_37 -m64 -Xcompiler -fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/home/oak/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray -I/home/oak/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray -I/usr/local/cuda-8.0/include -I/home/oak/venv2/local/lib/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -I/home/oak/venv2/local/lib/python2.7/site-packages/theano/gof -I/home/oak/venv2/local/lib/python2.7/site-packages/theano/sandbox/cuda -o /home/oak/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/tmpKFacqe/1a0cc683bdd484bffedd2637c51df231.so mod.cu -L/home/oak/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray -L/usr/lib -lcudart -lcublas -lcuda_ndarray -lpython2.7', '[GpuElemwise{RoundHalfToEven,no_inplace}(<CudaNdarrayType(float32, (False, True, False, False))>)]')

I replaced the base model with a simpler structure like:

from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout

base_model = Sequential()
base_model.add(Flatten(input_shape=input_dim))
base_model.add(Dense(1024, activation='relu'))
base_model.add(Dropout(0.5))
base_model.add(Dense(1024, activation='relu'))

and fit_generator works just fine in this case. What could be the issue with ResNet50 and the merge layer?

Is it a memory issue?

EDIT:

Without the nvcc compiler (I took it out of the PATH), it does not crash, but it takes a very long time (many minutes) before it reaches the training phase.

EDIT 2:

With the nvcc compiler on the Theano backend, it also crashes for the simple model.

@nouiz
Contributor

nouiz commented Feb 15, 2017 via email

@patyork
Contributor

patyork commented Feb 15, 2017

@nouiz It seems to be an issue with the RoundHalfToEven rounding mode. I think the master branch of Theano has already fixed this (so I think this is a 0.8.2 issue). There is another thread about it here.

The default rounding mode was changed in the Keras Theano backend fairly recently to match what TF does (I believe). The prior fix was simply to change the rounding method back to half_away_from_zero.
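If upgrading Theano isn't an option right away, a stopgap along these lines should do it (just a sketch, assuming Keras 1.x on the Theano backend and applied before the model is built; T.round's mode argument does the actual work):

import theano.tensor as T
from keras import backend as K

def _round_half_away_from_zero(x):
    # avoid the RoundHalfToEven op that nvcc fails to compile here
    return T.round(x, mode='half_away_from_zero')

# monkey-patch the backend's round() before constructing/compiling the model
K.round = _round_half_away_from_zero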

@oak-tree
Author

@nouiz
Here is a gist with the full error log: https://gist.github.com/oak-tree/ccec4bf5ec0931c29a11629e1a0f9d46

@oak-tree
Author

@patyork looks like you are right. I installed the latest from the Theano repo with

pip install --upgrade git+https://github.com/Theano/Theano.git#egg=Theano

and it seems to fix this issue.
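In case it helps anyone else, you can double-check which Theano you actually ended up with after the upgrade:

import theano
print(theano.__version__)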
