Merging two ResNet50 models causes fit_generator to crash after more than 5 minutes, before training starts #5408
Comments
This is a problem related to Theano. For some reason, it isn't able to
compile some GPU code.
Can you give the full error message? There is information in it that I'm
missing.
Fred
…On Wed, Feb 15, 2017 at 11:48 AM oak-tree ***@***.***> wrote:
Hello,
I'm trying to play with Keras and ResNet50. I was trying to do the
following:
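from keras.applications.resnet50 import ResNet50
from keras.layers import Input, merge
from keras.models import Model

input_dim = (3, 224, 224)  # channels-first image shape
input_a = Input(shape=input_dim)
input_b = Input(shape=input_dim)
# One ResNet50 instance is applied to both inputs, so its weights are shared.
base_model = ResNet50(weights='imagenet', include_top=False, input_tensor=None, input_shape=input_dim)
out_a_base = base_model(input_a)
out_b_base = base_model(input_b)
concatenated = merge([out_a_base, out_b_base], mode='sum')
# The original post passed an undefined name, `distance`, as the output.
model = Model(input=[input_a, input_b], output=concatenated)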
This works, and model.compile works as well. But when I call
model.fit_generator(...)
it hangs for a long time and then, *before* training starts, it produces a
long error message which ends with
Exception: ('The following error happened while compiling the node',
GpuElemwise{RoundHalfToEven,no_inplace}(GpuElemwise{Composite{sqrt(clip(i0,
i1, i2))},no_inplace}.0), '\n', 'nvcc return status', 2, 'for cmd', 'nvcc
-shared -O3 --maxrregcount=32 -arch=sm_37 -m64 -Xcompiler
-fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden
-Xlinker
-rpath,/home/oak/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray
-I/home/oak/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray
-I/usr/local/cuda-8.0/include
-I/home/oak/venv2/local/lib/python2.7/site-packages/numpy/core/include
-I/usr/include/python2.7
-I/home/oak/venv2/local/lib/python2.7/site-packages/theano/gof
-I/home/oak/venv2/local/lib/python2.7/site-packages/theano/sandbox/cuda -o
/home/oak/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/tmpKFacqe/1a0cc683bdd484bffedd2637c51df231.so
mod.cu
-L/home/oak/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray
-L/usr/lib -lcudart -lcublas -lcuda_ndarray -lpython2.7',
'[GpuElemwise{RoundHalfToEven,no_inplace}(<CudaNdarrayType(float32, (False,
True, False, False))>)]')
I replaced the base model with a simpler structure, like:
base_model= Sequential()
base_model.add(Flatten(input_shape=input_dim))
base_model.add(Dense(1024, activation='relu'))
base_model.add(Dropout(0.5))
base_model.add(Dense(1024, activation='relu'))
and fit_generator works just fine in this case. What could be the
issue with ResNet50 and the merge layer?
Is it a memory issue?
@nouiz It seems to be an issue with the RoundHalfToEven rounding mode. I think the master branch of Theano has fixed this (so I think this is a 0.8.2 issue). Another thread here. The default mode was changed in the Keras backend fairly recently to match what TF does (I believe). The prior fix was just to change the rounding method back to
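For readers unfamiliar with the term, here is a minimal sketch of what "round half to even" means, contrasted with round-half-away-from-zero, another common tie-breaking rule. It uses only Python's standard `decimal` module; the two helper names are made up for illustration, and the sketch does not claim to show which rule Keras or Theano used before the change.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_half_to_even(x):
    """Round to the nearest integer; exact ties go to the even neighbour."""
    return int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))


def round_half_away_from_zero(x):
    """Round to the nearest integer; exact ties go away from zero."""
    return int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    # The two rules differ only on exact .5 ties.
    for value in (0.5, 1.5, 2.5, -2.5, 2.4):
        print(value, round_half_to_even(value), round_half_away_from_zero(value))
```

The two functions agree everywhere except on exact ties such as 0.5 or 2.5, which is why a backend that switches rules can start emitting a different GPU op (here RoundHalfToEven) without the model code changing.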
@nouiz
@patyork looks like you are right. I installed the latest from
and it seems to fix this issue.
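A rough sketch of how one might check whether an installed Theano is at or below the release suspected in this thread. The cutoff 0.8.2 is only the guess made in the comments above, not a verified boundary, and `parse_version` and `may_need_upgrade` are hypothetical helper names, not part of Theano or Keras.

```python
import re

# Assumption taken from the discussion above: 0.8.2 is suspected, master is not.
SUSPECT_RELEASE = (0, 8, 2)


def parse_version(version):
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part) for part in match.group(1).split("."))


def may_need_upgrade(version):
    """True if the version is at or below the suspected release."""
    return parse_version(version) <= SUSPECT_RELEASE


if __name__ == "__main__":
    # With Theano installed, one could pass theano.__version__ instead.
    print(may_need_upgrade("0.8.2"))
```

Development builds carry suffixes after the numeric part; the regular expression deliberately ignores them, so the result is only a coarse hint.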
Hello,
I'm trying to play with Keras and ResNet50. I was trying to do the following:
This works, and model.compile works as well. But when I call
model.fit_generator(...)
it hangs for a long time and then, before training starts, it produces a long error message which ends with
I replaced the base model with a simpler structure, like:
base_model= Sequential()
base_model.add(Flatten(input_shape=input_dim))
base_model.add(Dense(1024, activation='relu'))
base_model.add(Dropout(0.5))
base_model.add(Dense(1024, activation='relu'))
and fit_generator works just fine in this case. What could be the issue with ResNet50 and the merge layer?
Is it a memory issue?
EDIT: Without the nvcc compiler (I took it out of the path), it does not crash, but it takes forever (many minutes) until it gets to the training phase.
EDIT2: With the nvcc compiler and the Theano backend, it also crashes for the simple model.
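Since the two edits above compare runs with and without nvcc on the path, here is a minimal sketch for confirming which case a given environment is in. It assumes Python 3.3 or later for `shutil.which` (the thread itself ran Python 2.7), and `compiler_on_path` is a hypothetical helper name.

```python
import shutil


def compiler_on_path(name="nvcc"):
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    # Reports whether nvcc is visible to the current process.
    print("nvcc on PATH:", compiler_on_path())
```

Running this in the same shell or virtualenv that launches training avoids guessing about which PATH the training process actually sees.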