demogen: loading resnet models fails on GPU #39
Comments
An example that reproduces these failures is available at my fork:
|
Are the problems only with the resnet models? |
Yes, the nin models load correctly.
Moreover, there are 5 resnet models which do load...
…On Tue, Aug 6, 2019, 23:58 YiDing Jiang ***@***.***> wrote:
Hi, are the problems with resnet's only?
|
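I couldn't find extract_layers_util in your repo, but one quick thing to try: can you call tf.reset_default_graph between loading different models? |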
I've added the missing file and the line you suggested, and it does solve
part of the problem. I will check it more systematically and
will share the results.
…On Wed, Aug 7, 2019 at 2:28 AM YiDing Jiang ***@***.***> wrote:
I couldn't find extract_layers_util in your repo, but one quick thing to
try: can you try to call tf.reset_default_graph between loading different
models?
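The suggestion above can be sketched as follows. This is a minimal illustration, not the demogen loading code: build_toy_model and its variable names are hypothetical stand-ins, and it assumes the TF1-style API is reachable as tensorflow.compat.v1. The graph is reset before each model is built so that variables left over from the previous model cannot collide with the next one.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # use TF1-style graph mode


def build_toy_model(width):
    """Hypothetical stand-in for building one model: a single conv kernel."""
    with tf.variable_scope('resnet', reuse=tf.AUTO_REUSE):
        return tf.get_variable('kernel', [3, 3, 3, width])


loaded_shapes = []
for width in (16, 32):  # two models that differ in width
    # Start from an empty graph so 'resnet/kernel' from the previous
    # model is not reused with an incompatible shape.
    tf.reset_default_graph()
    kernel = build_toy_model(width)
    loaded_shapes.append(kernel.shape.as_list())
```

Without the reset, the second get_variable call would be expected to find the first model's kernel under the same name and fail on the shape mismatch.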
--
Uri Cohen
PhD candidate for computational neuroscience
Edmond and Lily Safra Center for Brain Sciences
Hebrew University of Jerusalem, Israel
|
This indeed solves the issue for all batchnorm models in resnet, but not for groupnorm! The following errors are no longer there:
The following error is still there, in all groupnorm models:
That is, for resnet I could read 108 / 216 of the cifar10 models and 162 / 324 of the cifar100 models. |
I think the issue is that in the original code the tensor shapes are initialized as [c] and reshaped to [1, c, 1, 1], but this was later changed to initialize the tensor shape as [1, c, 1, 1] directly. My bad that I didn't catch it. It might take me a bit of time to push the change, but if you do the following it should fix the issue:
to
|
Works. Thanks for the prompt response!
…On Wed, Aug 7, 2019 at 11:40 PM YiDing Jiang ***@***.***> wrote:
I think the issue is that in the internal code the tensor shapes are
initialized as [c] and reshaped to [1, c, 1, 1], but this was later changed
to initialize the tensor shape as [1, c, 1, 1] directly. My bad that I
didn't catch it. It might take me a bit of time to push the change, but if
you do the following it should fix the issue:
1. Go to `models/resnet.py`
2. Go to the function group_norm
3. Change:
    gamma = tf.get_variable('gamma', [1, c, 1, 1],
                            initializer=tf.constant_initializer(1.0))
    beta = tf.get_variable('beta', [1, c, 1, 1],
                           initializer=tf.constant_initializer(0.0))
to
    # Create the variables with shape [c] ...
    gamma = tf.get_variable('gamma', [c],
                            initializer=tf.constant_initializer(1.0))
    beta = tf.get_variable('beta', [c],
                           initializer=tf.constant_initializer(0.0))
    # ... then reshape them to [1, c, 1, 1].
    gamma = tf.reshape(gamma, [1, c, 1, 1])
    beta = tf.reshape(beta, [1, c, 1, 1])
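To illustrate why this change helps, here is a small sketch that uses NumPy as a stand-in for TensorFlow (an assumption made to keep it self-contained). It takes c = 32 from the "Invalid argument" error reported in this issue, assumes the checkpoint value has shape [c] as that message suggests, and shows that reshaping to [1, c, 1, 1] still broadcasts over NCHW activations. The helper shapes_match is hypothetical.

```python
import numpy as np

c = 32  # channel count taken from the reported error message

# Assumption: the checkpoint stores gamma and beta with shape [c].
checkpoint_gamma = np.ones([c], dtype=np.float32)
checkpoint_beta = np.zeros([c], dtype=np.float32)


def shapes_match(variable_shape, value):
    """Hypothetical check mirroring 'Assign requires shapes ... to match'."""
    return tuple(variable_shape) == value.shape


old_ok = shapes_match([1, c, 1, 1], checkpoint_gamma)  # variable declared as [1, c, 1, 1]
new_ok = shapes_match([c], checkpoint_gamma)           # variable declared as [c]

# After restoring, reshape so the parameters broadcast over NCHW activations.
gamma = checkpoint_gamma.reshape(1, c, 1, 1)
beta = checkpoint_beta.reshape(1, c, 1, 1)
x = np.random.rand(2, c, 8, 8).astype(np.float32)
y = x * gamma + beta
```

Here old_ok is False and new_ok is True, which matches the reported mismatch between [1,32,1,1] and [32]; the reshape keeps the arithmetic on the activations unchanged.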
|
The problem described in the previous issue is resolved when using TensorFlow with GPU support enabled, but loading the resnet models then shows a zoo of behaviors:
Not found: Key resnet/group_norm/beta not found in checkpoint
Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [1,32,1,1] rhs shape= [32]
ValueError: Trying to share variable resnet/conv2d/kernel, but specified shape (3, 3, 3, 32) and found shape (3, 3, 3, 16).
Correctly loaded:
Not found:
Invalid argument:
ValueError logs:
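One way to tally these behaviors across many models is to bucket each failure by its message prefix. The sketch below is hypothetical (classify_failure and the bucket names are not part of demogen) and relies only on the three message prefixes quoted above.

```python
from collections import Counter

# Message prefixes copied from the errors listed in this issue.
PREFIXES = ('Not found', 'Invalid argument', 'ValueError')


def classify_failure(message):
    """Return the matching prefix, or 'other' if none applies (hypothetical helper)."""
    for prefix in PREFIXES:
        if message.startswith(prefix):
            return prefix
    return 'other'


messages = [
    'Not found: Key resnet/group_norm/beta not found in checkpoint',
    'Invalid argument: Assign requires shapes of both tensors to match. '
    'lhs shape= [1,32,1,1] rhs shape= [32]',
    'ValueError: Trying to share variable resnet/conv2d/kernel, but specified '
    'shape (3, 3, 3, 32) and found shape (3, 3, 3, 16).',
]
counts = Counter(classify_failure(m) for m in messages)
```

In practice the messages list would be filled from the exceptions caught while loading each model, so the counts show how many models fall into each bucket.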