Using batch normalization in a nested model causes an error, resulting model is unloadable #4638

Closed
3 tasks done
guicho271828 opened this issue Dec 8, 2016 · 17 comments

Comments

@guicho271828

minimal test case:
https://gist.github.com/guicho271828/5ac82dda5e5b12316ea705c7dc5a8aea

Tested with the TensorFlow backend, version 0.12:

[guicho 9]$ pip show tensorflow-gpu
Name: tensorflow-gpu
Version: 0.12.0rc0

Please make sure that the boxes below are checked before you submit your issue. Thank you!

  • Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps

  • If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
    pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps

  • Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).

@guicho271828
Author

Stack trace:

Traceback (most recent call last):
  File "./minimal.py", line 19, in <module>
    y = autoencoder1(x)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 517, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 571, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 155, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 378, in call
    return self.model.call(x, mask)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2078, in call
    output_tensors, output_masks, output_shapes = self.run_internal_graph(inputs, masks)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2220, in run_internal_graph
    output_tensors = to_list(layer.call(computed_tensor, computed_mask))
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/normalization.py", line 128, in call
    self.add_updates([K.moving_average_update(self.running_mean, mean, self.momentum),
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 364, in moving_average_update
    variable, value, momentum)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 70, in assign_moving_average
    update_delta = _zero_debias(variable, value, decay)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 177, in _zero_debias
    trainable=False)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
    caching_device=caching_device, validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 632, in _get_single_variable
    name, "".join(traceback.format_list(tb))))
ValueError: Variable batchnormalization_1_running_mean/biased already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 364, in moving_average_update
    variable, value, momentum)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/normalization.py", line 128, in call
    self.add_updates([K.moving_average_update(self.running_mean, mean, self.momentum),
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 155, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))

@fchollet
Member

fchollet commented Dec 8, 2016

Cannot reproduce. From which I conclude that some of your bullet points are incorrectly checked.

@fchollet closed this as completed on Dec 8, 2016

@guicho271828
Author

The second bullet is correctly checked because I'm not using Theano, as is obvious from the stack trace. ("If A then B" is always true when A is false.)

@guicho271828
Author

[Image: truth table for the conditional]
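
The vacuous-truth argument about the Theano checkbox can be checked mechanically. The following is a minimal sketch in Python; the function name `implies` and the labels A ("running on Theano") and B ("Theano is up to date") are chosen here for illustration and do not come from the thread.

```python
from itertools import product


def implies(a, b):
    """Material conditional: 'if a then b' is false only when a is true and b is false."""
    return (not a) or b


# A = "running on Theano", B = "Theano is up to date" (illustrative labels).
rows = [(a, b, implies(a, b)) for a, b in product([True, False], repeat=2)]

# Print the full truth table, one row per combination of A and B.
for a, b, result in rows:
    print("A=%-5s B=%-5s  A->B=%s" % (a, b, result))
```

Both rows with A false evaluate to true, which is the point being made about a checkbox that only applies when running on Theano.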

@guicho271828
Author

guicho271828 commented Dec 8, 2016

Update log below. Am I wrong? It's already 1.1.2.

[guicho 10]$ pip show keras
Name: Keras
Version: 1.1.2
Summary: Deep Learning for Python
Home-page: https://github.com/fchollet/keras
Author: Francois Chollet
Author-email: francois.chollet@gmail.com
License: MIT
Location: /usr/local/lib/python2.7/dist-packages
Requires: theano, pyyaml, six
[guicho 10]$ sudo pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps

The directory '/home/guicho/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/guicho/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting git+git://github.com/fchollet/keras.git
  Cloning git://github.com/fchollet/keras.git to /tmp/pip-r0ju2l-build
Installing collected packages: Keras
  Found existing installation: Keras 1.1.2
    Uninstalling Keras-1.1.2:
      Successfully uninstalled Keras-1.1.2
  Running setup.py install for Keras ... done
Successfully installed Keras-1.1.2
[guicho 10]$ pip show keras
Name: Keras
Version: 1.1.2
Summary: Deep Learning for Python
Home-page: https://github.com/fchollet/keras
Author: Francois Chollet
Author-email: francois.chollet@gmail.com
License: MIT
Location: /usr/local/lib/python2.7/dist-packages
Requires: theano, pyyaml, six
sudo pip show keras
Name: Keras
Version: 1.1.2
Summary: Deep Learning for Python
Home-page: https://github.com/fchollet/keras
Author: Francois Chollet
Author-email: francois.chollet@gmail.com
License: MIT
Location: /usr/local/lib/python2.7/dist-packages
Requires: theano, pyyaml, six

@guicho271828
Author

guicho271828 commented Dec 8, 2016

Perhaps you missed line 16 of the gist (https://gist.github.com/guicho271828/5ac82dda5e5b12316ea705c7dc5a8aea#file-minimal-py-L16)?
As posted, the code works by default. If that is what you ran, then yes, that's my fault.

### uncommenting the lines below makes it stop working
### just wrap it 
# x = Input(shape=(784,))
# y = autoencoder1(x)
# autoencoder2 = Model(input=x,output=y)
# autoencoder = autoencoder2

@fchollet
Member

fchollet commented Dec 8, 2016 via email

@guicho271828
Author

guicho271828 commented Dec 8, 2016

The comment above verifies that the keras version is fine (keras 1.1.2 installed from github).

sudo pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps

OK, I will clone Keras locally, use it without pip, and test again.

@guicho271828
Author

I cloned the Keras git repository locally and it still reproduces the same error (with the code above uncommented).
See below for the stack trace. The Keras source paths now point to the local directory.
I cloned the master branch at 1de4fe0.

[guicho 9]$ python minimal.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
  File "minimal.py", line 19, in <module>
    y = autoencoder1(x)
  File "/home/guicho/repos/tensorflow-tutorial/9/keras/engine/topology.py", line 517, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/home/guicho/repos/tensorflow-tutorial/9/keras/engine/topology.py", line 571, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/home/guicho/repos/tensorflow-tutorial/9/keras/engine/topology.py", line 155, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/home/guicho/repos/tensorflow-tutorial/9/keras/models.py", line 378, in call
    return self.model.call(x, mask)
  File "/home/guicho/repos/tensorflow-tutorial/9/keras/engine/topology.py", line 2078, in call
    output_tensors, output_masks, output_shapes = self.run_internal_graph(inputs, masks)
  File "/home/guicho/repos/tensorflow-tutorial/9/keras/engine/topology.py", line 2220, in run_internal_graph
    output_tensors = to_list(layer.call(computed_tensor, computed_mask))
  File "/home/guicho/repos/tensorflow-tutorial/9/keras/layers/normalization.py", line 128, in call
    self.add_updates([K.moving_average_update(self.running_mean, mean, self.momentum),
  File "/home/guicho/repos/tensorflow-tutorial/9/keras/backend/tensorflow_backend.py", line 364, in moving_average_update
    variable, value, momentum)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 70, in assign_moving_average
    update_delta = _zero_debias(variable, value, decay)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 177, in _zero_debias
    trainable=False)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
    caching_device=caching_device, validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 632, in _get_single_variable
    name, "".join(traceback.format_list(tb))))
ValueError: Variable batchnormalization_1_running_mean/biased already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

  File "/home/guicho/repos/tensorflow-tutorial/9/keras/backend/tensorflow_backend.py", line 364, in moving_average_update
    variable, value, momentum)
  File "/home/guicho/repos/tensorflow-tutorial/9/keras/layers/normalization.py", line 128, in call
    self.add_updates([K.moving_average_update(self.running_mean, mean, self.momentum),
  File "/home/guicho/repos/tensorflow-tutorial/9/keras/engine/topology.py", line 155, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
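
In the traceback above, the Keras frames resolve to the local clone while the TensorFlow frames still come from dist-packages. One way to confirm which copy of a package Python would import is sketched below. It assumes Python 3.4 or later (the thread itself uses Python 2.7), and `module_origin` is a hypothetical helper name, not part of Keras or TensorFlow.

```python
import importlib.util


def module_origin(name):
    """Return where a top-level module would be loaded from, or None if it is not found.

    Hypothetical helper for illustration; pass top-level names only.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Standard-library module used so the sketch runs anywhere;
# substitute "keras" or "tensorflow" in an environment that has them.
print(module_origin("json"))
print(module_origin("no_such_module_xyz_12345"))
```

Running this from the directory that contains the local clone would be expected to report a path inside that clone for `keras`, since the current directory usually takes precedence on the import path.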

@guicho271828
Author

The error comes from TensorFlow. As noted, I'm using tensorflow-gpu 0.12 with Python 2.7.
Sorry, I did not mention the Python version earlier.

@guicho271828
Author

more info:

[guicho ~]$ nvidia-smi
Thu Dec  8 23:47:30 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 0000:01:00.0      On |                  N/A |
|  0%   43C    P8    14W / 200W |    211MiB /  8110MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1059    G   /usr/lib/xorg/Xorg                             138MiB |
|    0      1808    G   cinnamon                                        70MiB |
+-----------------------------------------------------------------------------+
[guicho ~]$ uname -a
Linux guicho-desktop 4.4.0-53-generic #74-Ubuntu SMP Fri Dec 2 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

@sebastian-schlecht

sebastian-schlecht commented Dec 9, 2016

Same issue here calling the model on a tf.Tensor.

Keras is up to date, TF 0.12, Python 2.7

@sebastian-schlecht

@guicho271828 In case this is blocking you, call
tf.get_variable_scope().reuse_variables() right before calling the model on the input tensor. That solved it for me.
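
To illustrate why a reuse switch can get past the "already exists" error, here is a toy model in plain Python. It is a sketch under stated assumptions, not TensorFlow's or Keras's actual code: `ToyVariableStore` and `toy_moving_average_update` are hypothetical names, and the store only mimics the idea of a name-keyed variable registry with a reuse flag.

```python
class ToyVariableStore(object):
    """Toy name-keyed variable store. Illustration only, NOT TensorFlow's implementation."""

    def __init__(self):
        self._variables = {}
        self.reuse = False  # stands in for the variable scope's reuse flag

    def get_variable(self, name):
        if name in self._variables:
            if not self.reuse:
                raise ValueError("Variable %s already exists, disallowed." % name)
            return self._variables[name]
        if self.reuse:
            # In reuse mode, new variables are refused rather than created.
            raise ValueError("Variable %s does not exist." % name)
        variable = {"name": name, "value": 0.0}
        self._variables[name] = variable
        return variable


def toy_moving_average_update(store, layer_name):
    """Hypothetical helper: requests the same helper variable name on every call."""
    return store.get_variable(layer_name + "_running_mean/biased")


store = ToyVariableStore()
first = toy_moving_average_update(store, "batchnormalization_1")

# Calling the same layer a second time (as when a model is wrapped in
# another model) asks for the same name again and collides.
try:
    toy_moving_average_update(store, "batchnormalization_1")
    second_call_failed = False
except ValueError as error:
    second_call_failed = True
    print(error)

# With reuse switched on, the existing variable is returned instead.
store.reuse = True
second = toy_moving_average_update(store, "batchnormalization_1")
print(second is first)
```

In this toy model, switching reuse on also makes requests for names that do not exist yet fail, which is one way a blanket reuse switch could turn out to be only a partial fix. Whether that is the mechanism at work in this issue is not established here.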

@guicho271828
Author

@sebastian-schlecht thanks.

@guicho271828
Author

guicho271828 commented Dec 16, 2016

reuse_variables works only in certain cases, and I am virtually unable to save or load models that include BN due to similar errors. There is clearly something wrong with the BN variable naming scheme (e.g. "batchnormalization_1_running_mean/biased").

@guicho271828
Author

I updated the gist (https://gist.github.com/guicho271828/5ac82dda5e5b12316ea705c7dc5a8aea). reuse_variables solves the issue only partially: the saved model cannot be loaded with keras.models.load_model.

@guicho271828 changed the title from "Batch normalization wrapped in a model incorrectly identified as reused" to "Using batch normalization in a model causes an error, resulting model is unloadable" on Dec 16, 2016
@guicho271828 changed the title from "Using batch normalization in a model causes an error, resulting model is unloadable" to "Using batch normalization in a nested model causes an error, resulting model is unloadable" on Dec 16, 2016
@guicho271828
Author

@fchollet, I removed the latest TensorFlow and installed the older version (0.11), and the failure case then ran successfully. This is a regression caused by the latest TensorFlow (0.12). I guess you have an older version.


3 participants