multi_gpu_model not working w/ TensorFlow 1.14 #13057
Same here. The code runs with tf==1.12.0 but cannot run with tf==1.14.0, and I don't know the reason. The biggest change on my side is that I moved from CUDA 9.0 to 10.0; nothing else has changed.
@QtacierP It works with TF 1.13 and CUDA 10.0 for me; it's just TF 1.14 that's a problem.
I have the same problem now. Is downgrading currently the only way to solve this?
@derekhsu I can't really speak for the Keras maintainers, but I don't know of any other solution. Bugs like this in critical features such as multi-GPU training are a big problem for Keras.
Same here! Very disappointing. Solutions, please...
I received the same error message as above when using TF 1.14, but after downgrading to 1.12 as well as to 1.13, I am confronted with:
Any suggestions on whether this is caused by the same issue, or whether I might have another problem?
System information
Same problem here. Log:
I am trying to learn PyTorch, little by little. I don't know when they will fix this; it has been three months.
Same problem here too...

```
AttributeError                            Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/keras/utils/multi_gpu_utils.py in multi_gpu_model(model, gpus, cpu_merge, cpu_relocation)
/opt/conda/lib/python3.7/site-packages/keras/engine/base_layer.py in __call__(self, inputs, **kwargs)
/opt/conda/lib/python3.7/site-packages/keras/engine/network.py in call(self, inputs, mask)
/opt/conda/lib/python3.7/site-packages/keras/engine/network.py in run_internal_graph(self, inputs, masks)
/opt/conda/lib/python3.7/site-packages/keras/layers/normalization.py in call(self, inputs, training)
/opt/conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py in normalize_batch_in_training(x, gamma, beta, reduction_axes, epsilon)
/opt/conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py in _has_nchw_support()
/opt/conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py in _is_current_explicit_device(device_type)
/opt/conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py in _get_current_tf_device()
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in _apply_device_functions(self, op)
AttributeError: '_TfDeviceCaptureOp' object has no attribute '_set_device_from_string'
```
Yes, this is a TF 1.14 issue; see tensorflow/tensorflow#30728
I found a workaround (which worked at least for my configuration):
might not be the optimal solution, but it finally allowed me to use all of my GPUs...
@ju-he thanks for the post, but I can't find that line of code; I don't have anything at line 150, just some comments. I have `if isinstance(spec, MergeDevice):` but never found `parse_from_string` in the file. I don't understand what you changed. Thanks.
@TheStoneMX which TF version are you using? I downgraded from 1.14 to 1.12 and then to 1.10, since some suggested this solution, but I still had the issues described above. Maybe it's due to my specific configuration. But do I understand you correctly that the "if isinstance / return spec" part (that's the only thing I added) is already there in your version? Then apparently this bug has already been fixed, just not in the version I was using.
@ju-he I am using 1.14 and still can't use multiple GPUs. I am thinking of learning PyTorch... it has been too long and they are not fixing this bug.
@TheStoneMX have you tried switching to TF 1.12 or even 1.10? This, together with the bugfix I posted above, should work fine.
Hi @ju-he, thanks for the reply, but I haven't been able to. I am using conda, and every time I install Keras it reinstalls 1.14. Do you know how I can do it? I thought 1.13 did not have this problem, because before, everything was working.
Hi @TheStoneMX
I was not using conda for a while, then I started using it again. I am switching to see if I can make it work. Thanks, bro.
Hi, same issue here when trying to use multi-GPU with Keras.
Hi @ju-he, I got it working by removing Anaconda, using pip3, and installing tensorflow-gpu 1.13.2.
I have the same problem when calling:
tensorflow-gpu 1.14 has disappointed me as well; I consider 1.13.2 the last reliable version.
I believe 1.14 is currently more similar to TensorFlow 2 than to TensorFlow 1. Please consider backwards compatibility for 1.x versions as long as the version still starts with 1.
I, too, got it working by installing a pip3 environment separate from my Anaconda environment.
Just upgrading TF to 1.15 works for me.
The error is still triggered for me on tensorflow-gpu 1.15 with Keras 2.2.4.
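For reference, the version reports scattered through this thread can be collected into a small lookup. This is a hypothetical helper; the statuses are compiled only from the comments above, not from any official compatibility matrix:

```python
# Hypothetical summary of TF versions as reported in this thread only.
REPORTED_STATUS = {
    "1.12": "works",
    "1.13": "works",
    "1.14": "broken",  # AttributeError in multi_gpu_model
    "1.15": "mixed",   # one report working, one still failing
}

def multi_gpu_model_status(tf_version):
    """Look up the reported multi_gpu_model status for a TF version string."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return REPORTED_STATUS.get(major_minor, "unknown")

print(multi_gpu_model_status("1.14.0"))  # -> broken
print(multi_gpu_model_status("1.13.2"))  # -> works
```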
Tested on my computer with keras 2.2.4.
System information
Describe the current behavior
I am using the cifar-10 ResNet example from the Keras examples directory, with the addition of the following line at line 360 (just before compilation) in order to use multiple GPUs while training. However, this doesn't work.
Line Added:
model = keras.utils.multi_gpu_model(model, gpus=2)
Traceback Error log:
Describe the expected behavior
Previously, this typically worked fine and resulted in faster training due to parallelization across GPUs.
Note: This works fine if the backend is TensorFlow 1.13, so this is a regression.