
Different behaviour of Keras 2.0.9 and 2.0.8 #8353

Closed
mdoulaty opened this issue Nov 2, 2017 · 9 comments

Comments

@mdoulaty

mdoulaty commented Nov 2, 2017

With a pip installation of Keras 2.0.9 (the latest as of now) and the TF backend, all available GPU resources are allocated immediately after importing Keras. In 2.0.8, this GPU allocation did not happen on import. Is this expected behaviour in 2.0.9?

@yanpanlau

I am seeing the same behavior. The following code works in 2.0.8 but not in 2.0.9:

import tensorflow as tf

# Create a session whose GPU memory grows on demand instead of being
# allocated all at once, then hand it to Keras before it creates its own.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

from keras import backend as K
K.set_session(sess)

@AndrasEros

Maybe this change was introduced here:
#8311

@mark86092

mark86092 commented Nov 2, 2017

I encountered the same problem.

The code here https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py#L50

_LOCAL_DEVICES = device_lib.list_local_devices()

attempts to allocate the GPU resources. Not sure if this is expected behavior.

@datumbox
Contributor

datumbox commented Nov 3, 2017

This is a known issue. The specific method call re-registers all the GPUs/resources instead of just counting the number of available devices. I intend to send a patch over the weekend.

@datumbox
Contributor

datumbox commented Nov 3, 2017

The problem is that device_lib.list_local_devices() initialises a TF session and registers all available GPUs on the system. Judging from the name of the function, I believe this is a bug in TensorFlow: I don't see why listing the devices should require registering them in a session.

Reproducing the problem is tricky, as you also need more than one GPU. Here is a pure-TF example that shows the problem:

$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> 
>>> tf.__version__
'1.4.0'
>>> 
>>> config = tf.ConfigProto()
>>> config.gpu_options.per_process_gpu_memory_fraction = 0.9
>>> config.gpu_options.visible_device_list = str('1')
>>> sess = tf.Session(config=config)
2017-11-03 13:02:14.730453: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-11-03 13:02:14.966925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: Quadro K2200 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:81:00.0
totalMemory: 3.95GiB freeMemory: 3.91GiB
2017-11-03 13:02:14.967000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 1, name: Quadro K2200, pci bus id: 0000:81:00.0, compute capability: 5.0)

As we can see, it has registered only GPU 1. This can be confirmed with nvidia-smi:

$ nvidia-smi 
Fri Nov  3 13:03:05 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2200        Off  | 00000000:03:00.0  On |                  N/A |
| 42%   51C    P0     2W /  39W |    613MiB /  4040MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K2200        Off  | 00000000:81:00.0 Off |                  N/A |
| 42%   41C    P8     1W /  39W |   3676MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1397      G   /usr/bin/X                                   399MiB |
|    0      2772      G   compiz                                       200MiB |
|    1     16788      C   python                                      3664MiB |
+-----------------------------------------------------------------------------+

In the same Python shell, let's call the method that lists the available GPUs:

>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()
2017-11-03 13:04:08.611074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: Quadro K2200 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:03:00.0
totalMemory: 3.95GiB freeMemory: 3.31GiB
2017-11-03 13:04:08.611198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2017-11-03 13:04:08.611248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 
2017-11-03 13:04:08.611263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y N 
2017-11-03 13:04:08.611275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   N Y 
2017-11-03 13:04:08.611293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Quadro K2200, pci bus id: 0000:03:00.0, compute capability: 5.0)
2017-11-03 13:04:08.611315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Quadro K2200, pci bus id: 0000:81:00.0, compute capability: 5.0)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13475891616555218543
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3240755200
locality {
  bus_id: 1
}
incarnation: 15732487042202847083
physical_device_desc: "device: 0, name: Quadro K2200, pci bus id: 0000:03:00.0, compute capability: 5.0"
, name: "/device:GPU:1"
device_type: "GPU"
memory_limit: 147456000
locality {
  bus_id: 2
}
incarnation: 7726238831769587034
physical_device_desc: "device: 1, name: Quadro K2200, pci bus id: 0000:81:00.0, compute capability: 5.0"
]

Ooops! It just registered both GPUs! Let's confirm with nvidia-smi:

$ nvidia-smi 
Fri Nov  3 13:04:28 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2200        Off  | 00000000:03:00.0  On |                  N/A |
| 42%   51C    P0     4W /  39W |   3740MiB /  4040MiB |     14%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K2200        Off  | 00000000:81:00.0 Off |                  N/A |
| 42%   42C    P8     1W /  39W |   3676MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1397      G   /usr/bin/X                                   410MiB |
|    0      2772      G   compiz                                       200MiB |
|    0     16788      C   python                                      3116MiB |
|    1     16788      C   python                                      3664MiB |
+-----------------------------------------------------------------------------+

As we can see, the process has now also acquired GPU 0 and is using all of its available resources.
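For reference, the memory_limit values in the device listing above are in bytes; converting them (a quick sanity check, not part of the original repro) shows how much memory TF carved out on each card:

```python
# memory_limit values copied from the device listing above, in bytes.
gpu0_limit = 3240755200
gpu1_limit = 147456000

gib = 1024 ** 3  # bytes per GiB
print(round(gpu0_limit / gib, 2))  # GPU:0 -> 3.02 GiB (nearly all of its free memory)
print(round(gpu1_limit / gib, 2))  # GPU:1 -> 0.14 GiB (most was already held by our session)
```

This matches the nvidia-smi output: the listing call grabbed almost everything that was still free on GPU 0.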

@alxy

alxy commented Nov 3, 2017

To work around this temporarily, you can make only specific GPUs visible before any Keras import:

import os

# Must run before the first Keras/TensorFlow import in the process.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

from keras.engine import Model

This way, Keras will only use the GPU with ID 1.
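The same mechanism can also hide every GPU, which is handy for forcing a CPU-only import on a shared machine. A sketch; as above, the variable must be set before TensorFlow is first imported in the process:

```python
import os

# An empty value hides all GPUs from TensorFlow, so importing Keras
# cannot allocate any GPU memory at all. This only takes effect if it
# runs before the first `import tensorflow` / `import keras`.
os.environ['CUDA_VISIBLE_DEVICES'] = ''
```

Once the TF runtime has started, changing this variable has no effect on device visibility.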

@alsrgv
Contributor

alsrgv commented Nov 5, 2017

This affects Horovod as well. Unfortunately, the CUDA_VISIBLE_DEVICES workaround is not desirable, as it prevents NCCL from doing CUDA IPC.

@fchollet
Member

fchollet commented Nov 5, 2017

Please take a look at the outstanding fix: #8377

@wt-huang

Closing as this is resolved.
