
tensorflow conflicts with nn.DataParallel #2230

Closed · lanpa opened this issue Jul 28, 2017 · 5 comments

@lanpa (Collaborator) commented Jul 28, 2017

Environment: two GTX 1080 GPUs.
Minimal reproducible code:

import torch
from torch.autograd import Variable

import tensorflow as tf
with tf.device('/cpu:0'):
    emb = tf.Variable([[1,2],[3,4]], name="embedding")

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
    sess.run(emb.initializer)

model = torch.nn.Linear(128, 1).cuda()
model = torch.nn.DataParallel(model).cuda()

data = Variable(torch.Tensor(8,128)).cuda()
x = model(data)

error message:

  File "/home/dexter/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 225, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dexter/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 59, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/home/dexter/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 64, in replicate
    return replicate(module, device_ids)
  File "/home/dexter/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
    param_copies = Broadcast(devices)(*params)
  File "/home/dexter/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 19, in forward
    outputs = comm.broadcast_coalesced(inputs, self.target_gpus)
  File "/home/dexter/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 49, in broadcast_coalesced
    raise RuntimeError('all tensors must be on devices[0]')
RuntimeError: all tensors must be on devices[0]

Removing either model = torch.nn.DataParallel(model).cuda() or the sess.run call makes the code work fine.

@cdluminate (Contributor)

What about setting the environment variable CUDA_VISIBLE_DEVICES=1?

See also http://pytorch.org/docs/master/notes/cuda.html
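For example (a minimal sketch; the variable has to be set before CUDA is initialized by either framework, so it goes at the very top of the script, or on the command line as CUDA_VISIBLE_DEVICES=1 python script.py):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'   # must be set before CUDA is initialized

import torch
import tensorflow as tf

# Both frameworks now see a single GPU (physical GPU 1 mapped to index 0),
# so there is no second device for the current device to drift to.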

@lanpa (Collaborator, Author) commented Jul 30, 2017

CUDA_VISIBLE_DEVICES=1 makes only one GPU do the work, which defeats the purpose of DataParallel. I think the interesting thing is that another program might cause PyTorch to move its GPU tensors around (and cause strange errors?).

@apaszke (Contributor) commented Aug 7, 2017

Can you print torch.cuda.current_device() after you run the TF initializer?
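Something like this on top of the repro above (a sketch of where the prints could go; it shows the current device index and the device count before and after the TF session):

import torch
print(torch.cuda.current_device(), torch.cuda.device_count())

import tensorflow as tf
with tf.device('/cpu:0'):
    emb = tf.Variable([[1, 2], [3, 4]], name="embedding")

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
    sess.run(emb.initializer)

print(torch.cuda.current_device(), torch.cuda.device_count())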

@lanpa (Collaborator, Author) commented Aug 7, 2017

Oops, it moves to another GPU after TF initializes!
https://gist.github.com/anonymous/411931230de42bcecd8a9dd535c64e6b

before import tf:
0 2
after import tf:
0 2
2017-08-07 14:00:54.239520: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-07 14:00:54.239548: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-07 14:00:54.239557: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-07 14:00:54.239564: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-07 14:00:54.239571: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-08-07 14:00:54.341517: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-07 14:00:54.341880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.797
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 5.60GiB
2017-08-07 14:00:54.446014: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x2d66880 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-08-07 14:00:54.446348: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-08-07 14:00:54.446694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.797
pciBusID 0000:02:00.0
Total memory: 7.92GiB
Free memory: 5.60GiB
2017-08-07 14:00:54.447293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1 
2017-08-07 14:00:54.447318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y Y 
2017-08-07 14:00:54.447325: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1:   Y Y 
2017-08-07 14:00:54.447342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
2017-08-07 14:00:54.447355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:02:00.0)
after init:
1 2
after init (outside session):
1 2
Traceback (most recent call last):
  File "bug.py", line 24, in <module>
    x = model(data)
  File "/home/dexter/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dexter/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 60, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/home/dexter/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 65, in replicate
    return replicate(module, device_ids)
  File "/home/dexter/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
    param_copies = Broadcast(devices)(*params)
  File "/home/dexter/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 18, in forward
    outputs = comm.broadcast_coalesced(inputs, self.target_gpus)
  File "/home/dexter/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 52, in broadcast_coalesced
    raise RuntimeError('all tensors must be on devices[0]')
RuntimeError: all tensors must be on devices[0]

@colesbury (Member)

It's too bad that TensorFlow changes the current device, but this is the expected PyTorch behavior. The model must be on device_ids[0]. device_ids defaults to 0, 1, 2, .... If your model is on device 7, you must manually specify device_ids.
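For example, if the current device has been left at 1 (a sketch, assuming a two-GPU machine like the one above):

model = torch.nn.Linear(128, 1).cuda(1)                   # model lives on the current device (1)
model = torch.nn.DataParallel(model, device_ids=[1, 0])   # device_ids[0] must match the model's device
data = Variable(torch.Tensor(8, 128)).cuda(1)
x = model(data)                                            # output is gathered on device_ids[0], i.e. GPU 1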

Or, just set the current device after the TensorFlow call:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
    sess.run(emb.initializer)

torch.cuda.set_device(0)  # set the device back to 0

model = torch.nn.Linear(128, 1).cuda()
model = torch.nn.DataParallel(model).cuda()

data = Variable(torch.Tensor(8,128)).cuda()
x = model(data)
