
Building with USE_TENSORPIPE=0 causes errors on import torch for MacOS #56417

Open
rohan-varma opened this issue Apr 19, 2021 · 3 comments

Labels
module: tensorpipe (Related to TensorPipe RPC Agent)
oncall: distributed (Add this issue/PR to distributed oncall triage queue)

rohan-varma (Member) commented Apr 19, 2021

🐛 Bug

Building on macOS with the following command:

USE_TENSORPIPE=0 USE_GLOO=1 USE_CUDNN=0 BUILD_CAFFE2_OPS=0 USE_NNPACK=0 USE_QNNPACK=0 USE_MKLDNN=0 USE_FBGEMM=0 USE_CUDA=0 USE_MKLDNN=0 USE_DISTRIBUTED=1 python setup.py develop

Results in the following error:

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/rvarm1/Desktop/pytorch/torch/__init__.py", line 632, in <module>
    from .functional import *  # noqa: F403
  File "/Users/rvarm1/Desktop/pytorch/torch/functional.py", line 6, in <module>
    import torch.nn.functional as F
  File "/Users/rvarm1/Desktop/pytorch/torch/nn/__init__.py", line 1, in <module>
    from .modules import *  # noqa: F403
  File "/Users/rvarm1/Desktop/pytorch/torch/nn/modules/__init__.py", line 2, in <module>
    from .linear import Identity, Linear, Bilinear, LazyLinear
  File "/Users/rvarm1/Desktop/pytorch/torch/nn/modules/linear.py", line 6, in <module>
    from .. import functional as F
  File "/Users/rvarm1/Desktop/pytorch/torch/nn/functional.py", line 11, in <module>
    from .._jit_internal import boolean_dispatch, _overload, BroadcastingList1, BroadcastingList2, BroadcastingList3
  File "/Users/rvarm1/Desktop/pytorch/torch/_jit_internal.py", line 21, in <module>
    import torch.distributed.rpc
  File "/Users/rvarm1/Desktop/pytorch/torch/distributed/rpc/__init__.py", line 25, in <module>
    from . import api, backend_registry, functions
  File "/Users/rvarm1/Desktop/pytorch/torch/distributed/rpc/api.py", line 39, in <module>
    from .constants import DEFAULT_SHUTDOWN_TIMEOUT, UNSET_RPC_TIMEOUT
  File "/Users/rvarm1/Desktop/pytorch/torch/distributed/rpc/constants.py", line 3, in <module>
    from torch._C._distributed_rpc import (
ImportError: cannot import name '_DEFAULT_NUM_WORKER_THREADS' from 'torch._C._distributed_rpc' (unknown location)

To Reproduce

Build PyTorch on macOS with the command above.

Expected behavior

Since TensorPipe is the default RPC backend, setting USE_DISTRIBUTED=1 should arguably imply USE_TENSORPIPE=1, unless there are platforms where distributed is enabled but TensorPipe is not. Either way, the failure should surface at build time, not when importing torch.
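One way to surface the failure at build time would be a flag check early in the build. The sketch below is hypothetical (`validate_build_flags` is not part of setup.py); it just illustrates rejecting the inconsistent flag combination up front:

```python
def validate_build_flags(env):
    """Hedged sketch: fail at build time instead of at `import torch` time.

    `validate_build_flags` is a hypothetical helper, not an existing
    setup.py function; the env dict mirrors os.environ.
    """
    use_distributed = env.get("USE_DISTRIBUTED", "0") == "1"
    use_tensorpipe = env.get("USE_TENSORPIPE", "1") == "1"
    if use_distributed and not use_tensorpipe:
        raise RuntimeError(
            "USE_DISTRIBUTED=1 requires USE_TENSORPIPE=1: "
            "the RPC module depends on the TensorPipe agent")

# The flag combination from this report would be rejected immediately:
try:
    validate_build_flags({"USE_DISTRIBUTED": "1", "USE_TENSORPIPE": "0"})
except RuntimeError as e:
    print("build rejected:", e)
```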

Looking at the issue, it appears that _DEFAULT_NUM_WORKER_THREADS is only defined when #ifdef USE_TENSORPIPE is true, but the Python side imports it without checking whether TensorPipe is available.

cc @osalpekar @jiayisuse @lw @beauby @pritamdamania87 @mrshenli @jjlilley @gqchen @rohan-varma @pietern @zhaojuanmao @satgera @aazzolini @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu

rohan-varma added the oncall: distributed label Apr 19, 2021
pritamdamania87 added the module: tensorpipe label Apr 20, 2021
lw (Contributor) commented Apr 21, 2021

I think this is the same issue as #54325?

The specific issue at hand could be resolved easily (add a proper guard around _DEFAULT_NUM_WORKER_THREADS), but the core problem would remain: when USE_TENSORPIPE=0 the RPC module would be unusable (especially after 1.9, when we'll remove the ProcessGroup agent). It's unclear what to do there: make TensorPipe a mandatory dependency, or find a way to disable the entire RPC module when TensorPipe isn't available? (Imagine something like torch.distributed.rpc.is_available().)

rohan-varma (Member, Author) commented
@lw Yeah, looks like the same issue. Looks like @mrshenli mentioned to disable RPC if tensorpipe is not available, should we go ahead and do that?

lw (Contributor) commented Apr 26, 2021

> @lw Yeah, looks like the same issue. Looks like @mrshenli mentioned to disable RPC if tensorpipe is not available, should we go ahead and do that?

It's easier said than done ;)

I see there is a USE_RPC flag in our CMake files, but it appears it's only there to disable RPC on Windows:

USE_RPC

list(APPEND TORCH_PYTHON_COMPILE_DEFINITIONS USE_RPC)

Hence it's not possible for the user to control it (I believe), and it's not clear whether it has been tested on Linux/macOS. Moreover, I think it's only a C++ preprocessor flag, which doesn't control anything within Python: the .py files would still be installed, but something bad would probably happen if someone imported and tried to use the RPC module. So we'd need to check that this works and add a torch.distributed.rpc.is_available() function. This is not too hard, but it's still some work.
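An is_available() probe could be sketched roughly as below. This is hypothetical, not PyTorch's actual implementation: the real function would check the compiled extension, while the sketch probes a stand-in module name (`_fake_rpc_ext`) so it runs anywhere:

```python
import importlib.util

def rpc_is_available() -> bool:
    """Hedged sketch of a torch.distributed.rpc.is_available()-style check.

    A real implementation would probe the compiled torch._C._distributed_rpc
    bindings; "_fake_rpc_ext" is a stand-in module name for illustration.
    """
    return importlib.util.find_spec("_fake_rpc_ext") is not None

# Callers can then fail with a clear message instead of a cryptic ImportError:
if not rpc_is_available():
    print("RPC not available: PyTorch was built with USE_TENSORPIPE=0")
```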

Another option would be to make USE_TENSORPIPE mandatory whenever USE_DISTRIBUTED is enabled.

If we want to fix the issue at hand (i.e., the missing _DEFAULT_NUM_WORKER_THREADS), it's probably best to send an ad-hoc PR.
