TypeError: (): incompatible function arguments #102832

Open
PhdShi opened this issue Jun 2, 2023 · 7 comments
Labels
module: cpp-extensions (Related to torch.utils.cpp_extension)
needs reproduction (Someone else needs to try reproducing the issue given the instructions. No action needed from user)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


PhdShi commented Jun 2, 2023

🐛 Describe the bug

Hello, I am customizing a process group backend using cpp extensions, following the PyTorch tutorial "Customize Process Group Backends Using Cpp Extensions" (PyTorch Tutorials 2.0.1+cu117 documentation), but an error occurred. It looks like a type error, yet as far as I can tell the types are right. How can I fix it? The detailed error follows:

  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 895, in init_process_group
    default_pg = _new_process_group_helper(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1029, in _new_process_group_helper
    backend_class = creator_fn(backend_prefix_store, group_rank, group_size, timeout)
TypeError: (): incompatible function arguments. The following argument types are supported:
    1. (arg0: c10d::Store, arg1: int, arg2: int, arg3: datetime.timedelta) -> c10d::Backend
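For context, my Python side follows the tutorial roughly like this (a minimal sketch; the dummy_collectives module and createBackendDummy creator are the names used in the tutorial, not my actual backend):

import os

import torch
import torch.distributed as dist

# Importing the compiled extension exposes the creator function; the module
# name "dummy_collectives" is the one used in the tutorial (hypothetical here).
import dummy_collectives

# The creator must accept (store, rank, size, timeout) and return a
# c10d::Backend, i.e. exactly the signature listed in the TypeError above.
dist.Backend.register_backend("dummy", dummy_collectives.createBackendDummy)

os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"

dist.init_process_group("dummy", rank=0, world_size=1)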

Versions

torch.__version__: 2.0.0+cu117

cc @malfet @zou3519


ngimel commented Jun 2, 2023

Please provide a runnable script reproducing the error.

ngimel added the module: cpp-extensions, triaged, and needs reproduction labels on Jun 2, 2023

PhdShi commented Jun 5, 2023

Please provide a runnable script reproducing the error.

Sorry, I can't provide the third-party process group backend, because it involves trade secrets. As for this error, I just don't understand why it is a type error when the types look right.

  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 895, in init_process_group
    default_pg = _new_process_group_helper(
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1034, in _new_process_group_helper
    backend_class = creator_fn(store, group_rank, group_size, timeout)
TypeError: (): incompatible function arguments. The following argument types are supported:
    1. (arg0: c10d::Store, arg1: int, arg2: int, arg3: datetime.timedelta) -> c10d::Backend

Invoked with: <torch.distributed.distributed_c10d.PrefixStore object at 0x7f6010f50c70>, 3, 4, datetime.timedelta(seconds=1800)


PhdShi commented Jun 5, 2023

Please provide a runnable script reproducing the error.

I found that torch installed through pip has torch._C._GLIBCXX_USE_CXX11_ABI set to False. This seems to be causing the type error on the first parameter in backend_class = creator_fn(store, group_rank, group_size, timeout).
I'm not quite sure what the purpose of torch._C._GLIBCXX_USE_CXX11_ABI is.
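If it helps anyone hitting the same thing: the pip wheels are built with the pre-C++11 libstdc++ ABI, and this flag records that. It can be checked from Python (a minimal check):

import torch

# False on the pip wheels: torch was built with -D_GLIBCXX_USE_CXX11_ABI=0.
# If the extension is built with a different ABI setting than torch itself,
# the c10d::Store argument may not be recognized by the binding, which can
# surface as this "incompatible function arguments" error.
print(torch._C._GLIBCXX_USE_CXX11_ABI)

# Public helper equivalent to the private flag above.
print(torch.compiled_with_cxx11_abi())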


ngimel commented Jun 5, 2023

cc @malfet


PhdShi commented Jun 8, 2023

cc @malfet

I fixed it by compiling my cpp extension with string(APPEND CMAKE_CXX_FLAGS " -fabi-version=11"). Thx!
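For others landing here: if the backend can be built outside CMake, torch.utils.cpp_extension.load is another option, since it adds a -D_GLIBCXX_USE_CXX11_ABI define matching the installed wheel (it may not cover the -fabi-version difference that mattered in my case). A sketch with a hypothetical source file name:

from torch.utils import cpp_extension

# JIT-builds the extension with -D_GLIBCXX_USE_CXX11_ABI set to match the
# installed torch wheel, so the Store/Backend types are more likely to line up.
dummy_collectives = cpp_extension.load(
    name="dummy_collectives",
    sources=["dummy.cpp"],  # hypothetical path to the backend source
    verbose=True,
)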


malfet commented Jun 8, 2023

@PhdShi For posterity, do you mind running collect_env and posting its output? (I suspect you are compiling using gcc-9 or newer.)

In the next release, PyTorch will likely require gcc-9 or newer, which has a newer C++ ABI standard.
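For reference, the report can be produced with python -m torch.utils.collect_env, or from a Python session:

# Equivalent to running `python -m torch.utils.collect_env` from a shell.
from torch.utils.collect_env import main

main()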


PhdShi commented Jun 8, 2023

@PhdShi For posterity, do you mind running collect_env and posting its output? (I suspect you are compiling using gcc-9 or newer.)

In the next release, PyTorch will likely require gcc-9 or newer, which has a newer C++ ABI standard.

In this container I use gcc-8, but I did run some tests on gcc-9 and the same error occurred. Finally, I fixed it by compiling my cpp extension with string(APPEND CMAKE_CXX_FLAGS " -fabi-version=11"), following PyTorch's CMakeLists.txt, which explains: "Please note this is required in order to ensure compatibility between gcc 9 and gcc 7. This could be removed when all Linux PyTorch binary builds are compiled by the same toolchain again."

Collecting environment information...
PyTorch version: 2.0.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: **** Enterprise Linux Server 7.2 (Paladin) (x86_64)
GCC version: (GCC) 8.3.1 20190604 
Clang version: Could not collect
CMake version: version 3.26.3
Libc version: glibc-2.24

Python version: 3.8.16 (default, Mar  2 2023, 03:21:46)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.19.91-007.ali4000.alios7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-40GB
GPU 1: NVIDIA A100-SXM4-40GB
GPU 2: NVIDIA A100-SXM4-40GB
GPU 3: NVIDIA A100-SXM4-40GB
GPU 4: NVIDIA A100-SXM4-40GB
GPU 5: NVIDIA A100-SXM4-40GB
GPU 6: NVIDIA A100-SXM4-40GB
GPU 7: NVIDIA A100-SXM4-40GB
