
CUDA Extension Bug for Pytorch 2.2.0 #118842

Open
ghostplant opened this issue Feb 1, 2024 · 21 comments
Assignees
Labels
high priority module: cpp-extensions Related to torch.utils.cpp_extension module: cuda Related to torch.cuda, and CUDA support in general oncall: releng In support of CI and Release Engineering topic: binaries triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Milestone

Comments

@ghostplant

ghostplant commented Feb 1, 2024

🐛 Describe the bug

The released PyTorch 2.2.0 CUDA builds for Linux fail to run C++ extensions. The CPU-only PyTorch 2.2.0 build is not affected, so this appears to be a "2.2.0 + CUDA-only" bug.

To reproduce:

$ python3.8 -m pip install torch==2.2.0 --index-url https://download.pytorch.org/whl/cu118

$ python3.8 -m pip install --verbose --upgrade git+https://github.com/microsoft/tutel@main

$ python3.8 -m tutel.examples.helloworld --batch_size=16
terminate called after throwing an instance of 'c10::Error'
  what():  !dispatch_key_.has_value() INTERNAL ASSERT FAILED at "../aten/src/ATen/core/library.cpp":82, please report a bug to PyTorch. (Error occurred while processing TORCH_LIBRARY block at ./tutel/custom/custom_kernel.cpp:891)
Exception raised from Library at ../aten/src/ATen/core/library.cpp:82 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7ff953aded87 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7ff953a8f75f in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::string const&) + 0x3f (0x7ff953adc8bf in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #3: torch::Library::Library(torch::Library::Kind, std::string, std::optional<c10::DispatchKey>, char const*, unsigned int) + 0x96c (0x7ff98b9a71bc in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x189c6 (0x7ff8e0dd29c6 in /usr/local/lib/python3.8/dist-packages/tutel_custom_kernel.cpython-38-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x108d3 (0x7ff9a58348d3 in /lib64/ld-linux-x86-64.so.2)
frame #6: <unknown function> + 0x1539f (0x7ff9a583939f in /lib64/ld-linux-x86-64.so.2)
frame #7: _dl_catch_exception + 0x6f (0x7ff9a559a16f in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: <unknown function> + 0x1496a (0x7ff9a583896a in /lib64/ld-linux-x86-64.so.2)
frame #9: <unknown function> + 0xf96 (0x7ff9a5010f96 in /lib/x86_64-linux-gnu/libdl.so.2)
frame #10: _dl_catch_exception + 0x6f (0x7ff9a559a16f in /lib/x86_64-linux-gnu/libc.so.6)
frame #11: _dl_catch_error + 0x2f (0x7ff9a559a1ff in /lib/x86_64-linux-gnu/libc.so.6)
frame #12: <unknown function> + 0x1745 (0x7ff9a5011745 in /lib/x86_64-linux-gnu/libdl.so.2)
frame #13: dlopen + 0x71 (0x7ff9a5011051 in /lib/x86_64-linux-gnu/libdl.so.2)
<omitting python frames>
frame #16: python3() [0x664d58]
frame #17: python3() [0x5c08c9]
frame #32: python3() [0x5ee6f0]
frame #37: python3() [0x4c6e27]
frame #38: python3() [0x5c08c9]
frame #51: python3() [0x5ee6f0]
frame #56: python3() [0x4c6e27]
frame #57: python3() [0x5c08c9]

If I switch to PyTorch 2.0.0 or 2.1.0 via python3 -m pip install https://download.pytorch.org/whl/cu118/torch-2.1.0%2Bcu118-cp38-cp38-linux_x86_64.whl, which is an all-in-one package, everything works fine.

BTW, I also tried other C++ extensions; PyTorch 2.2.0 for CUDA always produces the same error.

Versions

Verified that it is not related to the Python version: the bug happens on both Python 3.8 and Python 3.12.

cc @ezyang @gchanan @zou3519 @kadeng @malfet @ptrblck

@ezyang
Contributor

ezyang commented Feb 1, 2024

Has tutel been rebuilt for PyTorch 2.2? I would expect a rebuild to be needed

@ezyang ezyang added module: cpp-extensions Related to torch.utils.cpp_extension triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module topic: binaries module: cuda Related to torch.cuda, and CUDA support in general oncall: releng In support of CI and Release Engineering labels Feb 1, 2024
@ghostplant
Author

Has tutel been rebuilt for PyTorch 2.2? I would expect a rebuild to be needed

Yes, it is a fresh installation. I verified that it is not specific to Tutel: every CUDAExtension fails to register with the PyTorch operator registry.

(1) It is not related to the Python version (e.g., Python 3.8 - 3.12);
(2) It is not related to non-CUDA extensions (e.g., a plain CppExtension works);
(3) It is not related to any particular high-level application;
(4) PyTorch 2.0.0 / 2.1.0 do not have this problem.

@ezyang
Contributor

ezyang commented Feb 1, 2024

Do you have a repro that involves directly loading a cpp extension from the Python API (e.g., passing in C++ source code)?

@ghostplant
Author

For Tutel, you can disable these lines to build the extension as C++-only instead of CUDA.

@ezyang
Contributor

ezyang commented Feb 7, 2024

When I build and install tutel against a from-source build of PyTorch, it works. So it's either an environment problem or a problem specific to the released binaries.

@aitor-martinez-seras

I have encountered the same issue (on Python 3.10), in my case after installing and using Detectron2; I initially thought the error was on their side. Thanks for the temporary workaround, @ghostplant (downgrading to a previous 2.x version).

@ghostplant
Author

ghostplant commented Feb 10, 2024

@ezyang Can you try Ubuntu 18.04 + any Python + PyTorch 2.2.0? The issue seems to be always reproducible on Ubuntu 18.04. The package below is the last version that works with CUDA extensions on Ubuntu 18.04:

pip3.8 install torch==2.2.0.dev20231010+cu118 --index-url https://download.pytorch.org/whl/nightly/cu118

@ghostplant
Author

@ezyang It seems Ubuntu 18.04 is the root cause of PyTorch CUDA extensions no longer working since 2.2.0, while a lot of machines still stick to the 18.04 environment. The problem can be reproduced with this environment:

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu18.04
...
RUN apt install python3-pip python3.8 -y
...

Next, this one works using 2.1.0:

python3.8 -m pip install torch==2.1.0
python3.8 -m pip install --verbose --upgrade git+https://github.com/microsoft/tutel@main
python3.8 -m tutel.examples.helloworld --batch_size=16

This setting doesn't work using 2.2.0:

python3.8 -m pip install torch==2.2.0
python3.8 -m pip install --verbose --upgrade git+https://github.com/microsoft/tutel@main
python3.8 -m tutel.examples.helloworld --batch_size=16

With the CUDA extension disabled, this setting still works using 2.2.0:

python3.8 -m pip install torch==2.2.0
NO_CUDA=1 python3.8 -m pip install --verbose --upgrade git+https://github.com/microsoft/tutel@main
python3.8 -m tutel.examples.helloworld --batch_size=16 --device cpu

@ezyang
Contributor

ezyang commented Feb 19, 2024

@malfet is this the gcc upgrade thing? Sounds like the gcc upgrade thing

@jiweibo

jiweibo commented Mar 1, 2024

I encountered the same problem using CUDAExtension on PyTorch 2.2.0, 2.2.1, and nightly, and there was no useful log output.

System environment:

  • Ubuntu 18
  • gcc 9.4
  • g++ 9.4
  • cuda 11.8

I also compiled torch from source, but that did not work either.

E0301 18:01:53.027387 140681313257280 torch/distributed/elastic/multiprocessing/api.py:669] failed (exitcode: -11) local_rank: 0 (pid: 67153) of binary: /home/luban/anaconda3/envs/voydet/bin/python
Traceback (most recent call last):
  File "/home/luban/anaconda3/envs/voydet/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch', 'console_scripts', 'torchrun')())
  File "/nfs/volume-382-86/wilber/pytorch/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/nfs/volume-382-86/wilber/pytorch/torch/distributed/run.py", line 834, in main
    run(args)
  File "/nfs/volume-382-86/wilber/pytorch/torch/distributed/run.py", line 825, in run
    elastic_launch(
  File "/nfs/volume-382-86/wilber/pytorch/torch/distributed/launcher/api.py", line 137, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/nfs/volume-382-86/wilber/pytorch/torch/distributed/launcher/api.py", line 271, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

@ghostplant
Author

ghostplant commented Mar 1, 2024

No doubt the PyTorch 2.2.0 release is not compatible with Ubuntu 18.04 (a 10-year LTS).

This is the last version that works well; is it possible to revert to this state?

pip3.8 install torch==2.2.0.dev20231010+cu118 --index-url https://download.pytorch.org/whl/nightly/cu118

@ezyang
Contributor

ezyang commented Mar 1, 2024

High pri based on activity

@malfet malfet added this to the 2.2.2 milestone Mar 4, 2024
@malfet malfet self-assigned this Mar 4, 2024
@malfet
Contributor

malfet commented Mar 4, 2024

This looks related to #120020

Very likely yes: the switch from c10::optional to std::optional caused the crash, and the solution is to pass an appropriate ccbin (host compiler) to the GPU compiler (NVCC).
Assigning myself to check whether that is indeed the case, and if so, to add an option to cpp_extension that errors out when gcc-7 is used as the host compiler by NVCC.
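The host-compiler guard described above could also be approximated from user code before building an extension. A minimal sketch (the `MIN_GCC` threshold and the helper names are assumptions for illustration, not PyTorch API):

```python
import re
import subprocess

# Assumed threshold: per this thread, PyTorch >= 2.2 binaries are built with
# gcc-9, so the host compiler NVCC uses should be at least that new.
MIN_GCC = (9, 0)

def parse_gcc_version(first_line: str):
    """Extract (major, minor) from the first line of `gcc --version` output."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", first_line)
    if m is None:
        raise ValueError(f"cannot parse gcc version from: {first_line!r}")
    return int(m.group(1)), int(m.group(2))

def host_compiler_ok(gcc: str = "gcc") -> bool:
    """Return True if `gcc` is new enough to serve as NVCC's host compiler."""
    out = subprocess.run([gcc, "--version"], capture_output=True, text=True, check=True)
    return parse_gcc_version(out.stdout.splitlines()[0]) >= MIN_GCC
```

Running such a check before `setup.py` invokes NVCC would turn the opaque `dispatch_key_` assert into an up-front, actionable error.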

@malfet
Contributor

malfet commented Mar 5, 2024

If one cherry-picks the change from #120126 into the 2.2 branch, the error becomes obvious, and installing gcc-9 fixes it as expected.

That is, if one runs artifacts from the following Dockerfile, tutel works as expected:

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu18.04 as ubuntu18.04-torch2.2.0
RUN apt update && \
    apt install software-properties-common -y && \
    add-apt-repository ppa:ubuntu-toolchain-r/test -y && \
    apt-get update -y && \
    apt install python3-pip python3.8-dev g++-9 git -y && \
    update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 60 --slave /usr/bin/g++ g++ /usr/bin/g++-9 && \
    update-alternatives --install /usr/bin/x86_64-linux-gnu-gcc x86_64-linux-gnu-gcc /usr/bin/gcc-9 60 --slave /usr/bin/x86_64-linux-gnu-g++ x86_64-linux-gnu-g++ /usr/bin/g++-9 && \
    python3.8 -mpip install --upgrade pip && \
    python3.8 -mpip install torch==2.2.0

I.e.

$ docker run --gpus all --rm -it docker.io/library/ubuntu18.04-torch2.2.0  bash -c "python3.8 -m pip install --verbose --upgrade git+https://github.com/microsoft/tutel@main; python3.8 -m tutel.examples.helloworld --batch_size=16"

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Using pip 24.0 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8)
Collecting git+https://github.com/microsoft/tutel@main
  Cloning https://github.com/microsoft/tutel (to revision main) to /tmp/pip-req-build-f553t3wb
  Running command git version
  git version 2.17.1
  Running command git clone --filter=blob:none https://github.com/microsoft/tutel /tmp/pip-req-build-f553t3wb
  Cloning into '/tmp/pip-req-build-f553t3wb'...
  Running command git show-ref main
  20df39d58745e4a2d4a4dca1350c0684bcdb24b1 refs/heads/main
  20df39d58745e4a2d4a4dca1350c0684bcdb24b1 refs/remotes/origin/main
  Running command git symbolic-ref -q HEAD
  refs/heads/main
  Resolved https://github.com/microsoft/tutel to commit 20df39d58745e4a2d4a4dca1350c0684bcdb24b1
  Running command git rev-parse HEAD
  20df39d58745e4a2d4a4dca1350c0684bcdb24b1
  Running command python setup.py egg_info
  running egg_info
  creating /tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info
  writing /tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info/dependency_links.txt
  writing requirements to /tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info/requires.txt
  writing top-level names to /tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info/top_level.txt
  writing manifest file '/tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info/SOURCES.txt'
  reading manifest file '/tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info/SOURCES.txt'
  writing manifest file '/tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info/SOURCES.txt'
  /usr/local/lib/python3.8/dist-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
    device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
  /usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py:500: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
    warnings.warn(msg.format('we could not find ninja.'))
  Preparing metadata (setup.py) ... done
  Link requires a different Python (3.8.0 not in: '>=3.9'): https://files.pythonhosted.org/packages/26/de/437a60a69f7fd0c79264530a97787b2ac7394616e3661236201518f8a47d/numpy-1.25.0rc1.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:>=3.9)
  Link requires a different Python (3.8.0 not in: '>=3.9'): https://files.pythonhosted.org/packages/d0/b2/fe774844d1857804cc884bba67bec38f649c99d0dc1ee7cbbf1da601357c/numpy-1.25.0.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:>=3.9)
  Link requires a different Python (3.8.0 not in: '>=3.9'): https://files.pythonhosted.org/packages/cf/7a/f68d1d658a0e68084097beb212fa9356fee7eabff8b57231cc4acb555b12/numpy-1.25.1.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:>=3.9)
  Link requires a different Python (3.8.0 not in: '>=3.9'): https://files.pythonhosted.org/packages/a0/41/8f53eff8e969dd8576ddfb45e7ed315407d27c7518ae49418be8ed532b07/numpy-1.25.2.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:>=3.9)
  Link requires a different Python (3.8.0 not in: '<3.13,>=3.9'): https://files.pythonhosted.org/packages/29/5e/5887b95aa544a977d21f4adbc5b052897c0e730aa6408ed9903aece7f18f/numpy-1.26.0b1.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:<3.13,>=3.9)
  Link requires a different Python (3.8.0 not in: '<3.13,>=3.9'): https://files.pythonhosted.org/packages/33/a9/1233954ed69e96e829e6615a6e4a68e8c99d599661edff756fb4300c9a0b/numpy-1.26.0rc1.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:<3.13,>=3.9)
  Link requires a different Python (3.8.0 not in: '<3.13,>=3.9'): https://files.pythonhosted.org/packages/55/b3/b13bce39ba82b7398c06d10446f5ffd5c07db39b09bd37370dc720c7951c/numpy-1.26.0.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:<3.13,>=3.9)
  Link requires a different Python (3.8.0 not in: '<3.13,>=3.9'): https://files.pythonhosted.org/packages/78/23/f78fd8311e0f710fe1d065d50b92ce0057fe877b8ed7fd41b28ad6865bfc/numpy-1.26.1.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:<3.13,>=3.9)
  Link requires a different Python (3.8.0 not in: '>=3.9'): https://files.pythonhosted.org/packages/dd/2b/205ddff2314d4eea852e31d53b8e55eb3f32b292efc3dd86bd827ab9019d/numpy-1.26.2.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:>=3.9)
  Link requires a different Python (3.8.0 not in: '>=3.9'): https://files.pythonhosted.org/packages/d0/b0/13e2b50c95bfc1d5ee04925eb5c105726c838f922d0aaddd57b7c8be0f8b/numpy-1.26.3.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:>=3.9)
  Link requires a different Python (3.8.0 not in: '>=3.9'): https://files.pythonhosted.org/packages/65/6e/09db70a523a96d25e115e71cc56a6f9031e7b8cd166c1ac8438307c14058/numpy-1.26.4.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:>=3.9)
Collecting numpy (from tutel==0.3)
  Obtaining dependency information for numpy from https://files.pythonhosted.org/packages/98/5d/5738903efe0ecb73e51eb44feafba32bdba2081263d40c5043568ff60faf/numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.6 kB)
Downloading numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 94.3 MB/s eta 0:00:00
Building wheels for collected packages: tutel
  Running command git rev-parse HEAD
  20df39d58745e4a2d4a4dca1350c0684bcdb24b1
  Running command python setup.py bdist_wheel
  /usr/local/lib/python3.8/dist-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
    device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.8
  creating build/lib.linux-x86_64-3.8/tutel
  copying tutel/system.py -> build/lib.linux-x86_64-3.8/tutel
  copying tutel/moe.py -> build/lib.linux-x86_64-3.8/tutel
  copying tutel/__init__.py -> build/lib.linux-x86_64-3.8/tutel
  copying tutel/jit.py -> build/lib.linux-x86_64-3.8/tutel
  copying tutel/net.py -> build/lib.linux-x86_64-3.8/tutel
  creating build/lib.linux-x86_64-3.8/tutel/checkpoint
  copying tutel/checkpoint/__init__.py -> build/lib.linux-x86_64-3.8/tutel/checkpoint
  copying tutel/checkpoint/scatter.py -> build/lib.linux-x86_64-3.8/tutel/checkpoint
  copying tutel/checkpoint/gather.py -> build/lib.linux-x86_64-3.8/tutel/checkpoint
  creating build/lib.linux-x86_64-3.8/tutel/parted
  copying tutel/parted/spmdx.py -> build/lib.linux-x86_64-3.8/tutel/parted
  copying tutel/parted/patterns.py -> build/lib.linux-x86_64-3.8/tutel/parted
  copying tutel/parted/__init__.py -> build/lib.linux-x86_64-3.8/tutel/parted
  copying tutel/parted/solver.py -> build/lib.linux-x86_64-3.8/tutel/parted
  creating build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/helloworld_switch.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/helloworld_ddp_tutel.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/helloworld_deepspeed.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/nccl_all_gather_v.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/nccl_all_to_all_v.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/helloworld_from_scratch.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/helloworld_ddp.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/helloworld_amp.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/__init__.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/moe_cifar10.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/moe_mnist.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/helloworld.py -> build/lib.linux-x86_64-3.8/tutel/examples
  creating build/lib.linux-x86_64-3.8/tutel/custom
  copying tutel/custom/__init__.py -> build/lib.linux-x86_64-3.8/tutel/custom
  creating build/lib.linux-x86_64-3.8/tutel/experts
  copying tutel/experts/ffn.py -> build/lib.linux-x86_64-3.8/tutel/experts
  copying tutel/experts/__init__.py -> build/lib.linux-x86_64-3.8/tutel/experts
  creating build/lib.linux-x86_64-3.8/tutel/jit_kernels
  copying tutel/jit_kernels/gating.py -> build/lib.linux-x86_64-3.8/tutel/jit_kernels
  copying tutel/jit_kernels/__init__.py -> build/lib.linux-x86_64-3.8/tutel/jit_kernels
  copying tutel/jit_kernels/sparse.py -> build/lib.linux-x86_64-3.8/tutel/jit_kernels
  creating build/lib.linux-x86_64-3.8/tutel/gates
  copying tutel/gates/cosine_top.py -> build/lib.linux-x86_64-3.8/tutel/gates
  copying tutel/gates/__init__.py -> build/lib.linux-x86_64-3.8/tutel/gates
  copying tutel/gates/top.py -> build/lib.linux-x86_64-3.8/tutel/gates
  creating build/lib.linux-x86_64-3.8/tutel/launcher
  copying tutel/launcher/execl.py -> build/lib.linux-x86_64-3.8/tutel/launcher
  copying tutel/launcher/__init__.py -> build/lib.linux-x86_64-3.8/tutel/launcher
  copying tutel/launcher/run.py -> build/lib.linux-x86_64-3.8/tutel/launcher
  creating build/lib.linux-x86_64-3.8/tutel/impls
  copying tutel/impls/jit_compiler.py -> build/lib.linux-x86_64-3.8/tutel/impls
  copying tutel/impls/moe_layer.py -> build/lib.linux-x86_64-3.8/tutel/impls
  copying tutel/impls/overlap.py -> build/lib.linux-x86_64-3.8/tutel/impls
  copying tutel/impls/losses.py -> build/lib.linux-x86_64-3.8/tutel/impls
  copying tutel/impls/__init__.py -> build/lib.linux-x86_64-3.8/tutel/impls
  copying tutel/impls/communicate.py -> build/lib.linux-x86_64-3.8/tutel/impls
  copying tutel/impls/fast_dispatch.py -> build/lib.linux-x86_64-3.8/tutel/impls
  creating build/lib.linux-x86_64-3.8/tutel/parted/backend
  copying tutel/parted/backend/__init__.py -> build/lib.linux-x86_64-3.8/tutel/parted/backend
  creating build/lib.linux-x86_64-3.8/tutel/parted/backend/torch
  copying tutel/parted/backend/torch/config.py -> build/lib.linux-x86_64-3.8/tutel/parted/backend/torch
  copying tutel/parted/backend/torch/executor.py -> build/lib.linux-x86_64-3.8/tutel/parted/backend/torch
  copying tutel/parted/backend/torch/__init__.py -> build/lib.linux-x86_64-3.8/tutel/parted/backend/torch
  running build_ext
  /usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py:500: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
    warnings.warn(msg.format('we could not find ninja.'))
  building 'tutel_custom_kernel' extension
  creating build/temp.linux-x86_64-3.8
  creating build/temp.linux-x86_64-3.8/tutel
  creating build/temp.linux-x86_64-3.8/tutel/custom
  x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.8/dist-packages/torch/include -I/usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.8/dist-packages/torch/include/TH -I/usr/local/lib/python3.8/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.8 -c ./tutel/custom/custom_kernel.cpp -o build/temp.linux-x86_64-3.8/./tutel/custom/custom_kernel.o -Wno-sign-compare -Wno-unused-but-set-variable -Wno-terminate -Wno-unused-function -Wno-strict-aliasing -DUSE_GPU -DUSE_NCCL -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=tutel_custom_kernel -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
...
ExampleModel(
  (_moe_layer): MOELayer(
    Top-K(s) = ['k=2, noise=0.0'], Total-Experts = 2 [managed by 1 device(s)],
    (experts): FusedExpertsNetwork(model_dim=2048, hidden_size=2048, output_dim=2048, local_experts=2. has_fc1_bias=True, has_fc2_bias=True.)
    (gates): ModuleList(
      (0): LinearTopKGate(
        (wg): Linear(in_features=2048, out_features=2, bias=False)
      )
    )
  )
)
[Benchmark] world_size = 1, dtype = float32, model_dim = 2048, hidden_size = 2048, samples = 8192, num_local_experts = 2, topK = 2, a2a_ffn_overlap_degree = 1, parallel_type = `adaptive:1`, device = `cuda:0`
STEP-0: loss = 23.65213, step_time = 3.245155 sec, perf = 0.25 tflops.
STEP-1: loss = 22.88889, step_time = 0.053021 sec, perf = 15.55 tflops.
STEP-2: loss = 22.15745, step_time = 0.051924 sec, perf = 15.88 tflops.
STEP-3: loss = 21.45909, step_time = 0.051836 sec, perf = 15.91 tflops.
STEP-4: loss = 20.79226, step_time = 0.044880 sec, perf = 18.37 tflops.
STEP-5: loss = 20.15416, step_time = 0.041372 sec, perf = 19.93 tflops.
STEP-6: loss = 19.54152, step_time = 0.041352 sec, perf = 19.94 tflops.
STEP-7: loss = 18.95047, step_time = 0.041155 sec, perf = 20.04 tflops.
STEP-8: loss = 18.37710, step_time = 0.041088 sec, perf = 20.07 tflops.
...

@ghostplant
Author

On Ubuntu 18.04, gcc > 7 is not officially supported. Would a gcc-7-compatible c10::optional solve the problem as well?

@malfet
Contributor

malfet commented Mar 6, 2024

@ghostplant I'm not sure I understand what you mean.
gcc-7 does not implement the full set of C++17 features, and in 2.2 PyTorch's language standard was finally upgraded to C++17, which includes c10::optional.
In theory, it is possible to roll back #101995 to fix this binary-compatibility problem, but there could be other binary-compatibility problems I'm not aware of, and we must produce binaries using gcc-9 in order to support the AVX512 instruction set.

In your opinion, what are the benefits of continued support for gcc-7 compatibility?

@ghostplant
Author

ghostplant commented Mar 6, 2024

OK. If gcc-7 is not supported, then supporting PyTorch 2.2.0 on Ubuntu 18.04 environments will be a big problem. Is it possible for users who still stick to an Ubuntu 18.04 (gcc-7-based) environment to avoid errors like this by avoiding c10::optional in their implementations?
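If avoiding c10::optional is the route taken, a small audit script can flag remaining uses in an extension's sources. A minimal sketch (the helper name and file-extension list are assumptions for illustration, not part of any PyTorch tooling):

```python
import re
from pathlib import Path

# Flag uses of c10::optional (and friends) in an extension's sources; under a
# gcc-7 toolchain these may no longer match the std::optional-based ABI of
# PyTorch >= 2.2 binaries, per the discussion in this thread.
PATTERN = re.compile(r"\bc10::(optional|nullopt|make_optional)\b")
SOURCE_SUFFIXES = {".cc", ".cpp", ".cu", ".cuh", ".h", ".hpp"}

def find_c10_optional_uses(root: str):
    """Return (path, line_number, line) tuples for every matching source line."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.suffix in SOURCE_SUFFIXES and path.is_file():
            for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if PATTERN.search(line):
                    hits.append((str(path), lineno, line.strip()))
    return hits
```

This only finds direct uses in the extension's own code; c10::optional pulled in through PyTorch headers would still need the compiler-side fix.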

@atalman atalman modified the milestones: 2.2.2, 2.3.0 Mar 19, 2024
@atalman
Contributor

atalman commented Mar 19, 2024

Moving to milestone 2.3.0 since cherry-picking window for 2.2.2 is closed

@ghostplant
Author

@atalman Is this issue fixed in the current nightly build for 2.3.0? If so, I'll give it a try. Thanks!

@ghostplant
Author

Looks like Ubuntu 20.04 also has this issue with PyTorch 2.2 when its Python is provided by a 2023 Conda release:

Python 3.8.18 (default, Sep 11 2023, 13:40:15)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import tutel_custom_kernel
terminate called after throwing an instance of 'c10::Error'
  what():  !dispatch_key_.has_value() INTERNAL ASSERT FAILED at "../aten/src/ATen/core/library.cpp":82, please report a bug to PyTorch. (Error occurred while processing TORCH_LIBRARY block at ./tutel/custom/custom_kernel.cpp:891)
Exception raised from Library at ../aten/src/ATen/core/library.cpp:82 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f6c1333ed87 in /anaconda/envs/py38_default/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f6c132ef75f in /anaconda/envs/py38_default/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::string const&) + 0x3f (0x7f6c1333c8bf in /anaconda/envs/py38_default/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: torch::Library::Library(torch::Library::Kind, std::string, std::optional<c10::DispatchKey>, char const*, unsigned int) + 0x96c (0x7f6c474c720c in /anaconda/envs/py38_default/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0xc136 (0x7f6b0e43f136 in /anaconda/envs/py38_default/lib/python3.8/site-packages/tutel_custom_kernel.cpython-38-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x11b9a (0x7f6c5f732b9a in /lib64/ld-linux-x86-64.so.2)
frame #6: <unknown function> + 0x11ca1 (0x7f6c5f732ca1 in /lib64/ld-linux-x86-64.so.2)
frame #7: _dl_catch_exception + 0xe5 (0x7f6c5f4f3ba5 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: <unknown function> + 0x160cf (0x7f6c5f7370cf in /lib64/ld-linux-x86-64.so.2)
frame #9: _dl_catch_exception + 0x88 (0x7f6c5f4f3b48 in /lib/x86_64-linux-gnu/libc.so.6)
frame #10: <unknown function> + 0x1560a (0x7f6c5f73660a in /lib64/ld-linux-x86-64.so.2)
frame #11: <unknown function> + 0x134c (0x7f6c5f6da34c in /lib/x86_64-linux-gnu/libdl.so.2)
frame #12: _dl_catch_exception + 0x88 (0x7f6c5f4f3b48 in /lib/x86_64-linux-gnu/libc.so.6)
frame #13: _dl_catch_error + 0x33 (0x7f6c5f4f3c13 in /lib/x86_64-linux-gnu/libc.so.6)
frame #14: <unknown function> + 0x1b59 (0x7f6c5f6dab59 in /lib/x86_64-linux-gnu/libdl.so.2)
frame #15: dlopen + 0x4a (0x7f6c5f6da3da in /lib/x86_64-linux-gnu/libdl.so.2)
<omitting python frames>
frame #18: python3() [0x5a3250]
frame #19: python3() [0x4e8cfb]
frame #34: python3() [0x4e7aeb]
frame #41: python3() [0x5a5bd1]
frame #42: python3() [0x5a4bdf]
frame #43: python3() [0x4c0e24]
frame #46: python3() [0x45000c]
frame #48: __libc_start_main + 0xf3 (0x7f6c5f3b7083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #49: python3() [0x579d3d]
 
Aborted (core dumped)

@ghostplant
Author

ghostplant commented Mar 23, 2024

I assume Conda's toolchain targets Ubuntu 18.04's glibc in order to remain compatible with Ubuntu 18.04 environments.
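For what it's worth, the C library the running interpreter links against can be checked from the standard library (a diagnostic sketch only; the version in the comment is an example, not a guaranteed value):

```python
import platform

# platform.libc_ver() inspects the running executable to report the C library
# it was linked against, e.g. ("glibc", "2.27") on Ubuntu 18.04.
libc, version = platform.libc_ver()
print(libc, version)
```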

@atalman atalman modified the milestones: 2.3.0, 2.3.1 Apr 4, 2024
6 participants