
CUDA Extension Bug for Pytorch 2.2.0 #118842

Open
ghostplant opened this issue Feb 1, 2024 · 21 comments
Assignees
Labels
high priority module: cpp-extensions Related to torch.utils.cpp_extension module: cuda Related to torch.cuda, and CUDA support in general oncall: releng In support of CI and Release Engineering topic: binaries triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Milestone

Comments

@ghostplant

ghostplant commented Feb 1, 2024

🐛 Describe the bug

The released PyTorch 2.2.0 CUDA builds for Linux fail to run C++ extensions. The CPU-only PyTorch 2.2.0 build is not affected, so this appears to be a "2.2.0 + CUDA-only" bug.

To reproduce:

$ python3.8 -m pip install torch==2.2.0 --index-url https://download.pytorch.org/whl/cu118

$ python3.8 -m pip install --verbose --upgrade git+https://github.com/microsoft/tutel@main

$ python3.8 -m tutel.examples.helloworld --batch_size=16
terminate called after throwing an instance of 'c10::Error'
  what():  !dispatch_key_.has_value() INTERNAL ASSERT FAILED at "../aten/src/ATen/core/library.cpp":82, please report a bug to PyTorch. (Error occurred while processing TORCH_LIBRARY block at ./tutel/custom/custom_kernel.cpp:891)
Exception raised from Library at ../aten/src/ATen/core/library.cpp:82 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7ff953aded87 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7ff953a8f75f in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::string const&) + 0x3f (0x7ff953adc8bf in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #3: torch::Library::Library(torch::Library::Kind, std::string, std::optional<c10::DispatchKey>, char const*, unsigned int) + 0x96c (0x7ff98b9a71bc in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x189c6 (0x7ff8e0dd29c6 in /usr/local/lib/python3.8/dist-packages/tutel_custom_kernel.cpython-38-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x108d3 (0x7ff9a58348d3 in /lib64/ld-linux-x86-64.so.2)
frame #6: <unknown function> + 0x1539f (0x7ff9a583939f in /lib64/ld-linux-x86-64.so.2)
frame #7: _dl_catch_exception + 0x6f (0x7ff9a559a16f in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: <unknown function> + 0x1496a (0x7ff9a583896a in /lib64/ld-linux-x86-64.so.2)
frame #9: <unknown function> + 0xf96 (0x7ff9a5010f96 in /lib/x86_64-linux-gnu/libdl.so.2)
frame #10: _dl_catch_exception + 0x6f (0x7ff9a559a16f in /lib/x86_64-linux-gnu/libc.so.6)
frame #11: _dl_catch_error + 0x2f (0x7ff9a559a1ff in /lib/x86_64-linux-gnu/libc.so.6)
frame #12: <unknown function> + 0x1745 (0x7ff9a5011745 in /lib/x86_64-linux-gnu/libdl.so.2)
frame #13: dlopen + 0x71 (0x7ff9a5011051 in /lib/x86_64-linux-gnu/libdl.so.2)
<omitting python frames>
frame #16: python3() [0x664d58]
frame #17: python3() [0x5c08c9]
frame #32: python3() [0x5ee6f0]
frame #37: python3() [0x4c6e27]
frame #38: python3() [0x5c08c9]
frame #51: python3() [0x5ee6f0]
frame #56: python3() [0x4c6e27]
frame #57: python3() [0x5c08c9]

If I switch to PyTorch 2.0.0 or 2.1.0 via python3 -m pip install https://download.pytorch.org/whl/cu118/torch-2.1.0%2Bcu118-cp38-cp38-linux_x86_64.whl, which is an all-in-one package, everything works fine.

BTW, I also tried other C++ extensions; PyTorch 2.2.0 for CUDA always produces the same error.

Versions

Verified that it is not related to the Python version: the bug happens on both Python 3.8 and Python 3.12.

cc @ezyang @gchanan @zou3519 @kadeng @malfet @ptrblck

@ezyang
Contributor

ezyang commented Feb 1, 2024

Has tutel been rebuilt for PyTorch 2.2? I would expect a rebuild to be needed

@ezyang ezyang added module: cpp-extensions Related to torch.utils.cpp_extension triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module topic: binaries module: cuda Related to torch.cuda, and CUDA support in general oncall: releng In support of CI and Release Engineering labels Feb 1, 2024
@ghostplant
Author

Has tutel been rebuilt for PyTorch 2.2? I would expect a rebuild to be needed

Yes, it is a fresh installation. I verified that it is not specific to Tutel: every CUDAExtension fails to register with the PyTorch operator registry.

(1) It is not related to the Python version (e.g., Python 3.8 - 3.12);
(2) It is not related to non-CUDA extensions (e.g., a plain CppExtension works);
(3) It is not related to any particular high-level application;
(4) PyTorch 2.0.0 / 2.1.0 do not have this problem.

@ezyang
Contributor

ezyang commented Feb 1, 2024

Do you have a repro that involves directly loading a cpp extension from the Python API (e.g., passing in C++ source code)?

@ghostplant
Author

For Tutel, you can disable these lines to build the extension as C++-only instead of CUDA.

@ezyang
Contributor

ezyang commented Feb 7, 2024

When I build and install tutel against a from-source build of PyTorch, it works. So it's either an environment problem or a problem specific to the released binaries.

@aitor-martinez-seras

I have encountered the same issue (on Python 3.10), in my case after installing and using Detectron2; I initially thought the error was on their side. Thanks for the temporary workaround, @ghostplant (downgrading to a previous 2.x version).

@ghostplant
Author

ghostplant commented Feb 10, 2024

@ezyang Can you try Ubuntu 18.04 + any Python + PyTorch 2.2.0? The issue seems to be always reproducible on Ubuntu 18.04. The package below is the last version that works with CUDA extensions on Ubuntu 18.04:

pip3.8 install torch==2.2.0.dev20231010+cu118 --index-url https://download.pytorch.org/whl/nightly/cu118

@ghostplant
Author

@ezyang It seems Ubuntu 18.04 is the root cause of PyTorch CUDA extensions no longer working since 2.2.0, while a lot of machines still stick to the 18.04 environment. The problem can be reproduced with this environment:

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu18.04
...
RUN apt install python3-pip python3.8 -y
...

Next, this one works using 2.1.0:

python3.8 -m pip install torch==2.1.0
python3.8 -m pip install --verbose --upgrade git+https://github.com/microsoft/tutel@main
python3.8 -m tutel.examples.helloworld --batch_size=16

This setting doesn't work using 2.2.0:

python3.8 -m pip install torch==2.2.0
python3.8 -m pip install --verbose --upgrade git+https://github.com/microsoft/tutel@main
python3.8 -m tutel.examples.helloworld --batch_size=16

With the CUDA extension disabled, this setting still works using 2.2.0:

python3.8 -m pip install torch==2.2.0
NO_CUDA=1 python3.8 -m pip install --verbose --upgrade git+https://github.com/microsoft/tutel@main
python3.8 -m tutel.examples.helloworld --batch_size=16 --device cpu

@ezyang
Contributor

ezyang commented Feb 19, 2024

@malfet is this the gcc upgrade thing? Sounds like the gcc upgrade thing

@jiweibo

jiweibo commented Mar 1, 2024

I encountered the same problem using CUDAExtension on PyTorch 2.2.0, 2.2.1, and nightly, and there was no useful log output.

System environment:

  • Ubuntu 18
  • gcc 9.4
  • g++ 9.4
  • cuda 11.8

I also compiled torch from source, but that did not work either.

E0301 18:01:53.027387 140681313257280 torch/distributed/elastic/multiprocessing/api.py:669] failed (exitcode: -11) local_rank: 0 (pid: 67153) of binary: /home/luban/anaconda3/envs/voydet/bin/python
Traceback (most recent call last):
  File "/home/luban/anaconda3/envs/voydet/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch', 'console_scripts', 'torchrun')())
  File "/nfs/volume-382-86/wilber/pytorch/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/nfs/volume-382-86/wilber/pytorch/torch/distributed/run.py", line 834, in main
    run(args)
  File "/nfs/volume-382-86/wilber/pytorch/torch/distributed/run.py", line 825, in run
    elastic_launch(
  File "/nfs/volume-382-86/wilber/pytorch/torch/distributed/launcher/api.py", line 137, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/nfs/volume-382-86/wilber/pytorch/torch/distributed/launcher/api.py", line 271, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

@ghostplant
Author

ghostplant commented Mar 1, 2024

No doubt the PyTorch 2.2.0 release is not compatible with Ubuntu 18.04 (a 10-year LTS).

This is the last version that works well; is it possible to revert to this state?

pip3.8 install torch==2.2.0.dev20231010+cu118 --index-url https://download.pytorch.org/whl/nightly/cu118

@ezyang
Contributor

ezyang commented Mar 1, 2024

High pri based on activity

@malfet malfet added this to the 2.2.2 milestone Mar 4, 2024
@malfet malfet self-assigned this Mar 4, 2024
@malfet
Contributor

malfet commented Mar 4, 2024

This looks related to #120020

Very likely yes: the switch from c10::optional to std::optional caused the crash, and the solution is to pass an appropriate ccbin (host compiler) to the GPU compiler (NVCC).
Assigning myself to check whether that is indeed the case, and if so, to add an option to cpp_extension that errors out when gcc-7 is used as the host compiler by NVCC.
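The host-compiler guard described above could also be approximated from user code before building an extension. A minimal sketch (the `MIN_GCC` threshold and the helper names are assumptions for illustration, not PyTorch API):

```python
import re
import subprocess

# Assumed threshold: per this thread, PyTorch >= 2.2 binaries are built with
# gcc-9, so the host compiler NVCC uses should be at least that new.
MIN_GCC = (9, 0)

def parse_gcc_version(first_line: str):
    """Extract (major, minor) from the first line of `gcc --version` output."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", first_line)
    if m is None:
        raise ValueError(f"cannot parse gcc version from: {first_line!r}")
    return int(m.group(1)), int(m.group(2))

def host_compiler_ok(gcc: str = "gcc") -> bool:
    """Return True if `gcc` is new enough to serve as NVCC's host compiler."""
    out = subprocess.run([gcc, "--version"], capture_output=True, text=True, check=True)
    return parse_gcc_version(out.stdout.splitlines()[0]) >= MIN_GCC
```

Running such a check before `setup.py` invokes NVCC would turn the opaque `dispatch_key_` assert into an up-front, actionable error.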

@malfet
Contributor

malfet commented Mar 5, 2024

If one cherry-picks the change from #120126 into the 2.2 branch, the error becomes obvious, and installing gcc-9 fixes it as expected.

That is, if one runs artifacts from the following Dockerfile, tutel works as expected:

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu18.04 as ubuntu18.04-torch2.2.0
RUN apt update && \
    apt install software-properties-common -y && \
    add-apt-repository ppa:ubuntu-toolchain-r/test -y && \
    apt-get update -y && \
    apt install python3-pip python3.8-dev g++-9 git -y && \
    update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 60 --slave /usr/bin/g++ g++ /usr/bin/g++-9 && \
    update-alternatives --install /usr/bin/x86_64-linux-gnu-gcc x86_64-linux-gnu-gcc /usr/bin/gcc-9 60 --slave /usr/bin/x86_64-linux-gnu-g++ x86_64-linux-gnu-g++ /usr/bin/g++-9 && \
    python3.8 -mpip install --upgrade pip && \
    python3.8 -mpip install torch==2.2.0

I.e.

$ docker run --gpus all --rm -it docker.io/library/ubuntu18.04-torch2.2.0  bash -c "python3.8 -m pip install --verbose --upgrade git+https://github.com/microsoft/tutel@main; python3.8 -m tutel.examples.helloworld --batch_size=16"

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Using pip 24.0 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8)
Collecting git+https://github.com/microsoft/tutel@main
  Cloning https://github.com/microsoft/tutel (to revision main) to /tmp/pip-req-build-f553t3wb
  Running command git version
  git version 2.17.1
  Running command git clone --filter=blob:none https://github.com/microsoft/tutel /tmp/pip-req-build-f553t3wb
  Cloning into '/tmp/pip-req-build-f553t3wb'...
  Running command git show-ref main
  20df39d58745e4a2d4a4dca1350c0684bcdb24b1 refs/heads/main
  20df39d58745e4a2d4a4dca1350c0684bcdb24b1 refs/remotes/origin/main
  Running command git symbolic-ref -q HEAD
  refs/heads/main
  Resolved https://github.com/microsoft/tutel to commit 20df39d58745e4a2d4a4dca1350c0684bcdb24b1
  Running command git rev-parse HEAD
  20df39d58745e4a2d4a4dca1350c0684bcdb24b1
  Running command python setup.py egg_info
  running egg_info
  creating /tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info
  writing /tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info/dependency_links.txt
  writing requirements to /tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info/requires.txt
  writing top-level names to /tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info/top_level.txt
  writing manifest file '/tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info/SOURCES.txt'
  reading manifest file '/tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info/SOURCES.txt'
  writing manifest file '/tmp/pip-pip-egg-info-19e8ntt8/tutel.egg-info/SOURCES.txt'
  /usr/local/lib/python3.8/dist-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
    device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
  /usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py:500: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
    warnings.warn(msg.format('we could not find ninja.'))
  Preparing metadata (setup.py) ... done
  Link requires a different Python (3.8.0 not in: '>=3.9'): https://files.pythonhosted.org/packages/26/de/437a60a69f7fd0c79264530a97787b2ac7394616e3661236201518f8a47d/numpy-1.25.0rc1.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:>=3.9)
  Link requires a different Python (3.8.0 not in: '>=3.9'): https://files.pythonhosted.org/packages/d0/b2/fe774844d1857804cc884bba67bec38f649c99d0dc1ee7cbbf1da601357c/numpy-1.25.0.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:>=3.9)
  Link requires a different Python (3.8.0 not in: '>=3.9'): https://files.pythonhosted.org/packages/cf/7a/f68d1d658a0e68084097beb212fa9356fee7eabff8b57231cc4acb555b12/numpy-1.25.1.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:>=3.9)
  Link requires a different Python (3.8.0 not in: '>=3.9'): https://files.pythonhosted.org/packages/a0/41/8f53eff8e969dd8576ddfb45e7ed315407d27c7518ae49418be8ed532b07/numpy-1.25.2.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:>=3.9)
  Link requires a different Python (3.8.0 not in: '<3.13,>=3.9'): https://files.pythonhosted.org/packages/29/5e/5887b95aa544a977d21f4adbc5b052897c0e730aa6408ed9903aece7f18f/numpy-1.26.0b1.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:<3.13,>=3.9)
  Link requires a different Python (3.8.0 not in: '<3.13,>=3.9'): https://files.pythonhosted.org/packages/33/a9/1233954ed69e96e829e6615a6e4a68e8c99d599661edff756fb4300c9a0b/numpy-1.26.0rc1.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:<3.13,>=3.9)
  Link requires a different Python (3.8.0 not in: '<3.13,>=3.9'): https://files.pythonhosted.org/packages/55/b3/b13bce39ba82b7398c06d10446f5ffd5c07db39b09bd37370dc720c7951c/numpy-1.26.0.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:<3.13,>=3.9)
  Link requires a different Python (3.8.0 not in: '<3.13,>=3.9'): https://files.pythonhosted.org/packages/78/23/f78fd8311e0f710fe1d065d50b92ce0057fe877b8ed7fd41b28ad6865bfc/numpy-1.26.1.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:<3.13,>=3.9)
  Link requires a different Python (3.8.0 not in: '>=3.9'): https://files.pythonhosted.org/packages/dd/2b/205ddff2314d4eea852e31d53b8e55eb3f32b292efc3dd86bd827ab9019d/numpy-1.26.2.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:>=3.9)
  Link requires a different Python (3.8.0 not in: '>=3.9'): https://files.pythonhosted.org/packages/d0/b0/13e2b50c95bfc1d5ee04925eb5c105726c838f922d0aaddd57b7c8be0f8b/numpy-1.26.3.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:>=3.9)
  Link requires a different Python (3.8.0 not in: '>=3.9'): https://files.pythonhosted.org/packages/65/6e/09db70a523a96d25e115e71cc56a6f9031e7b8cd166c1ac8438307c14058/numpy-1.26.4.tar.gz (from https://pypi.org/simple/numpy/) (requires-python:>=3.9)
Collecting numpy (from tutel==0.3)
  Obtaining dependency information for numpy from https://files.pythonhosted.org/packages/98/5d/5738903efe0ecb73e51eb44feafba32bdba2081263d40c5043568ff60faf/numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.6 kB)
Downloading numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 94.3 MB/s eta 0:00:00
Building wheels for collected packages: tutel
  Running command git rev-parse HEAD
  20df39d58745e4a2d4a4dca1350c0684bcdb24b1
  Running command python setup.py bdist_wheel
  /usr/local/lib/python3.8/dist-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
    device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.8
  creating build/lib.linux-x86_64-3.8/tutel
  copying tutel/system.py -> build/lib.linux-x86_64-3.8/tutel
  copying tutel/moe.py -> build/lib.linux-x86_64-3.8/tutel
  copying tutel/__init__.py -> build/lib.linux-x86_64-3.8/tutel
  copying tutel/jit.py -> build/lib.linux-x86_64-3.8/tutel
  copying tutel/net.py -> build/lib.linux-x86_64-3.8/tutel
  creating build/lib.linux-x86_64-3.8/tutel/checkpoint
  copying tutel/checkpoint/__init__.py -> build/lib.linux-x86_64-3.8/tutel/checkpoint
  copying tutel/checkpoint/scatter.py -> build/lib.linux-x86_64-3.8/tutel/checkpoint
  copying tutel/checkpoint/gather.py -> build/lib.linux-x86_64-3.8/tutel/checkpoint
  creating build/lib.linux-x86_64-3.8/tutel/parted
  copying tutel/parted/spmdx.py -> build/lib.linux-x86_64-3.8/tutel/parted
  copying tutel/parted/patterns.py -> build/lib.linux-x86_64-3.8/tutel/parted
  copying tutel/parted/__init__.py -> build/lib.linux-x86_64-3.8/tutel/parted
  copying tutel/parted/solver.py -> build/lib.linux-x86_64-3.8/tutel/parted
  creating build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/helloworld_switch.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/helloworld_ddp_tutel.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/helloworld_deepspeed.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/nccl_all_gather_v.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/nccl_all_to_all_v.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/helloworld_from_scratch.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/helloworld_ddp.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/helloworld_amp.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/__init__.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/moe_cifar10.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/moe_mnist.py -> build/lib.linux-x86_64-3.8/tutel/examples
  copying tutel/examples/helloworld.py -> build/lib.linux-x86_64-3.8/tutel/examples
  creating build/lib.linux-x86_64-3.8/tutel/custom
  copying tutel/custom/__init__.py -> build/lib.linux-x86_64-3.8/tutel/custom
  creating build/lib.linux-x86_64-3.8/tutel/experts
  copying tutel/experts/ffn.py -> build/lib.linux-x86_64-3.8/tutel/experts
  copying tutel/experts/__init__.py -> build/lib.linux-x86_64-3.8/tutel/experts
  creating build/lib.linux-x86_64-3.8/tutel/jit_kernels
  copying tutel/jit_kernels/gating.py -> build/lib.linux-x86_64-3.8/tutel/jit_kernels
  copying tutel/jit_kernels/__init__.py -> build/lib.linux-x86_64-3.8/tutel/jit_kernels
  copying tutel/jit_kernels/sparse.py -> build/lib.linux-x86_64-3.8/tutel/jit_kernels
  creating build/lib.linux-x86_64-3.8/tutel/gates
  copying tutel/gates/cosine_top.py -> build/lib.linux-x86_64-3.8/tutel/gates
  copying tutel/gates/__init__.py -> build/lib.linux-x86_64-3.8/tutel/gates
  copying tutel/gates/top.py -> build/lib.linux-x86_64-3.8/tutel/gates
  creating build/lib.linux-x86_64-3.8/tutel/launcher
  copying tutel/launcher/execl.py -> build/lib.linux-x86_64-3.8/tutel/launcher
  copying tutel/launcher/__init__.py -> build/lib.linux-x86_64-3.8/tutel/launcher
  copying tutel/launcher/run.py -> build/lib.linux-x86_64-3.8/tutel/launcher
  creating build/lib.linux-x86_64-3.8/tutel/impls
  copying tutel/impls/jit_compiler.py -> build/lib.linux-x86_64-3.8/tutel/impls
  copying tutel/impls/moe_layer.py -> build/lib.linux-x86_64-3.8/tutel/impls
  copying tutel/impls/overlap.py -> build/lib.linux-x86_64-3.8/tutel/impls
  copying tutel/impls/losses.py -> build/lib.linux-x86_64-3.8/tutel/impls
  copying tutel/impls/__init__.py -> build/lib.linux-x86_64-3.8/tutel/impls
  copying tutel/impls/communicate.py -> build/lib.linux-x86_64-3.8/tutel/impls
  copying tutel/impls/fast_dispatch.py -> build/lib.linux-x86_64-3.8/tutel/impls
  creating build/lib.linux-x86_64-3.8/tutel/parted/backend
  copying tutel/parted/backend/__init__.py -> build/lib.linux-x86_64-3.8/tutel/parted/backend
  creating build/lib.linux-x86_64-3.8/tutel/parted/backend/torch
  copying tutel/parted/backend/torch/config.py -> build/lib.linux-x86_64-3.8/tutel/parted/backend/torch
  copying tutel/parted/backend/torch/executor.py -> build/lib.linux-x86_64-3.8/tutel/parted/backend/torch
  copying tutel/parted/backend/torch/__init__.py -> build/lib.linux-x86_64-3.8/tutel/parted/backend/torch
  running build_ext
  /usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py:500: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
    warnings.warn(msg.format('we could not find ninja.'))
  building 'tutel_custom_kernel' extension
  creating build/temp.linux-x86_64-3.8
  creating build/temp.linux-x86_64-3.8/tutel
  creating build/temp.linux-x86_64-3.8/tutel/custom
  x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.8/dist-packages/torch/include -I/usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.8/dist-packages/torch/include/TH -I/usr/local/lib/python3.8/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.8 -c ./tutel/custom/custom_kernel.cpp -o build/temp.linux-x86_64-3.8/./tutel/custom/custom_kernel.o -Wno-sign-compare -Wno-unused-but-set-variable -Wno-terminate -Wno-unused-function -Wno-strict-aliasing -DUSE_GPU -DUSE_NCCL -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=tutel_custom_kernel -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
...
ExampleModel(
  (_moe_layer): MOELayer(
    Top-K(s) = ['k=2, noise=0.0'], Total-Experts = 2 [managed by 1 device(s)],
    (experts): FusedExpertsNetwork(model_dim=2048, hidden_size=2048, output_dim=2048, local_experts=2. has_fc1_bias=True, has_fc2_bias=True.)
    (gates): ModuleList(
      (0): LinearTopKGate(
        (wg): Linear(in_features=2048, out_features=2, bias=False)
      )
    )
  )
)
[Benchmark] world_size = 1, dtype = float32, model_dim = 2048, hidden_size = 2048, samples = 8192, num_local_experts = 2, topK = 2, a2a_ffn_overlap_degree = 1, parallel_type = `adaptive:1`, device = `cuda:0`
STEP-0: loss = 23.65213, step_time = 3.245155 sec, perf = 0.25 tflops.
STEP-1: loss = 22.88889, step_time = 0.053021 sec, perf = 15.55 tflops.
STEP-2: loss = 22.15745, step_time = 0.051924 sec, perf = 15.88 tflops.
STEP-3: loss = 21.45909, step_time = 0.051836 sec, perf = 15.91 tflops.
STEP-4: loss = 20.79226, step_time = 0.044880 sec, perf = 18.37 tflops.
STEP-5: loss = 20.15416, step_time = 0.041372 sec, perf = 19.93 tflops.
STEP-6: loss = 19.54152, step_time = 0.041352 sec, perf = 19.94 tflops.
STEP-7: loss = 18.95047, step_time = 0.041155 sec, perf = 20.04 tflops.
STEP-8: loss = 18.37710, step_time = 0.041088 sec, perf = 20.07 tflops.
...

@ghostplant
Author

On Ubuntu 18.04, gcc > 7 is not officially supported. Would a gcc-7-compatible c10::optional solve the problem as well?

@malfet
Contributor

malfet commented Mar 6, 2024

@ghostplant I'm not sure I understand what you mean.
gcc-7 does not implement the full set of C++17 features, and in 2.2 PyTorch's language standard was finally upgraded to C++17, which includes c10::optional.
In theory, it is possible to roll back #101995 to fix this binary-compatibility problem, but there could be other binary-compatibility problems I'm not aware of, and we must produce binaries using gcc-9 in order to support the AVX512 instruction set.

In your opinion, what are the benefits of continued support for gcc-7 compatibility?

@ghostplant
Author

ghostplant commented Mar 6, 2024

OK. If gcc-7 is not supported, then supporting PyTorch 2.2.0 on Ubuntu 18.04 environments will be a big problem. Is it possible for users who still stick to an Ubuntu 18.04 (gcc-7-based) environment to avoid errors like this by avoiding c10::optional in their implementations?
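If avoiding c10::optional is the route taken, a small audit script can flag remaining uses in an extension's sources. A minimal sketch (the helper name and file-extension list are assumptions for illustration, not part of any PyTorch tooling):

```python
import re
from pathlib import Path

# Flag uses of c10::optional (and friends) in an extension's sources; under a
# gcc-7 toolchain these may no longer match the std::optional-based ABI of
# PyTorch >= 2.2 binaries, per the discussion in this thread.
PATTERN = re.compile(r"\bc10::(optional|nullopt|make_optional)\b")
SOURCE_SUFFIXES = {".cc", ".cpp", ".cu", ".cuh", ".h", ".hpp"}

def find_c10_optional_uses(root: str):
    """Return (path, line_number, line) tuples for every matching source line."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.suffix in SOURCE_SUFFIXES and path.is_file():
            for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if PATTERN.search(line):
                    hits.append((str(path), lineno, line.strip()))
    return hits
```

This only finds direct uses in the extension's own code; c10::optional pulled in through PyTorch headers would still need the compiler-side fix.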

@atalman atalman modified the milestones: 2.2.2, 2.3.0 Mar 19, 2024
@atalman
Contributor

atalman commented Mar 19, 2024

Moving to milestone 2.3.0 since cherry-picking window for 2.2.2 is closed

@ghostplant
Author

@atalman Is this issue fixed in the current nightly build for 2.3.0? If so, I'll give it a try. Thanks!

@ghostplant
Author

Looks like Ubuntu 20.04 also has this issue with PyTorch 2.2 when its Python is provided by a 2023 Conda release:

Python 3.8.18 (default, Sep 11 2023, 13:40:15)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import tutel_custom_kernel
terminate called after throwing an instance of 'c10::Error'
  what():  !dispatch_key_.has_value() INTERNAL ASSERT FAILED at "../aten/src/ATen/core/library.cpp":82, please report a bug to PyTorch. (Error occurred while processing TORCH_LIBRARY block at ./tutel/custom/custom_kernel.cpp:891)
Exception raised from Library at ../aten/src/ATen/core/library.cpp:82 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f6c1333ed87 in /anaconda/envs/py38_default/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f6c132ef75f in /anaconda/envs/py38_default/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::string const&) + 0x3f (0x7f6c1333c8bf in /anaconda/envs/py38_default/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: torch::Library::Library(torch::Library::Kind, std::string, std::optional<c10::DispatchKey>, char const*, unsigned int) + 0x96c (0x7f6c474c720c in /anaconda/envs/py38_default/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0xc136 (0x7f6b0e43f136 in /anaconda/envs/py38_default/lib/python3.8/site-packages/tutel_custom_kernel.cpython-38-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x11b9a (0x7f6c5f732b9a in /lib64/ld-linux-x86-64.so.2)
frame #6: <unknown function> + 0x11ca1 (0x7f6c5f732ca1 in /lib64/ld-linux-x86-64.so.2)
frame #7: _dl_catch_exception + 0xe5 (0x7f6c5f4f3ba5 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: <unknown function> + 0x160cf (0x7f6c5f7370cf in /lib64/ld-linux-x86-64.so.2)
frame #9: _dl_catch_exception + 0x88 (0x7f6c5f4f3b48 in /lib/x86_64-linux-gnu/libc.so.6)
frame #10: <unknown function> + 0x1560a (0x7f6c5f73660a in /lib64/ld-linux-x86-64.so.2)
frame #11: <unknown function> + 0x134c (0x7f6c5f6da34c in /lib/x86_64-linux-gnu/libdl.so.2)
frame #12: _dl_catch_exception + 0x88 (0x7f6c5f4f3b48 in /lib/x86_64-linux-gnu/libc.so.6)
frame #13: _dl_catch_error + 0x33 (0x7f6c5f4f3c13 in /lib/x86_64-linux-gnu/libc.so.6)
frame #14: <unknown function> + 0x1b59 (0x7f6c5f6dab59 in /lib/x86_64-linux-gnu/libdl.so.2)
frame #15: dlopen + 0x4a (0x7f6c5f6da3da in /lib/x86_64-linux-gnu/libdl.so.2)
<omitting python frames>
frame #18: python3() [0x5a3250]
frame #19: python3() [0x4e8cfb]
frame #34: python3() [0x4e7aeb]
frame #41: python3() [0x5a5bd1]
frame #42: python3() [0x5a4bdf]
frame #43: python3() [0x4c0e24]
frame #46: python3() [0x45000c]
frame #48: __libc_start_main + 0xf3 (0x7f6c5f3b7083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #49: python3() [0x579d3d]
 
Aborted (core dumped)

@ghostplant
Author

ghostplant commented Mar 23, 2024

I assume Conda's toolchain targets Ubuntu 18.04's glibc in order to remain compatible with Ubuntu 18.04 environments.
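For what it's worth, the C library the running interpreter links against can be checked from the standard library (a diagnostic sketch only; the version in the comment is an example, not a guaranteed value):

```python
import platform

# platform.libc_ver() inspects the running executable to report the C library
# it was linked against, e.g. ("glibc", "2.27") on Ubuntu 18.04.
libc, version = platform.libc_ver()
print(libc, version)
```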

@atalman atalman modified the milestones: 2.3.0, 2.3.1 Apr 4, 2024
6 participants