
Support CUDA 12.4 #104417

Closed
johnnynunez opened this issue Jun 29, 2023 · 53 comments
Labels
module: cuda (Related to torch.cuda, and CUDA support in general)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@johnnynunez

johnnynunez commented Jun 29, 2023

🚀 The feature, motivation and pitch

Interesting feature:
This release introduces Heterogeneous Memory Management (HMM), allowing seamless sharing of data between host memory and accelerator devices. HMM is supported on Linux only and requires a recent kernel (6.1.24+ or 6.2.11+).

Alternatives

No response

Additional context

No response

cc @ptrblck

@ptrblck
Collaborator

ptrblck commented Jun 29, 2023

PyTorch should already support CUDA 12.2 and you should be able to build from source with it.

@mikaylagawarecki added the module: cuda and triaged labels on Jun 29, 2023
@dagbdagb

Does the current (-git?) pytorch code allow for taking advantage of HMM?
In short, can (for a completely random example) big LLMs now be run on "any" GPU, as long as the host has sufficient memory?

@chukarsten

But I can't pip install pytorch with CUDA 12.2, right?

@Brinax

Brinax commented Aug 5, 2023

I have CUDA 12.2 and I can't use PyTorch =/

@maxpain

maxpain commented Aug 12, 2023

Any updates?

@dbelenko

dbelenko commented Aug 15, 2023

Also, some of the tests are failing for 12.2.1, in case anyone is running them for their own builds. Could be informative.

@johnnynunez
Author

> Also, some of the tests are failing for 12.2.1, in case anyone is running them for their own builds. Could be informative.

@ptrblck same

@sdake

sdake commented Sep 12, 2023

@ptrblck Sure. PyTorch will build with CUDA 12.2. The request was more about integrating with Heterogeneous Memory Management.

HMM enables the consumption of system memory from within the GPU. The last time I looked, NVIDIA's implementation of HMM could have been done better: it caused stalling because tremendous amounts of work were done in top-half interrupt handlers. PyTorch would need some modification to treat system memory as GPU memory directly.

Reference: https://developer.nvidia.com/blog/simplifying-gpu-application-development-with-heterogeneous-memory-management/

@sanjibnarzary

You can try this build, which supports CUDA 12.2: #91122

@sdake

sdake commented Sep 21, 2023

@sanjibnarzary Thank you for the suggestion. I think there may be two issues, although I have not tried:

  • The nightly release is only built for CUDA 12.1; 12.2 is not built. You can verify this yourself by opening https://download.pytorch.org/whl/nightly/ and searching for cu122.
  • PyTorch would need to be modified (I strongly suspect) to support HMM. Can you elaborate on the PR where HMM was brought into PyTorch?

For our community, Artificial Wisdom, I have spent a nontrivial amount of time on builds: I built PyTorch because Kineto is not default-enabled. The PyTorch build is incomplete; however, faiss works like a champ.

If you can share the PR that activated HMM, I'll look at building with it.

Thank you for your contributions,
-steve

@sanjibnarzary

sanjibnarzary commented Sep 28, 2023

Hi @sdake, can you check whether Colossal-AI is of use to you? As per their documentation, they "implemented a dynamic heterogeneous memory management system named Gemini, unlike traditional implementations which adopt static memory partition".

pip install lightning-colossalai

This will install both the colossalai package as well as the ColossalAIStrategy for the Lightning Trainer:

trainer = Trainer(strategy="colossalai", precision=16, devices=...)

You can tune several settings by instantiating the strategy object and passing options in:

from lightning_colossalai import ColossalAIStrategy

strategy = ColossalAIStrategy(...)
trainer = Trainer(strategy=strategy, precision=16, devices=...)

See a full example of a benchmark with a GPT-2 model of up to 24 billion parameters.
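A consolidated version of the two snippets above (a minimal sketch; the Trainer import path and the placement_policy option are assumptions that may vary by Lightning version):

# Assumes pytorch_lightning and lightning-colossalai are installed.
from pytorch_lightning import Trainer
from lightning_colossalai import ColossalAIStrategy

strategy = ColossalAIStrategy()  # e.g. ColossalAIStrategy(placement_policy="auto")
trainer = Trainer(strategy=strategy, precision=16, devices=1)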

@sdake

sdake commented Sep 29, 2023

@sanjibnarzary Huge fan of their work, but we are not currently using it.

Thank you,
-steve

@johnnynunez
Author

johnnynunez commented Sep 29, 2023

I did the tutorial for Jetson AGX Orin:
https://hackmd.io/@johnnync13/SJqAMlzg6
[image]

@sdake

sdake commented Sep 30, 2023

Again, forgive the break in protocol.

Thanks @johnnynunez. Super work!! My next step in my pytorch PR was to use a variation on your work, where you export the environment variables and build wheels with python setup.py bdist_wheel, as in your excellent guide.

I have been a systems engineer forever, although I only have a little experience integrating ninja, cmake, and setup.py as a set. I have experience with each of them individually. Here is my PR from August (six weeks ago; this is not a blocker for me, and I super appreciate the guidance!):

# only a snippet, upstream source is: https://github.com/artificialwisdomai/origin/pull/99
###
#
# Build pytorch

RUN rm -rf /workspace/build
WORKDIR /workspace/build
RUN cmake -DUSE_NCCL:${T_USE_NCCL} -DUSE_SYSTEM_NCCL:${T_USE_SYSTEM_NCCL} -DCMAKE_GENERATOR:INTERNAL=Ninja -DCMAKE_INSTALL:INTERNAL="ninja install" -DTORCH_CUDA_ARCH_LIST:${T_TORCH_CUDA_ARCH_LIST} -DBUILD_SHARED_LIBS:BOOL=ON -DCUDA_TOOLKIT_ROOT_DIR:PATH=/usr/local/cuda -DCUDA_NVCC_EXECUTABLE:PATH=/usr/local/cuda/bin/nvcc -DCUSPARSELT_LIBRARY_PATH:PATH=/usr/local/cuparse/lib -DCMAKE_BUILD_TYPE:STRING=Release -DPYTHON_EXECUTABLE:PATH=`which python3` -DUSE_CUDA:BOOL=TRUE -DUSE_ZSTD:BOOL=TRUE -DCMAKE_INSTALL_PREFIX:PATH=/workspace/target /workspace/${PYTORCH_VERSION}
RUN ninja
RUN ninja install
RUN python setup.py bdist_wheel

Please note that the T_* variables are typed environment variables.

I have searched and searched, but you have given me some new things to search for (praise be the day when we can ask, not search). I use cmake to configure, ninja as a build generator, and then I attempt to use python setup.py to build a wheel. I prefer this workflow, although if it doesn't work, it doesn't work.

Are you aware of how I could make this type of workflow function? (As in, build a wheel based upon a preexisting cmake configuration and ninja build output?)

Thank you so much for your time. I hope, even if the answer is no, you may find some interest in the PR or the problem in general, given the detailed nature of your blog on the topic!

Thank you,
-steve

@johnnynunez
Author

johnnynunez commented Oct 1, 2023

> Thanks @johnnynunez. Super work!! […] Are you aware of how I could make this type of workflow function? (As in, build a wheel based upon a preexisting cmake configuration and ninja build output?)

I had an RTX 3090 but I sold it... I can't help you right now, but I found this: https://medium.com/@zhanwenchen/build-pytorch-from-source-with-cuda-12-2-1-with-ubuntu-22-04-b5b384b47ac

I think that magma122 is out; you can check it and skip it.

@zhanwenchen

zhanwenchen commented Oct 11, 2023

> Thanks @johnnynunez. Super work!! […] Are you aware of how I could make this type of workflow function?

> I had an RTX 3090 but I sold it... I can't help you right now, but I found this: https://medium.com/@zhanwenchen/build-pytorch-from-source-with-cuda-12-2-1-with-ubuntu-22-04-b5b384b47ac
> I think that magma122 is out; you can check it and skip it.

Hi @johnnynunez! I'm the author of the Medium article you referenced. I have encountered a few problems with my magma-cuda122 build on a system that I can no longer access. I'm retracing my own steps on a new server I just finished building for myself with a 3090 Ti. This work should be done this week. Whenever it's ready, I'll open a PR on the pytorch/builder repo and record the steps in an update to my Medium article. I don't think magma-cuda122 is out yet: https://anaconda.org/search?q=magma-cuda12

@johnnynunez
Author

> Hi @johnnynunez! I'm the author of the Medium article you referenced. […] I don't think magma-cuda122 is out yet: https://anaconda.org/search?q=magma-cuda12

The fact is, I guess PyTorch will need extra code to implement HMM. Maybe we will be informed in a future release. @ptrblck

@johnnynunez
Author

I bought an RTX 4090. How is this going? I will build.

@johnnynunez
Author

Any news?

@yhyu13

yhyu13 commented Dec 2, 2023

Is there any schedule for officially supporting CUDA 12.2?

@johnnynunez
Author

johnnynunez commented Dec 2, 2023

> Is there any schedule for officially supporting CUDA 12.2?

PyTorch 2.2 is going directly to 12.4.

@johnnynunez changed the title from "Support CUDA 12.2" to "Support CUDA 12.4" on Feb 2, 2024
@Vins33

Vins33 commented Feb 14, 2024

> Is there any schedule for officially supporting CUDA 12.2?
>
> PyTorch 2.2 is going directly to 12.4.

When will the version of PyTorch that supports CUDA 12.4 be released?

@johnnynunez
Author

johnnynunez commented Feb 14, 2024

> When will the version of PyTorch that supports CUDA 12.4 be released?

CUDA 12.4 goes out this month.
cuDNN 9 is out right now.
TensorRT 10 is also coming.
Nvidia is pushing a lot because GTC is next month.

@Vins33

Vins33 commented Feb 14, 2024

> CUDA 12.4 goes out this month. cuDNN 9 is out right now. TensorRT 10 is also coming. Nvidia is pushing a lot because GTC is next month.

You don't have a date then? Thank you for the feedback.

@johnnynunez
Author

> You don't have a date then? Thank you for the feedback.

It depends on the PyTorch owners, but they are working on it; you can see PRs where they are upgrading cuDNN.

@johnnynunez
Author

[image]

@johnnynunez
Author

CUDA 12.4 is out:
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

@spyoungtech

For whatever it's worth, I installed CUDA 12.4 and tested against the pre-compiled torchvision wheels from CUDA 12.1 (cu121), and it seemed to work fine for my narrow use case (working with easyocr). YMMV.

@Vins33

Vins33 commented Mar 9, 2024

I have a 4080 Super and PyTorch still doesn't support CUDA 12.4; I can't downgrade the drivers because the card doesn't support them.

[Screenshot 2024-03-09 103455]

Same error with the nightly version.
@johnnynunez @spyoungtech @ptrblck

@johnnynunez
Author

> I have a 4080 Super and PyTorch still doesn't support CUDA 12.4 […] Same error with the nightly version.

Did you compile it manually?

git clone --recursive --branch v2.2.1 https://github.com/pytorch/pytorch
export USE_NCCL=1 && \
export USE_QNNPACK=0 && \
export USE_PYTORCH_QNNPACK=0 && \
export USE_NATIVE_ARCH=1 && \
export USE_DISTRIBUTED=1 && \
export USE_TENSORRT=0 && \
export TORCH_CUDA_ARCH_LIST="8.9"

export PYTORCH_BUILD_VERSION=2.2.1 && \
export PYTORCH_BUILD_NUMBER=1
export MAKEFLAGS="-j$(nproc)"
cd pytorch
pip install -U -r requirements.txt
pip install -U scikit-build
pip install -U ninja
# libstdcxx-ng is a conda package, not a pip one:
conda install -c conda-forge libstdcxx-ng=12
pip install -U cmake
python setup.py bdist_wheel
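After the build, the wheel lands in pytorch/dist/ and can be installed with something like the following (a sketch; the exact filename depends on your version and platform tags):

pip install dist/torch-2.2.1-*.whl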

@ptrblck
Collaborator

ptrblck commented Mar 11, 2024

This thread has diverged into random discussions that misunderstand how pre-built PyTorch binaries work, instead of serving as a CUDA 12.4 tracking issue.

In any case:

> I installed CUDA 12.4 and tested against the pre-compiled torchvision wheels from CUDA 12.1

Your locally installed CUDA toolkit won't be used unless you install PyTorch from source or a custom CUDA extension. You would need to install an NVIDIA driver to execute workloads via the PyTorch binaries.

> I have a 4080 Super and PyTorch still doesn't support CUDA 12.4

PyTorch does support CUDA 12.4 and you can build it from source while we are updating the binary build process.
However, your 4080 is already working with all currently built binaries.
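A minimal sketch of this distinction (assuming a CUDA-enabled pip wheel of PyTorch is installed): the wheel reports the toolkit it was built with, while GPU availability depends only on the installed driver.

import torch

# CUDA runtime bundled inside the wheel, fixed at build time:
print(torch.version.cuda)        # e.g. "12.1" for cu121 binaries

# Works on a machine with a recent NVIDIA driver and *no* local CUDA toolkit:
print(torch.cuda.is_available())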

@johnnynunez
Author

> PyTorch does support CUDA 12.4 and you can build it from source while we are updating the binary build process.

So finally, the PyTorch binaries will be built with 12.4?

@Vins33

Vins33 commented Mar 11, 2024

I simply installed PyTorch using the command on the home page, using pip.
If there is a better way to install it and make it work, please tell me.

@ptrblck @johnnynunez

@johnnynunez
Author

johnnynunez commented Mar 11, 2024

> I simply installed PyTorch using the command on the home page, using pip. If there is a better way to install it and make it work, please tell me.
>
> @ptrblck @johnnynunez

PyTorch comes with precompiled CUDA and everything needed to run on GPUs. That's why the PyTorch binaries come with CUDA 11.8 or CUDA 12.1.

To use the latest version of CUDA, you need to compile PyTorch from source.
My question in this thread is whether they will finally update the binaries that are generated with continuous integration.

@Vins33

Vins33 commented Mar 11, 2024

> To use the latest version of CUDA, you need to compile PyTorch from source.
>
> [build script quoted above]

Using this method? @johnnynunez

@johnnynunez
Author

> Using this method? @johnnynunez

If you want to use pre-compiled, it is:

pip3 install -U torch torchvision torchaudio

@Vins33

Vins33 commented Mar 11, 2024

> If you want to use pre-compiled, it is:
>
> pip3 install -U torch torchvision torchaudio

But will using this method enable me to use the GPU with CUDA 12.4?

@johnnynunez
Author

> But will using this method enable me to use the GPU with CUDA 12.4?

Yes, of course.

@Vins33

Vins33 commented Mar 13, 2024

I have tried both methods but nothing works; I will wait for a direct pip install. @johnnynunez

@Vins33

Vins33 commented Mar 13, 2024

It finally worked. The steps are to create a new virtual environment and copy the pip command from the site's home page.

@smsaqlain

@Vins33 I would need your help to elaborate the step-by-step configuration. I am suffering from the same pain.

@Vins33

Vins33 commented Mar 14, 2024

> @Vins33 I would need your help to elaborate the step-by-step configuration. I am suffering from the same pain.

@smsaqlain First create a new env with python -m venv venv, then install using pip with CUDA 12.1. I'm using Python 3.12, but 3.11 shouldn't cause any problems either.

@dominicklee

Hello all, I had the same problem myself. I am posting this to hopefully help anyone with a similar issue. For context, I'm running an Nvidia 4070 Ti Super GPU on my Windows workstation PC which has CUDA 12.4. This is supposed to be the latest installation. I'm using Ubuntu 22.04 as well, so I am running in WSL2. Now, the problem was that I've tried pip uninstalling and reinstalling PyTorch to no avail. Every time I try running PyTorch in Python, I would get this error:

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
    from torch._C import *  # noqa: F403
ImportError: /home/user/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so: undefined symbol: ncclCommRegister

I am aware that at the moment, PyTorch was built for CUDA 12.1, but I've got it to work after some hours of troubleshooting. Here is what ultimately worked for me:

  1. First, uninstall all the PyTorch packages using pip. Do the same with and without the sudo command:
sudo pip3 uninstall -y torch torchvision torchaudio
pip3 uninstall -y torch torchvision torchaudio
pip3 cache purge
  2. Install NCCL (NVIDIA Collective Communications Library) for CUDA 12.4. Basically, it's NCCL 2.20.5, which was released on March 5th, 2024. You can find it on the NVIDIA website: https://developer.nvidia.com/nccl/nccl-download. Run the commands for the Network Install.
  3. Next, you'll need to install NVIDIA cuDNN. Even if you think you have it, do the steps again. You can go to NVIDIA's cuDNN download page for instructions.
  4. Finally, the last but most important step is to reinstall PyTorch, except use the nightly build so that we get the latest version:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121

At the time of writing, I am running on CUDA 12.4 with PyTorch working now. Here's what it might look like:

import torch
import torchvision
import torchaudio
print(torch.__version__)
print(torchvision.__version__)
print(torchaudio.__version__)
print(torch.cuda.is_available())

Output:

2.4.0.dev20240326+cu121
0.19.0.dev20240327+cu121
2.2.0.dev20240327+cu121
True

Wishing everyone the best! And hopefully PyTorch will provide a stable version for CUDA 12.4 users. Happy coding.

@yxchng

yxchng commented Mar 31, 2024

Any ETA for cu124 wheels?

@AndyZarks

AndyZarks commented Apr 1, 2024

> Hello all, I had the same problem myself. […] At the time of writing, I am running on CUDA 12.4 with PyTorch working now.

Thank you so much! I'm using CUDA 12.4 and this worked for me perfectly :)

@d12

d12 commented Apr 9, 2024

I was seeing "GET was unable to find an engine to execute this computation" and "RuntimeError: CUDA error: an illegal instruction was encountered" errors. It only happened on some workloads, but I think it was ultimately due to CUDA/PyTorch incompatibilities. I upgraded to CUDA 12.4 and followed @dominicklee's instructions (#104417 (comment)) to upgrade PyTorch; everything is working as expected now 🎉

@Prk0612

Prk0612 commented Apr 11, 2024

OSError: [WinError 126] The specified module could not be found............torch\lib\shm.dll" or one of its dependencies.

Getting this error after following your steps, @dominicklee.

@zhanwenchen

For PyTorch 2.2.2, you can use CUDA 12.4 but not cuDNN 9 (8 is fine). There are a few modifications you need to make to the sources (I'm not on my Ubuntu machine, so I will update this post later).

By the way, I never quite finished the magma-cuda123 build. I tried to do 2.7.1, but it turns out I don't know enough about MAGMA to know what to monkey-patch.

@johnnynunez
Author

> For PyTorch 2.2.2, you can use CUDA 12.4 but not cuDNN 9 (8 is fine). There are a few modifications you need to make to the sources (I'm not on my Ubuntu machine, so I will update this post later).

This is really the behavior I see: with cuDNN 9 I have not been able to make it work.

@johnnynunez
Author

Everything is working with torch 2.3.0:
CUDA 12.4, cuDNN 9.1, and TensorRT 10.0.1.
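A quick way to confirm such a setup from Python (a minimal sketch, assuming a CUDA build of torch is installed):

import torch

print(torch.__version__)               # e.g. 2.3.0
print(torch.version.cuda)              # CUDA version the build was compiled with
print(torch.backends.cudnn.version())  # e.g. 90100 for cuDNN 9.1
print(torch.cuda.is_available())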

@Geremia

Geremia commented May 16, 2024

@johnnynunez

> Everything is working with torch 2.3.0:
> CUDA 12.4, cuDNN 9.1

I get:

Could NOT find CUDA (missing: CUDA_CUDART_LIBRARY) (found version "12.4")

It knows I have CUDA 12.4, so why can't it find CUDA?

@hayatkhan8660-maker

hayatkhan8660-maker commented May 24, 2024

Hi folks,

I am installing apex for mixed-precision training. My machine has CUDA 12.4; I have installed torch 2.3.0, but it is pre-compiled with CUDA 12.1.

While installing apex, I get the following error.

Using pip 24.0 from /homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/pip (python 3.8)
Processing /homes/hayatu/Video-FocalNets/apex
Preparing metadata (pyproject.toml): started
Running command Preparing metadata (pyproject.toml)

torch.__version__ = 2.3.0+cu121

running dist_info
creating /tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info
writing /tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info/dependency_links.txt
writing requirements to /tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info/requires.txt
writing top-level names to /tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info/top_level.txt
writing manifest file '/tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file '/tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info/SOURCES.txt'
creating '/tmp/pip-modern-metadata-zrfl0g3e/apex-0.1.dist-info'
Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: packaging>20.6 in /homes/hayatu/.local/lib/python3.8/site-packages (from apex==0.1) (23.2)
Building wheels for collected packages: apex
Building wheel for apex (pyproject.toml): started
Running command Building wheel for apex (pyproject.toml)

torch.__version__ = 2.3.0+cu121

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
from /homes/hayatu/miniconda3/envs/focal/bin

Traceback (most recent call last):
  File "/homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
    return _build_backend().build_wheel(wheel_directory, config_settings,
  File "/homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/setuptools/build_meta.py", line 410, in build_wheel
    return self._build_with_temp_dir(
  File "/homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/setuptools/build_meta.py", line 395, in _build_with_temp_dir
    self.run_setup()
  File "/homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/setuptools/build_meta.py", line 311, in run_setup
    exec(code, locals())
  File "<string>", line 178, in <module>
  File "<string>", line 40, in check_cuda_torch_binary_vs_bare_metal
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 12.1.
In some cases, a minor-version mismatch will not cause later errors: NVIDIA/apex#323 (comment). You can try commenting out this check (at your own risk).
error: subprocess-exited-with-error

× Building wheel for apex (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /homes/hayatu/miniconda3/envs/focal/bin/python3 /homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmpyqf4apm5
cwd: /homes/hayatu/Video-FocalNets/apex
Building wheel for apex (pyproject.toml): finished with status 'error'
ERROR: Failed building wheel for apex
Failed to build apex
ERROR: Could not build wheels for apex, which is required to install pyproject.toml-based projects

Will truly appreciate your help!
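The failing check compares the CUDA version baked into the torch wheel against the local nvcc. A minimal sketch of that kind of comparison (not apex's exact code; assumes nvcc is on PATH):

import subprocess
import torch

# CUDA version the installed torch wheel was built with, e.g. "12.1"
torch_cuda = torch.version.cuda

# Local toolkit version, parsed from nvcc's banner, e.g.
# "Cuda compilation tools, release 12.4, V12.4.131" -> "12.4"
out = subprocess.check_output(["nvcc", "--version"], text=True)
bare_metal = out.split("release ")[1].split(",")[0]

if torch_cuda != bare_metal:
    print(f"Mismatch: torch built with CUDA {torch_cuda}, nvcc is CUDA {bare_metal}")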
