
Support CUDA 12.4 #104417

Closed
johnnynunez opened this issue Jun 29, 2023 · 53 comments
Labels
module: cuda (Related to torch.cuda, and CUDA support in general)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@johnnynunez

johnnynunez commented Jun 29, 2023

🚀 The feature, motivation and pitch

Interesting feature:
This release introduces Heterogeneous Memory Management (HMM), allowing seamless sharing of data between host memory and accelerator devices. HMM is supported on Linux only and requires a recent kernel (6.1.24+ or 6.2.11+).

Alternatives

No response

Additional context

No response

cc @ptrblck

@ptrblck
Collaborator

ptrblck commented Jun 29, 2023

PyTorch should already support CUDA 12.2 and you should be able to build from source with it.

@mikaylagawarecki added the module: cuda and triaged labels on Jun 29, 2023
@dagbdagb

Does the current (-git?) pytorch code allow for taking advantage of HMM?
In short, can (for a completely random example) big LLMs now be run on "any" GPU, as long as the host has sufficient memory?

@chukarsten

But I can't pip install pytorch with CUDA 12.2, right?

@Brinax

Brinax commented Aug 5, 2023

I have CUDA 12.2 and I can't use PyTorch =/

@maxpain

maxpain commented Aug 12, 2023

Any updates?

@dbelenko

dbelenko commented Aug 15, 2023

Also, some of the tests are failing for 12.2.1, in case anyone is running them for their own builds. Could be informative.

@johnnynunez
Author

> Also, some of the tests are failing for 12.2.1, in case anyone is running them for their own builds. Could be informative.

@ptrblck same

@sdake

sdake commented Sep 12, 2023

@ptrblck Sure. PyTorch will build with CUDA 12.2. The request was more about integrating with Heterogeneous Memory Management.

HMM enables the consumption of system memory from within the GPU. The last time I looked, NVIDIA's implementation of HMM could have been done better: it caused stalling because tremendous amounts of work were done in top-half interrupt handlers. PyTorch would need some modification to treat system memory as GPU memory directly.

Reference: https://developer.nvidia.com/blog/simplifying-gpu-application-development-with-heterogeneous-memory-management/

@sanjibnarzary

You can try this build, which supports CUDA 12.2: #91122

@sdake

sdake commented Sep 21, 2023

@sanjibnarzary Thank you for the suggestion. I think there may be two issues, although I have not tried:

  • The nightly release is only built for CUDA 12.1; 12.2 is not built. You can verify this yourself by opening https://download.pytorch.org/whl/nightly/ and searching for cu122.
  • PyTorch would need to be modified (I strongly suspect) to support HMM. Can you elaborate on the PR where HMM was brought into PyTorch?

For our community, Artificial Wisdom, I have spent a nontrivial amount of time on builds: I built PyTorch because Kineto is not default-enabled. The PyTorch build is incomplete; however, faiss works like a champ.

If you can share the PR that activated HMM, I'll look at building with it.

Thank you for your contributions,
-steve

@sanjibnarzary

sanjibnarzary commented Sep 28, 2023

Hi @sdake, can you check whether Colossal-AI is of use to you? As per their documentation, they "implemented a dynamic heterogeneous memory management system named Gemini, unlike traditional implementations which adopt static memory partition".

pip install lightning-colossalai

This will install both the colossalai package as well as the ColossalAIStrategy for the Lightning Trainer:

trainer = Trainer(strategy="colossalai", precision=16, devices=...)

You can tune several settings by instantiating the strategy object and passing options in:

from lightning_colossalai import ColossalAIStrategy

strategy = ColossalAIStrategy(...)
trainer = Trainer(strategy=strategy, precision=16, devices=...)

See a full example of a benchmark with a GPT-2 model of up to 24 billion parameters.
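A consolidated version of the two snippets above (a minimal sketch; the Trainer import path and the placement_policy option are assumptions that may vary by Lightning version):

# Assumes pytorch_lightning and lightning-colossalai are installed.
from pytorch_lightning import Trainer
from lightning_colossalai import ColossalAIStrategy

strategy = ColossalAIStrategy()  # e.g. ColossalAIStrategy(placement_policy="auto")
trainer = Trainer(strategy=strategy, precision=16, devices=1)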

@sdake

sdake commented Sep 29, 2023

@sanjibnarzary Huge fan of their work, but we are not currently using it.

Thank you,
-steve

@johnnynunez
Author

johnnynunez commented Sep 29, 2023

I did the tutorial for Jetson AGX Orin:
https://hackmd.io/@johnnync13/SJqAMlzg6
[image]

@sdake

sdake commented Sep 30, 2023

Again, forgive the break in protocol.

Thanks @johnnynunez. Super work!! My next step in my pytorch PR was to use a variation on your work, where you export the environment variables and build wheels with python setup.py bdist_wheel, as in your excellent guide.

I have been a systems engineer forever, although I only have a little experience integrating ninja, cmake, and setup.py as a set. I have experience with each of them individually. Here is my PR from August (six weeks ago; this is not a blocker for me, and I super appreciate the guidance!):

# only a snippet, upstream source is: https://github.com/artificialwisdomai/origin/pull/99
###
#
# Build pytorch

RUN rm -rf /workspace/build
WORKDIR /workspace/build
RUN cmake -DUSE_NCCL:${T_USE_NCCL} -DUSE_SYSTEM_NCCL:${T_USE_SYSTEM_NCCL} -DCMAKE_GENERATOR:INTERNAL=Ninja -DCMAKE_INSTALL:INTERNAL="ninja install" -DTORCH_CUDA_ARCH_LIST:${T_TORCH_CUDA_ARCH_LIST} -DBUILD_SHARED_LIBS:BOOL=ON -DCUDA_TOOLKIT_ROOT_DIR:PATH=/usr/local/cuda -DCUDA_NVCC_EXECUTABLE:PATH=/usr/local/cuda/bin/nvcc -DCUSPARSELT_LIBRARY_PATH:PATH=/usr/local/cuparse/lib -DCMAKE_BUILD_TYPE:STRING=Release -DPYTHON_EXECUTABLE:PATH=`which python3` -DUSE_CUDA:BOOL=TRUE -DUSE_ZSTD:BOOL=TRUE -DCMAKE_INSTALL_PREFIX:PATH=/workspace/target /workspace/${PYTORCH_VERSION}
RUN ninja
RUN ninja install
RUN python setup.py bdist_wheel

Please note that the T_* variables are typed environment variables.

I have searched and searched, but you have given me some new things to search for (praise be the day when we can ask, not search). I use cmake to configure, ninja as a build generator, and then I attempt to use python setup.py to build a wheel. I prefer this workflow, although if it doesn't work, it doesn't work.

Are you aware of how I could make this type of workflow function? (As in, build a wheel based upon a preexisting cmake configuration and ninja build output?)

Thank you so much for your time. I hope, even if the answer is no, you may find some interest in the PR or the problem in general, given the detailed nature of your blog on the topic!

Thank you,
-steve

@johnnynunez
Author

johnnynunez commented Oct 1, 2023

> Thanks @johnnynunez. Super work!! […] Are you aware of how I could make this type of workflow function? (As in, build a wheel based upon a preexisting cmake configuration and ninja build output?)

I had an RTX 3090 but I sold it... I can't help you right now, but I found this: https://medium.com/@zhanwenchen/build-pytorch-from-source-with-cuda-12-2-1-with-ubuntu-22-04-b5b384b47ac

I think that magma122 is out; you can check it and skip it.

@zhanwenchen

zhanwenchen commented Oct 11, 2023

> Thanks @johnnynunez. Super work!! […] Are you aware of how I could make this type of workflow function?

> I had an RTX 3090 but I sold it... I can't help you right now, but I found this: https://medium.com/@zhanwenchen/build-pytorch-from-source-with-cuda-12-2-1-with-ubuntu-22-04-b5b384b47ac
> I think that magma122 is out; you can check it and skip it.

Hi @johnnynunez! I'm the author of the Medium article you referenced. I have encountered a few problems with my magma-cuda122 build on a system that I can no longer access. I'm retracing my own steps on a new server I just finished building for myself with a 3090 Ti. This work should be done this week. Whenever it's ready, I'll open a PR on the pytorch/builder repo and record the steps in an update to my Medium article. I don't think magma-cuda122 is out yet: https://anaconda.org/search?q=magma-cuda12

@johnnynunez
Author

> Hi @johnnynunez! I'm the author of the Medium article you referenced. […] I don't think magma-cuda122 is out yet: https://anaconda.org/search?q=magma-cuda12

The fact is, I guess PyTorch will need extra code to implement HMM. Maybe we will be informed in a future release. @ptrblck

@johnnynunez
Author

I bought an RTX 4090. How is this going? I will build.

@johnnynunez
Author

Any news?

@yhyu13

yhyu13 commented Dec 2, 2023

Is there any schedule for officially supporting CUDA 12.2?

@johnnynunez
Author

johnnynunez commented Dec 2, 2023

> Is there any schedule for officially supporting CUDA 12.2?

PyTorch 2.2 is going directly to 12.4.

@johnnynunez changed the title from "Support CUDA 12.2" to "Support CUDA 12.4" on Feb 2, 2024
@Vins33

Vins33 commented Feb 14, 2024

> Is there any schedule for officially supporting CUDA 12.2?
>
> PyTorch 2.2 is going directly to 12.4.

When will the version of PyTorch that supports CUDA 12.4 be released?

@johnnynunez
Author

johnnynunez commented Feb 14, 2024

> When will the version of PyTorch that supports CUDA 12.4 be released?

CUDA 12.4 goes out this month.
cuDNN 9 is out right now.
TensorRT 10 is also coming.
Nvidia is pushing a lot because GTC is next month.

@Vins33

Vins33 commented Feb 14, 2024

> CUDA 12.4 goes out this month. cuDNN 9 is out right now. TensorRT 10 is also coming. Nvidia is pushing a lot because GTC is next month.

You don't have a date then? Thank you for the feedback.

@johnnynunez
Author

> You don't have a date then? Thank you for the feedback.

It depends on the PyTorch owners, but they are working on it; you can see PRs where they are upgrading cuDNN.

@johnnynunez
Author

[image]

@johnnynunez
Author

CUDA 12.4 is out:
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

@spyoungtech

For whatever it's worth, I installed CUDA 12.4 and tested against the pre-compiled torchvision wheels from CUDA 12.1 (cu121), and it seemed to work fine for my narrow use case (working with easyocr). YMMV.

@Vins33

Vins33 commented Mar 9, 2024

I have a 4080 Super and PyTorch still doesn't support CUDA 12.4; I can't downgrade the drivers because the card doesn't support them.

[Screenshot 2024-03-09 103455]

Same error with the nightly version.
@johnnynunez @spyoungtech @ptrblck

@johnnynunez
Author

> I have a 4080 Super and PyTorch still doesn't support CUDA 12.4 […] Same error with the nightly version.

Did you compile it manually?

git clone --recursive --branch v2.2.1 https://github.com/pytorch/pytorch
export USE_NCCL=1 && \
export USE_QNNPACK=0 && \
export USE_PYTORCH_QNNPACK=0 && \
export USE_NATIVE_ARCH=1 && \
export USE_DISTRIBUTED=1 && \
export USE_TENSORRT=0 && \
export TORCH_CUDA_ARCH_LIST="8.9"

export PYTORCH_BUILD_VERSION=2.2.1 && \
export PYTORCH_BUILD_NUMBER=1
export MAKEFLAGS="-j$(nproc)"
cd pytorch
pip install -U -r requirements.txt
pip install -U scikit-build
pip install -U ninja
# libstdcxx-ng is a conda package, not a pip one:
conda install -c conda-forge libstdcxx-ng=12
pip install -U cmake
python setup.py bdist_wheel
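After the build, the wheel lands in pytorch/dist/ and can be installed with something like the following (a sketch; the exact filename depends on your version and platform tags):

pip install dist/torch-2.2.1-*.whl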

@ptrblck
Collaborator

ptrblck commented Mar 11, 2024

This thread has diverged into random discussions that misunderstand how pre-built PyTorch binaries work, instead of serving as a CUDA 12.4 tracking issue.

In any case:

> I installed CUDA 12.4 and tested against the pre-compiled torchvision wheels from CUDA 12.1

Your locally installed CUDA toolkit won't be used unless you install PyTorch from source or a custom CUDA extension. You would need to install an NVIDIA driver to execute workloads via the PyTorch binaries.

> I have a 4080 Super and PyTorch still doesn't support CUDA 12.4

PyTorch does support CUDA 12.4 and you can build it from source while we are updating the binary build process.
However, your 4080 is already working with all currently built binaries.
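A minimal sketch of this distinction (assuming a CUDA-enabled pip wheel of PyTorch is installed): the wheel reports the toolkit it was built with, while GPU availability depends only on the installed driver.

import torch

# CUDA runtime bundled inside the wheel, fixed at build time:
print(torch.version.cuda)        # e.g. "12.1" for cu121 binaries

# Works on a machine with a recent NVIDIA driver and *no* local CUDA toolkit:
print(torch.cuda.is_available())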

@johnnynunez
Author

> PyTorch does support CUDA 12.4 and you can build it from source while we are updating the binary build process.

So finally, the PyTorch binaries will be built with 12.4?

@Vins33

Vins33 commented Mar 11, 2024

I simply installed PyTorch using the command on the home page, using pip.
If there is a better way to install it and make it work, please tell me.

@ptrblck @johnnynunez

@johnnynunez
Author

johnnynunez commented Mar 11, 2024

> I simply installed PyTorch using the command on the home page, using pip. If there is a better way to install it and make it work, please tell me.
>
> @ptrblck @johnnynunez

PyTorch comes with precompiled CUDA and everything needed to run on GPUs. That's why the PyTorch binaries come with CUDA 11.8 or CUDA 12.1.

To use the latest version of CUDA, you need to compile PyTorch from source.
My question in this thread is whether they will finally update the binaries that are generated with continuous integration.

@Vins33

Vins33 commented Mar 11, 2024

> To use the latest version of CUDA, you need to compile PyTorch from source.
>
> [build script quoted above]

Using this method? @johnnynunez

@johnnynunez
Author

> Using this method? @johnnynunez

If you want to use pre-compiled, it is:

pip3 install -U torch torchvision torchaudio

@Vins33

Vins33 commented Mar 11, 2024

> If you want to use pre-compiled, it is:
>
> pip3 install -U torch torchvision torchaudio

But will using this method enable me to use the GPU with CUDA 12.4?

@johnnynunez
Author

> But will using this method enable me to use the GPU with CUDA 12.4?

Yes, of course.

@Vins33

Vins33 commented Mar 13, 2024

I have tried both methods but nothing works; I will wait for a direct pip install. @johnnynunez

@Vins33

Vins33 commented Mar 13, 2024

It finally worked. The steps are to create a new virtual environment and copy the pip command from the site's home page.

@smsaqlain

@Vins33 I would need your help to elaborate the step-by-step configuration. I am suffering from the same pain.

@Vins33

Vins33 commented Mar 14, 2024

> @Vins33 I would need your help to elaborate the step-by-step configuration. I am suffering from the same pain.

@smsaqlain First create a new env with python -m venv venv, then install using pip with CUDA 12.1. I'm using Python 3.12, but 3.11 shouldn't cause any problems either.

@dominicklee

Hello all, I had the same problem myself. I am posting this to hopefully help anyone with a similar issue. For context, I'm running an Nvidia 4070 Ti Super GPU on my Windows workstation PC which has CUDA 12.4. This is supposed to be the latest installation. I'm using Ubuntu 22.04 as well, so I am running in WSL2. Now, the problem was that I've tried pip uninstalling and reinstalling PyTorch to no avail. Every time I try running PyTorch in Python, I would get this error:

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
    from torch._C import *  # noqa: F403
ImportError: /home/user/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so: undefined symbol: ncclCommRegister

I am aware that at the moment, PyTorch was built for CUDA 12.1, but I've got it to work after some hours of troubleshooting. Here is what ultimately worked for me:

  1. First, uninstall all the PyTorch packages using pip. Do the same with and without the sudo command:
sudo pip3 uninstall -y torch torchvision torchaudio
pip3 uninstall -y torch torchvision torchaudio
pip3 cache purge
  2. Install NCCL (NVIDIA Collective Communications Library) for CUDA 12.4. Basically, it's NCCL 2.20.5, which was released on March 5th, 2024. You can find it on the NVIDIA website: https://developer.nvidia.com/nccl/nccl-download. Run the commands for the Network Install.
  3. Next, you'll need to install NVIDIA cuDNN. Even if you think you have it, do the steps again. You can go to NVIDIA's cuDNN download page for instructions.
  4. Finally, the last but most important step is to reinstall PyTorch, except use the nightly build so that we get the latest version:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121

At the time of writing, I am running on CUDA 12.4 with PyTorch working now. Here's what it might look like:

import torch
import torchvision
import torchaudio
print(torch.__version__)
print(torchvision.__version__)
print(torchaudio.__version__)
print(torch.cuda.is_available())

Output:

2.4.0.dev20240326+cu121
0.19.0.dev20240327+cu121
2.2.0.dev20240327+cu121
True

Wishing everyone the best! And hopefully PyTorch will provide a stable version for CUDA 12.4 users. Happy coding.

@yxchng

yxchng commented Mar 31, 2024

Any ETA for cu124 wheels?

@AndyZarks

AndyZarks commented Apr 1, 2024

> Hello all, I had the same problem myself. […] At the time of writing, I am running on CUDA 12.4 with PyTorch working now.

Thank you so much! I'm using CUDA 12.4 and this worked for me perfectly :)

@d12

d12 commented Apr 9, 2024

I was seeing "GET was unable to find an engine to execute this computation" and "RuntimeError: CUDA error: an illegal instruction was encountered" errors. It only happened on some workloads, but I think it was ultimately due to CUDA/PyTorch incompatibilities. I upgraded to CUDA 12.4 and followed @dominicklee's instructions (#104417 (comment)) to upgrade PyTorch; everything is working as expected now 🎉

@Prk0612

Prk0612 commented Apr 11, 2024

OSError: [WinError 126] The specified module could not be found............torch\lib\shm.dll" or one of its dependencies.

Getting this error after following your steps, @dominicklee.

@zhanwenchen

For PyTorch 2.2.2, you can use CUDA 12.4 but not cuDNN 9 (8 is fine). There are a few modifications you need to make to the sources (I'm not on my Ubuntu machine, so I will update this post later).

By the way, I never quite finished the magma-cuda123 build. I tried to do 2.7.1, but it turns out I don't know enough about MAGMA to know what to monkey-patch.

@johnnynunez
Author

> For PyTorch 2.2.2, you can use CUDA 12.4 but not cuDNN 9 (8 is fine). There are a few modifications you need to make to the sources (I'm not on my Ubuntu machine, so I will update this post later).

This is really the behavior I see: with cuDNN 9 I have not been able to make it work.

@johnnynunez
Author

Everything is working with torch 2.3.0:
CUDA 12.4, cuDNN 9.1, and TensorRT 10.0.1.
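A quick way to confirm such a setup from Python (a minimal sketch, assuming a CUDA build of torch is installed):

import torch

print(torch.__version__)               # e.g. 2.3.0
print(torch.version.cuda)              # CUDA version the build was compiled with
print(torch.backends.cudnn.version())  # e.g. 90100 for cuDNN 9.1
print(torch.cuda.is_available())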

@Geremia

Geremia commented May 16, 2024

@johnnynunez

> Everything is working with torch 2.3.0:
> CUDA 12.4, cuDNN 9.1

I get:

Could NOT find CUDA (missing: CUDA_CUDART_LIBRARY) (found version "12.4")

It knows I have CUDA 12.4, so why can't it find CUDA?

@hayatkhan8660-maker

hayatkhan8660-maker commented May 24, 2024

Hi folks,

I am installing apex for mixed-precision training. My machine has CUDA 12.4; I have installed torch 2.3.0, but it is pre-compiled with CUDA 12.1.

While installing apex, I get the following error.

Using pip 24.0 from /homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/pip (python 3.8)
Processing /homes/hayatu/Video-FocalNets/apex
Preparing metadata (pyproject.toml): started
Running command Preparing metadata (pyproject.toml)

torch.__version__ = 2.3.0+cu121

running dist_info
creating /tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info
writing /tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info/dependency_links.txt
writing requirements to /tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info/requires.txt
writing top-level names to /tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info/top_level.txt
writing manifest file '/tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file '/tmp/pip-modern-metadata-zrfl0g3e/apex.egg-info/SOURCES.txt'
creating '/tmp/pip-modern-metadata-zrfl0g3e/apex-0.1.dist-info'
Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: packaging>20.6 in /homes/hayatu/.local/lib/python3.8/site-packages (from apex==0.1) (23.2)
Building wheels for collected packages: apex
Building wheel for apex (pyproject.toml): started
Running command Building wheel for apex (pyproject.toml)

torch.__version__ = 2.3.0+cu121

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
from /homes/hayatu/miniconda3/envs/focal/bin

Traceback (most recent call last):
  File "/homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
    return _build_backend().build_wheel(wheel_directory, config_settings,
  File "/homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/setuptools/build_meta.py", line 410, in build_wheel
    return self._build_with_temp_dir(
  File "/homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/setuptools/build_meta.py", line 395, in _build_with_temp_dir
    self.run_setup()
  File "/homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/setuptools/build_meta.py", line 311, in run_setup
    exec(code, locals())
  File "<string>", line 178, in <module>
  File "<string>", line 40, in check_cuda_torch_binary_vs_bare_metal
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries. Pytorch binaries were compiled with Cuda 12.1.
In some cases, a minor-version mismatch will not cause later errors: NVIDIA/apex#323 (comment). You can try commenting out this check (at your own risk).
error: subprocess-exited-with-error

× Building wheel for apex (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /homes/hayatu/miniconda3/envs/focal/bin/python3 /homes/hayatu/miniconda3/envs/focal/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmpyqf4apm5
cwd: /homes/hayatu/Video-FocalNets/apex
Building wheel for apex (pyproject.toml): finished with status 'error'
ERROR: Failed building wheel for apex
Failed to build apex
ERROR: Could not build wheels for apex, which is required to install pyproject.toml-based projects

Will truly appreciate your help!
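The failing check compares the CUDA version baked into the torch wheel against the local nvcc. A minimal sketch of that kind of comparison (not apex's exact code; assumes nvcc is on PATH):

import subprocess
import torch

# CUDA version the installed torch wheel was built with, e.g. "12.1"
torch_cuda = torch.version.cuda

# Local toolkit version, parsed from nvcc's banner, e.g.
# "Cuda compilation tools, release 12.4, V12.4.131" -> "12.4"
out = subprocess.check_output(["nvcc", "--version"], text=True)
bare_metal = out.split("release ")[1].split(",")[0]

if torch_cuda != bare_metal:
    print(f"Mismatch: torch built with CUDA {torch_cuda}, nvcc is CUDA {bare_metal}")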
