Add support for CUDA 12 #13637

Closed
hawkinsp opened this issue Dec 13, 2022 · 35 comments · Fixed by #15278
@hawkinsp
Member

There are three pieces we need before we can release jaxlib wheels for CUDA 12:

  • NVIDIA need to release a version of CuDNN for CUDA 12. The current CuDNN release may work if used with CUDA 12 but there are no guarantees and we should not release a wheel until there's a supported path.
  • Upstream XLA needs to be updated to support CUDA 12. The pull request "[NVIDIA TF] Support building against CUDA 12.0" (tensorflow/tensorflow#58867) should achieve this.
  • We need to fix any JAX-specific problems that arise under CUDA 12. It is to be determined what these are or if there are any!
hawkinsp added the enhancement and NVIDIA GPU labels on Dec 13, 2022
@GrzegorzWarzecha

@hawkinsp XLA was updated to CUDA 12 a few days ago.

Is there any chance of starting work on CUDA 12 support?
Thank you!

@hawkinsp
Member Author

@GrzegorzWarzecha The current blocker is that NVIDIA has not released a version of CUDNN that works with CUDA 12. Until they do that there's not much we can do!

@mjsML
Collaborator

mjsML commented Jan 23, 2023

@nouiz for viz

@zbyso23

zbyso23 commented Feb 6, 2023

Why can't NVIDIA release CUDA together with a compatible cuDNN? I'm really surprised how bad NVIDIA's support is. I bought a GPU mainly for AI, but these problems are hell on earth. When I tried tensorflow-gpu on a Mac Mini M1 there was no problem: 5 minutes and done, everything works. On PC there are still problems with version compatibility, etc.

@johnnynunez

> Why can't NVIDIA release CUDA together with a compatible cuDNN? I'm really surprised how bad NVIDIA's support is. I bought a GPU mainly for AI, but these problems are hell on earth. When I tried tensorflow-gpu on a Mac Mini M1 there was no problem: 5 minutes and done, everything works. On PC there are still problems with version compatibility, etc.

Wait! The release is coming in the next few weeks!

@yhtang
Collaborator

yhtang commented Feb 6, 2023

CUDNN 8.8 GA for CUDA 12 is scheduled for Feb 2023, so it's almost around the corner. 😃

@zbyso23

zbyso23 commented Feb 6, 2023

Oh, I spent last weekend preparing a system dedicated to GPU machine learning, but I wrongly assumed that CUDA and cuDNN were released together 🫣

@1303d

1303d commented Feb 8, 2023

When is cuDNN rolling out for CUDA 12?

@johnnynunez

> When is cuDNN rolling out for CUDA 12?

This month.

@1303d

1303d commented Feb 8, 2023

How can I run my U-Net model on the GPU? I have a GTX 1650 Ti with an i5-10300H processor.
Could someone help me out? I'm currently doing a project on ML.

@johnnynunez

johnnynunez commented Feb 8, 2023

@hawkinsp [image] is out

@mjsML
Collaborator

mjsML commented Feb 9, 2023

It's out now.

@zbyso23

zbyso23 commented Feb 9, 2023

Great!

@hawkinsp
Member Author

It looks like all JAX tests pass under CUDA 12, provided NCCL is updated to 2.16, which is newer than the default version JAX uses.

If you want to try building a jaxlib on CUDA 12 yourself, you need to do two things:

  • apply this patch to remove Kepler support:
diff --git a/.bazelrc b/.bazelrc
index 0c291fa5e..fdff4ec93 100644
--- a/.bazelrc
+++ b/.bazelrc
@@ -62,7 +62,7 @@ build:mkl_open_source_only --define=tensorflow_mkldnn_contraction_kernel=1
 build:cuda --repo_env TF_NEED_CUDA=1
 # "sm" means we emit only cubin, which is forward compatible within a GPU generation.
 # "compute" means we emit both cubin and PTX, which is larger but also forward compatible to future GPU generations.
-build:cuda --action_env TF_CUDA_COMPUTE_CAPABILITIES="sm_35,sm_52,sm_60,sm_70,compute_80"
+build:cuda --action_env TF_CUDA_COMPUTE_CAPABILITIES="sm_52,sm_60,sm_70,compute_80"
 build:cuda --crosstool_top=@local_config_cuda//crosstool:toolchain
 build:cuda --@local_config_cuda//:enable_cuda
 build:cuda --@org_tensorflow//tensorflow/compiler/xla/python:enable_gpu=true
  • install NCCL 2.16, including its development libraries, then build your own jaxlib with, e.g.:
TF_NCCL_VERSION=2.16.5 python build/build.py --enable_cuda

where 2.16.5 is the NCCL version you have.

We'll work on getting this checked in.
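
The patch in this comment drops `sm_35` because CUDA 12 no longer supports Kepler GPUs (compute capability 3.x). As a rough illustration only, the same filtering can be sketched in Python for a `TF_CUDA_COMPUTE_CAPABILITIES`-style string. The helper name `drop_kepler` is hypothetical and not part of JAX's build scripts, and the sketch assumes every entry ends in plain digits (major version followed by a single-digit minor version).

```python
def drop_kepler(capabilities: str) -> str:
    """Remove Kepler (compute capability 3.x) entries from a capability list."""
    kept = []
    for entry in capabilities.split(","):
        entry = entry.strip()
        # Entries look like "sm_35" or "compute_80": the digits after "_" are
        # the major version followed by a single-digit minor version.
        _, _, digits = entry.partition("_")
        major = int(digits[:-1])
        if major != 3:
            kept.append(entry)
    return ",".join(kept)


old = "sm_35,sm_52,sm_60,sm_70,compute_80"
print(drop_kepler(old))  # prints sm_52,sm_60,sm_70,compute_80
```

The result matches the `+` line of the patch, so the value could be checked this way before editing `.bazelrc` by hand.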

@Nazgul773

> @hawkinsp [image] is out

Hi, where can I download cuDNN 8.8? I can't find that version online.

@johnnynunez

> > @hawkinsp [image] is out
>
> Hi, where can I download cuDNN 8.8? I can't find that version online.

You have to register on the NVIDIA developer page:
https://developer.nvidia.com/rdp/cudnn-download

@Nazgul773

Nazgul773 commented Feb 13, 2023

Thank you!!! I already am, but if you google cuDNN 8.8, you don't find anything.
It's not on this website either: https://developer.nvidia.com/rdp/cudnn-archive

@johnnynunez

> Thank you!!! I already am, but if you google cuDNN 8.8, you don't find anything. It's not on this website either: https://developer.nvidia.com/rdp/cudnn-archive

It hasn't been uploaded to the archive yet.

@Nazgul773

Nazgul773 commented Feb 13, 2023

Does TensorFlow GPU support already work with CUDA 12 and cuDNN 8.8? My RTX 4070 Ti still doesn't get detected :/
I checked it here: https://prnt.sc/poR1xhwblXf0

@johnnynunez

johnnynunez commented Feb 13, 2023

> Does TensorFlow GPU support already work with CUDA 12 and cuDNN 8.8? My RTX 4070 Ti still doesn't get detected :/ I checked it here: https://prnt.sc/poR1xhwblXf0

You have to wait for TensorFlow 2.12 or use the nightly versions.

@Nazgul773

What do you mean by nightly versions?

@johnnynunez

> What do you mean by nightly versions?

https://pypi.org/project/tf-nightly/

@Nazgul773

I still don't get a 1 as output. I don't get it.
https://prnt.sc/w3-lc_XIfC2K

@hawkinsp
Member Author

This is not the right place to discuss TensorFlow. Please keep discussions in this project related to JAX.

@chrisflesher
Contributor

Anyone know when jaxlib wheels are expected to be available for CUDA 12?

@johnnynunez

> Anyone know when jaxlib wheels are expected to be available for CUDA 12?

I think once cuDNN is uploaded to the archive page.

copybara-service bot pushed a commit to google/tsl that referenced this issue Feb 14, 2023
…ntime stubs.

Fixes NCCL test failures in JAX test suite.

Issue google/jax#13637

PiperOrigin-RevId: 509597231
copybara-service bot pushed a commit to google/tsl that referenced this issue Feb 14, 2023
…ntime stubs.

Fixes NCCL test failures in JAX test suite.

Issue google/jax#13637

PiperOrigin-RevId: 509610727
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this issue Feb 14, 2023
…ntime stubs.

Fixes NCCL test failures in JAX test suite.

Issue google/jax#13637

PiperOrigin-RevId: 509610727
copybara-service bot pushed a commit to openxla/xla that referenced this issue Feb 14, 2023
…ntime stubs.

Fixes NCCL test failures in JAX test suite.

Issue google/jax#13637

PiperOrigin-RevId: 509610727
@johnnynunez

CUDA 12.1 is out.

@hawkinsp
Member Author

hawkinsp commented Mar 1, 2023

Note that JAX should build/work fine with CUDA 12, we just need to figure out our CI/wheel release process. If you need a CUDA 12 wheel right now, you can build it from source.

@cottrell
Contributor

If this didn't or doesn't work why do the docs say "or newer"?

@hawkinsp
Member Author

The docs should be reworded; they were written when CUDA 11 was the newest version.

That said, the next release of JAX will have CUDA 12 wheels.

@cottrell
Contributor

> The docs should be reworded; they were written when CUDA 11 was the newest version.
>
> That said, the next release of JAX will have CUDA 12 wheels.

It's an interesting problem. Feels like GitHub and LLMs might be on the verge of detecting and highlighting these kinds of things with one of their bug bots. It is somewhat non-trivial for the reader to find the timestamp of the last edit, then find which versions were available at that time, and then make the call.

@hawkinsp
Member Author

hawkinsp commented Mar 28, 2023

I haven't yet updated the installation instructions, but we released jax and jaxlib 0.4.7 with CUDA 12 support. Try:

# Installs CUDA and CUDNN using pip, needs *nothing* but the NVIDIA driver
pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

or

# You must have CUDA 12.0+ and CUDNN 8.8+ installed already.
pip install --upgrade "jax[cuda12_local]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Installation instructions in the README coming soon.
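
After either install command, one way to confirm that the CUDA build is actually in use is to ask JAX which backend it selected. The following is a minimal sketch rather than an official check; the helper name `jax_backend` is hypothetical, and it assumes only that `jax.default_backend()` returns a string naming the platform (for example "cpu" when no GPU backend could be initialized).

```python
def jax_backend():
    """Return the name of JAX's default backend, or None if JAX is not installed."""
    try:
        import jax
    except ImportError:
        return None
    # jax.default_backend() returns a platform name as a string.
    return jax.default_backend()


backend = jax_backend()
if backend is None:
    print("jax is not installed")
else:
    print(f"default backend: {backend}")
```

If this reports "cpu" on a machine with an NVIDIA GPU, the CUDA wheel was probably not picked up and the install command is worth re-running.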

@terafo

terafo commented Mar 28, 2023

@hawkinsp I installed it with both options and get the following error when trying to execute

jax.numpy.zeros((5,))

2023-03-28 11:40:16.147614: W external/xla/xla/stream_executor/cuda/cuda_dnn.cc:397] There was an error before creating cudnn handle: cudaErrorNotSupported : operation not supported

@hawkinsp
Member Author

@terafo Can you open a new bug with details? In particular, the nvidia-smi output is probably important, together with the versions of any CUDA/CUDNN packages you have installed.
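
For anyone filing such a report, the requested details can be gathered in one go. The following is a sketch using only the Python standard library; the function name `collect_gpu_report` is hypothetical, and it assumes that pip-installed CUDA and cuDNN packages have names starting with `nvidia-`, which may not hold for every setup (a locally installed CUDA toolkit will not appear in the list).

```python
import shutil
import subprocess
from importlib import metadata


def collect_gpu_report() -> str:
    """Gather nvidia-smi output and relevant package versions as plain text."""
    lines = []

    # nvidia-smi is installed with the NVIDIA driver; it may be missing from PATH.
    smi = shutil.which("nvidia-smi")
    if smi is None:
        lines.append("nvidia-smi: not found on PATH")
    else:
        try:
            result = subprocess.run([smi], capture_output=True, text=True, timeout=30)
            lines.append(result.stdout.strip() or result.stderr.strip())
        except (OSError, subprocess.TimeoutExpired) as exc:
            lines.append(f"nvidia-smi: failed to run ({exc})")

    # Assumption: pip-installed CUDA/cuDNN packages have names starting with "nvidia-".
    versions = set()
    for dist in metadata.distributions():
        try:
            name = (dist.metadata["Name"] or "").lower()
        except Exception:
            continue  # skip distributions with unreadable metadata
        if name in {"jax", "jaxlib"} or name.startswith("nvidia-"):
            versions.add(f"{name}=={dist.version}")
    lines.extend(sorted(versions))

    return "\n".join(lines)


print(collect_gpu_report())
```

The printed text can be pasted directly into a new issue.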

@terafo

terafo commented Mar 28, 2023

@hawkinsp sure, just did it.
