Add support for other GPUs (than NVIDIA) #2012
Is it possible to run JAX on GPU architectures other than NVIDIA (e.g., Intel, AMD)?
In principle, sure! All we need is for XLA to support that architecture. In practice, that means we currently support CPU, NVIDIA GPU, and TPU. Happily, AMD has been contributing support for AMD GPUs to XLA. We haven't tried it out in JAX, but assuming the XLA support is complete, I see no good reason it wouldn't work with a few small JAX changes. If you are excited about AMD GPUs, we'd certainly welcome contributions enabling that functionality in JAX. I don't think Intel GPUs have XLA support at the moment, but I wouldn't rule it out in the future as the various compiler toolchains (e.g., XLA, MLIR) progress.
The AMDGPU backend for XLA is being actively developed; these PRs probably have the most up-to-date status (it seems like many, but not all, tests pass). One thing to note is that the AMD integration requires rebuilding XLA from source; there's no way to build a single TF or XLA binary that can use both NVIDIA CUDA and AMD ROCm. For Intel hardware, I imagine we'd need something like an MLIR translation from the HLO dialect to the nGraph dialect. I'm guessing nobody is actively working on that, but cc'ing @nmostafa in case Intel has plans in that area.
Glad to see this ROCm effort seems to be funded with full-time developers by AMD. Better late than never, I suppose. I hope they learned at least a little from their misadventures in GPGPU, with OpenCL only half-heartedly supported; in practice, if you wanted to get anything done, you had no choice but to go with the platform that didn't require you to, say, reinvent your FFT libraries from scratch. I hope this time around they realize there is some minimum investment in software they'd be smart to make if they want to offer a competitive ecosystem. It's crazy to see how much money NVIDIA has made off this; in the meanwhile, Google has added a completely new, viable hardware and software alternative in the form of TPUs, and AMD is still working on getting compatibility with any of the software out there. It does not inspire much confidence, to be honest; it seems wise to bet against them ever getting out a robust, feature-complete alternative if they couldn't even get anything out four years ago. But I'd love to be wrong about this, and for there to be some genuine competition in desktop ML acceleration in the future.
Can someone help me with how to use JAX on AMD GPUs? Are there any code snippets we can start with?
Any update on the topic?
There's no technical blocker to using JAX on AMD GPUs. We on the JAX team simply don't have access to any AMD GPUs at the moment to develop or test the necessary changes (which are probably not that large, given that most of the necessary work has been done in the context of TensorFlow). Contributions are welcome!
That's good to know @jekbradbury, thanks
I just wanted to ask: when we are talking about AMD GPUs being supported, is it going to be on all platforms (i.e., including macOS), or are we talking Linux/Windows only?
I believe the AMDGPU backend support for XLA is based on ROCm, which doesn't support macOS.
I was able to build jax with initial support for ROCm (AMD GPUs). The code can be found here: Executing
on my RX480 outputs
which already looks very promising. For those who want to build this:
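A minimal smoke test for a build like this might look as follows (a sketch, not the exact snippet from above; on a working ROCm build, `jax.devices()` should report the AMD GPU rather than falling back to CPU):

```python
import jax
import jax.numpy as jnp

# On a working ROCm build this should list the AMD GPU device(s)
# rather than silently falling back to CPU.
print(jax.devices())

# A tiny jitted computation to confirm the backend compiles and runs.
x = jnp.arange(10.0)
print(jax.jit(lambda v: jnp.dot(v, v))(x))
```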
@inailuig That's exciting progress! Nice work! (Sorry for the slow response; many of us were on vacation this last week.) Technically speaking, the cublas/cusolver and cuda_prng kernels are somewhat optional. The cuda_prng kernel is a compile-time optimization and can be safely omitted (at the cost of increased compile time), and cublas/cusolver are only needed for linear algebra support. So it might be possible to check things in even before those pieces work. I'm curious: is it possible to use upstream TF instead of the ROCm fork? We frequently update our TF (XLA) version, so any ROCm-specific fork is likely to become stale.
@hawkinsp It turns out that all that is missing in upstream TF is actually looking for devices with the right platform, i.e., some changes in
Do you think we could get something like that upstreamed into TF? For cuda_prng and the cublas/cusolver kernels, I was also able to get them running (2 or 3 of the LAPACK functions (cusolver) are not yet implemented in rocsolver, but everything else is there; this also requires a few more changes to TF; I will post more once I have cleaned it up a bit).
We certainly can upstream something like that. That file is really part of JAX, so we can change it as we see fit. You can send PRs to TensorFlow and assign me; I can review.
Thank you for trying out JAX on AMD GPUs. I am on the TF framework team at AMD and would like to get a better understanding of the TF changes that are required to get JAX working. We would be more than happy to help out. I also had a question for you: does JAX have unit tests that run on GPUs, and if so, can you point me to the directions to run them? I would like to get them running internally on our platform. Thanks again, deven
@deven-amd We'll need to wait for @inailuig to send out their remaining changes to get things to build. Once those changes are checked in, the best way to do this is probably something like this:
This builds and installs jaxlib with TF (XLA) from head (rather than whatever version we have pinned). I should note there are probably a few tests that fail at head on Nvidia GPUs also (#5067). See also https://jax.readthedocs.io/en/latest/developer.html#running-the-tests
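A sketch of that kind of invocation (hypothetical paths and flags; `--enable_rocm` assumes the build support from #5114, and the bazel override points at a local TF checkout):

```sh
# Build jaxlib against a local TensorFlow/XLA checkout instead of the
# pinned version, then install the in-tree build (paths illustrative).
python build/build.py --enable_rocm \
  --bazel_options=--override_repository=org_tensorflow=/path/to/tensorflow
pip install -e build
```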
Hi Peter, thanks for the quick response. I will try out the directions you have provided, plus the docs, to get the JAX unit tests working on the ROCm platform. I expect to work on this next week and will ping you if I run into any issues. In case I do, would you rather I email you directly or file an issue on the JAX GitHub repo? Thanks, deven
If there's no reason otherwise, we like to do development in the open so the community can be involved. So I'd file issues/PRs or use GitHub discussions. You can ping me in any issues or PRs if you want to make sure I take a look!
@deven-amd Thanks for reaching out; it would be great if you could in particular help with fixing the tests which are still failing. I just opened #5114 for the remaining build-related stuff in jax. However, there are still some tests failing because of bugs (e.g., stuff related to conv, dot_general, triangular solve, ...). Also, there is this error message
which keeps popping up when the program terminates. @deven-amd, would you be able to look into this? For the BLAS/LAPACK wrappers (i.e., the equivalent of jaxlib/cusolver.py and the related pybind modules, but for rocm), we still need a few changes in TF for this to work:
All of this can be found in https://github.com/inailuig/tensorflow/tree/jax-rocm-gpukernels (there are 2 more commits which are useful for debugging, but not necessary). @hawkinsp How should we proceed?
Retitling this bug to focus on AMD GPUs only; we can open new bugs for other hardware vendors if needed.
Hi team, thanks a lot for supporting ROCm in jax. I have run into some issues (I don't know which is the right way to build jax from source; I saw https://hub.docker.com/r/rocm/jax and checked out the branch jax_preview_release):
1. Running the unit tests (https://jax.readthedocs.io/en/latest/developer.html?highlight=pytest#running-the-tests)
it shows
If I run the program with jaxlib, it also shows this assertion. I think maybe the source code, or the way I built it, is wrong?
2. I tried to use ROCm 4.1 to build jax from source, but it failed in
3. When I set up
Can you help me? Thanks a lot!
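For reference, the linked developer docs describe running the test suite roughly like this (a sketch):

```sh
# Run the JAX test suite in parallel, per the developer docs.
pip install pytest pytest-xdist
python -m pytest -n auto tests
```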
Fixed issue 2; it was a libstdc++ version problem, but I still have the 'Check failed' error.
@coversb, is it possible for you to upgrade your ROCm version?
@reza-amd Is it possible to get a 5.0-series build (drun image + jax)? Is there anything similar for pytorch? Thanks in advance!
Sorry for my slow response.
@reza-amd Thanks for the update! I will try to test things as soon as possible. More broadly, what are the criteria to close this bug? Things seem to be working reasonably well!
See also: #9864
@reza-amd Thank you again for the help getting docker working! I am able to build a docker image with jax and then train networks locally. I did a benchmark using flax + resnet50 + imagenet with a batch size of 256 in fp16 mode. Here are the results on a W6800:
Here are the results of the same code (i.e., fp16 + batch size 256) on a dual NVIDIA 3060 (CUDA) setup:
What else would be needed to mark this bug as resolved? I will start a ViT run next, but that will take a few days to complete!
@brettkoonce Thanks much for your update and testing our recent changes in ROCm-5.0.
@reza-amd I have made a little bit of progress with ViT and am having some issues with numerical precision on the W6800. It is able to train models using a batch size of 128, but I get reduced accuracy compared to a reference run on a TPU. W6800 results:
TPU-v2, batch size of 128 (all other code identical):
Second run with different TPU, same code config:
I had similar results (i.e., lower accuracy on AMD) when I did my tests with ROCm 4.5.0 last year. Do you have any ideas on why this would happen, or suggestions for how to improve things?
@brettkoonce When comparing against TPU, a key thing to be careful of is that the default matmul and convolution precision on TPU is bfloat16, not full float32.
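A sketch of how to request full precision for such comparisons (illustrative; `jax.default_matmul_precision` and `jax.lax.Precision.HIGHEST` are standard JAX APIs, not necessarily the exact suggestion made in this comment):

```python
import jax
import jax.numpy as jnp

a = jnp.ones((128, 128))
b = jnp.ones((128, 128))

# Request full float32 matmul precision globally; on TPU, matmuls and
# convolutions otherwise default to faster bfloat16 accumulation.
with jax.default_matmul_precision("float32"):
    c = jnp.dot(a, b)

# Or per operation:
c = jnp.dot(a, b, precision=jax.lax.Precision.HIGHEST)
```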
Here is the result when using Nvidia hardware (4x3060) with the same configuration:
@brettkoonce Perhaps move this to a new bug? But my suggestion would be: can you minimize it to a small, self-contained test case? That's what I would do if I had access to the hardware and were debugging it. You might consider comparing the results of a single training step between CPU and GPU, or between the two GPUs.
ROCm build scripts have been failing for ~2 months, see #10162. |
Jax 04b751c is building with rocm 5.2! |
Jax 09794be is building with rocm 5.4! |
Hey @brettkoonce, I am trying to compile jax with jaxlib for ROCm on Arch Linux, and I just cannot get a functional combination of things to work. I was able to compile jaxlib 0.4.6 and 0.4.9, but errors occurred at runtime (including segfaults). Are you able to share which commits/releases/tags you used for jax/jaxlib and XLA (or TensorFlow, if you still used that repo), and which build options you used?
Pinging back on this issue after some time: what an excellent discussion. Is anyone lucky enough to run JAX on ARM architectures (such as the Apple Silicon processors)?
@ricardobarroslourenco Yes. JAX has supported CPU-only execution on Apple hardware for many releases, and there is a new and experimental Apple GPU plugin (https://github.com/google/jax#pip-installation-apple-gpus). (Note: experimental.) In fact, I think I'm going to declare this issue fixed, because at this point we have, at my last count, four GPU vendors (NVIDIA, AMD, Apple, Intel) that support JAX to some degree, so I think we can say "we support multiple GPU vendors". We're working on better integration, better testing, and easier release processes for all of them. Feel free to file new bugs specific to particular hardware vendors!
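For reference, the experimental Apple GPU plugin linked above is distributed as a separate package (a sketch; see the linked instructions for current version requirements):

```sh
# Experimental; exact jax/jaxlib version constraints are listed at the
# link above.
pip install jax-metal
```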
Just a quick comment: would it be better to mention the installation guide for ROCm devices in the README, right before the Apple Metal devices section? What do you think, @hawkinsp @brettkoonce?
Grab bag of responses: @hawkinsp, +1 on closing this as well; glad to have helped!
The pattern I have had luck with (ROCm 4.5 and up) is:
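A hypothetical sketch of that docker-based flow (image name and tag are illustrative; the device flags follow ROCm's container docs):

```sh
# Illustrative only: pull a ROCm jax image and expose the GPU devices
# to the container, as described in ROCm's docker documentation.
docker pull rocm/jax:latest
docker run -it \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --security-opt seccomp=unconfined \
  rocm/jax:latest
```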
It's not super-turnkey, but it definitely works! @JoeyTeng With the amount of customization ROCm requires, keeping it inside the docker build sub-folder (i.e., where it's at right now) would be where I would keep it going forward. The jax part works fine, but ROCm needs more maturity in general before I can recommend it to new ML practitioners (e.g., by having it on the primary README).