jax einsum has different results between CPU and GPU #22557

Closed
John-zzh opened this issue Jul 22, 2024 · 6 comments
Assignees: jakevdp
Labels: bug (Something isn't working)

Comments

John-zzh commented Jul 22, 2024

Description

Hi there,
My jax.numpy.einsum gives poor accuracy on the GPU, but seems fine on the CPU.

This is running on GPU:

import numpy as np
import jax
import jax.numpy as jnp

A = np.random.rand(300,300)
B = np.random.rand(300,300,4)

numpy_result_double        = np.einsum("ab,caP->cbP", A, B)
numpy_result_single        = np.einsum("ab,caP->cbP", A.astype(np.float32), B.astype(np.float32))

jax_result                 = jnp.einsum("ab,caP->cbP", jnp.array(A), jnp.array(B))

print(np.linalg.norm(numpy_result_double - numpy_result_single))
print(np.linalg.norm(         jax_result - numpy_result_single))

I got this result:

0.011850594613419469
0.9229351

But when running on CPU:

import numpy as np
import jax
import jax.numpy as jnp
# running on CPU
jax.config.update('jax_platform_name', 'cpu')

A = np.random.rand(300,300)
B = np.random.rand(300,300,4)

numpy_result_double        = np.einsum("ab,caP->cbP", A, B)
numpy_result_single        = np.einsum("ab,caP->cbP", A.astype(np.float32), B.astype(np.float32))

jax_result                 = jnp.einsum("ab,caP->cbP", jnp.array(A), jnp.array(B))

print(np.linalg.norm(numpy_result_double - numpy_result_single))
print(np.linalg.norm(         jax_result - numpy_result_single))

and I got

0.011844848124626908
0.012697785

Both jax 0.4.28 and 0.4.30 show the same issue on my machine.
Is it because of the WSL2 environment, or the specific way jax was installed?
I used conda install jaxlib=*=*cuda* jax cuda-nvcc -c conda-forge -c nvidia, and jax.print_environment_info() says:

System info (python version, jaxlib version, accelerator, etc.)

jax:    0.4.28
jaxlib: 0.4.28.dev20240711
numpy:  1.25.2
python: 3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 16:33:10)  [GCC 12.3.0]
jax.devices (1 total, 1 local): [cuda(id=0)]
process_count: 1
platform: uname_result(system='Linux', node='jojolaptop', release='5.15.153.1-microsoft-standard-WSL2', version='#1 SMP Fri Mar 29 23:14:13 UTC 2024', machine='x86_64')

$ nvidia-smi
Mon Jul 22 17:46:39 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.58.02              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4050 ...    On  |   00000000:01:00.0  On |                  N/A |
| N/A   46C    P3              9W /   35W |     270MiB /   6141MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     56103      C   /python3.9                                  N/A      |
+-----------------------------------------------------------------------------------------+

I also tried pip install jax[cuda12], and jax.print_environment_info() says:

jax:    0.4.30
jaxlib: 0.4.30
numpy:  2.0.1
python: 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:12:24) [GCC 11.2.0]
jax.devices (1 total, 1 local): [cuda(id=0)]
process_count: 1
platform: uname_result(system='Linux', node='jojolaptop', release='5.15.153.1-microsoft-standard-WSL2', version='#1 SMP Fri Mar 29 23:14:13 UTC 2024', machine='x86_64')


$ nvidia-smi
Mon Jul 22 18:16:28 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.58.02              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4050 ...    On  |   00000000:01:00.0  On |                  N/A |
| N/A   51C    P3              9W /   35W |     299MiB /   6141MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     70120      C   /python3.12                                 N/A      |
+-----------------------------------------------------------------------------------------+
John-zzh added the bug (Something isn't working) label on Jul 22, 2024
Collaborator

jakevdp commented Jul 22, 2024

I suspect this is a matmul precision configuration issue.

JAX's dot-like operations default to a lower precision on some accelerators for performance reasons. If you want to force higher precision, you can do so either via the precision argument to einsum and other dot-like operations

jnp.einsum("ab,caP->cbP", A, B, precision='highest')

or you can modify the value globally using the jax_default_matmul_precision configuration:

jax.config.update('jax_default_matmul_precision', 'highest')
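
For example, here is a minimal sketch of your reproduction with the precision pinned (either the per-call argument or the global setting alone is enough; both are shown here):

import numpy as np
import jax
import jax.numpy as jnp

# Option 1: force full-precision matmuls globally (set before running any computation)
jax.config.update('jax_default_matmul_precision', 'highest')

A = np.random.rand(300, 300)
B = np.random.rand(300, 300, 4)

numpy_result_single = np.einsum("ab,caP->cbP", A.astype(np.float32), B.astype(np.float32))

# Option 2: request the highest precision for this one contraction only
jax_result = jnp.einsum("ab,caP->cbP", jnp.array(A), jnp.array(B), precision='highest')

# On GPU this difference should now be comparable to the CPU result,
# i.e. on the order of single-precision rounding error.
print(np.linalg.norm(jax_result - numpy_result_single))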

jakevdp self-assigned this on Jul 22, 2024
John-zzh (Author)

Thanks! That works! Now the GPU is as accurate as the CPU.

When should I use jax.config.update('jax_default_matmul_precision', 'highest')?
Two of my friends tried my code above without setting 'jax_default_matmul_precision', and their results are fine. I feel like it is system dependent.

Collaborator

jakevdp commented Jul 23, 2024

I feel like it is system dependent.

Exactly. At the moment, the precise meaning of the matmul precision settings varies by hardware: on some GPU chips 'high' and 'highest' are equivalent, while on others they aren't. It's not in a great state currently, and #18934 tracks improving this.
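
If you'd rather not change the setting globally, the same configuration is also available as a context manager (jax.default_matmul_precision), so you can scope the higher precision to just the block of code that needs it. A small sketch:

import numpy as np
import jax
import jax.numpy as jnp

A = jnp.array(np.random.rand(300, 300))
B = jnp.array(np.random.rand(300, 300, 4))

# Inside this block, dot-like operations use the 'highest' precision;
# code outside the block keeps the platform default.
with jax.default_matmul_precision('highest'):
    result = jnp.einsum("ab,caP->cbP", A, B)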


kcdodd commented Jul 27, 2024

Having just spent hours debugging why a particular computation was an order of magnitude different in one version of jax/cuda versus another (on the same machine, using the same GPU hardware), I was fairly disappointed to find that the culprit was einsum and the counter-intuitive consequence of its default precision. My opinion is that, whatever mechanism is chosen for specifying a lower matmul precision in general, the default should match what the dtype of the arguments would suggest, at least for ubiquitous operations that are presented as replacements for their numpy counterparts.

While I understand the motivation of the performance trade-off for some machine learning applications, this makes it questionable to use for more mathematically/scientifically rigorous applications, where algorithms depend on certain assumptions about floating-point precision and reproducibility.

John-zzh (Author)

Having just spent hours debugging why a particular computation was an order of magnitude different in one version of jax/cuda versus another (on the same machine, using the same GPU hardware), I was fairly disappointed to find that the culprit was einsum and the counter-intuitive consequence of its default precision. My opinion is that, whatever mechanism is chosen for specifying a lower matmul precision in general, the default should match what the dtype of the arguments would suggest, at least for ubiquitous operations that are presented as replacements for their numpy counterparts.

While I understand the motivation of the performance trade-off for some machine learning applications, this makes it questionable to use for more mathematically/scientifically rigorous applications, where algorithms depend on certain assumptions about floating-point precision and reproducibility.

Thanks for your effort. It does look dangerous to use a different default precision in scientific computing.
So is this bug fixed now, or will it be addressed in the next release?

Collaborator

jakevdp commented Aug 8, 2024

I think I'm going to close this issue now, since the root cause is tracked in #18934.

jakevdp closed this as completed on Aug 8, 2024