CUDA: Numerical differences between CPython / CPU target and the CUDA target in `math.pow` #8218

gmarkall · 2022-07-04T08:53:11Z

Originally noticed on Discourse.

The following:

from numba import cuda, njit
import numpy as np
import math


def inner(x, y):
    return int(math.pow(x, y))


cpu_jitted = njit(inner)


def cuda_jitted(x, y):
    inner_device = cuda.jit(device=True)(inner)

    @cuda.jit
    def outer(r, x, y):
        r[()] = inner_device(x[()], y[()])

    r_arr = np.array(0, dtype=np.int64)
    x_arr = np.array(0, dtype=np.int64)
    y_arr = np.array(0, dtype=np.int64)
    x_arr[()] = x
    y_arr[()] = y
    outer[1, 1](r_arr, x_arr, y_arr)

    return r_arr[()]


x = 3
y = 1

print(inner(x, y))
print(cpu_jitted(x, y))
print(cuda_jitted(x, y))

prints

3
3
2

This can cause some surprise.

The text was updated successfully, but these errors were encountered:

gmarkall · 2022-07-04T09:09:05Z

Similar occurs in CUDA C / C++:

// Compile with:
// nvcc -gencode arch=compute_75,code=sm_75 -std c++17 mini.cu -o mini
//
// produces:
//
// 3
// 2

#include <iostream>
#include <cstdint>

template<typename T>
__global__ void kernel(T* r, T* x, T* y)
{
  r[0] = std::pow(x[0], y[0]);
}

template <typename T>
T host(T x, T y)
{
  return std::pow(x, y);
}

__managed__ uint32_t r_32;
__managed__ uint32_t x_32;
__managed__ uint32_t y_32;

int main(int argc, char **argv)
{
  x_32 = 3;
  y_32 = 1;
  kernel<<<1, 1>>>(&r_32, &x_32, &y_32);
  cudaDeviceSynchronize();
  std::cout << host(x_32, y_32) << std::endl;
  std::cout << r_32 << std::endl;
}

gmarkall · 2022-07-04T10:29:29Z

The CUDA target doesn't end up using int_power_impl, but the CPU target does. This will likely need some adjustments in the typing / lowering for math.pow for CUDA, but because of the mix of implementations (those from the CPU target, impl_pow_int and impl_binary in mathimpl.py) means that it will need a bit of careful thinking about / testing.

gmarkall · 2022-07-04T10:30:03Z

I think I'm going to label this as an actual bug because I think it should be fixable to be less surprising, even though the same occurs in CUDA C/C++.

gmarkall · 2022-07-04T10:31:51Z

IIRC a similar issue affected cuDF: rapidsai/cudf#10178

bdice · 2022-07-05T02:02:33Z

If the result is expected to be integral, it is possible to solve this in the same way as CuPy with an exponentiation-by-squaring algorithm, as I noted in the cuDF issue you linked. I think that might be the best solution for Numba as well as cuDF. rapidsai/cudf#10178 (comment)

gmarkall · 2022-07-05T08:11:00Z

Thanks @bdice - this is indeed what should be happening - it should be picking up this implementation: https://github.com/numba/numba/blob/main/numba/cpython/numbers.py#L206 but it isn't.

There's a wider question of why CUDA even needs to redefine the typing / lowering of things that already have implementations, which also needs looking at, possibly at the same time as this - having unnecessary redefinitions increases the chances of issues like this occurring.

gmarkall added needtriage CUDA CUDA related issue/PR labels Jul 4, 2022

gmarkall added bug - numerically incorrect Bugs: numerical behaviour is incorrect and removed needtriage labels Jul 4, 2022

gmarkall added this to the Numba 0.57 RC milestone Jul 4, 2022

gmarkall added this to Imported in CUDA 0.57 overflow May 2, 2023

gmarkall removed this from the Numba 0.57 RC milestone May 2, 2023

testhound mentioned this issue May 16, 2023

Fix math.pow with int64 argument for CUDA #8963

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CUDA: Numerical differences between CPython / CPU target and the CUDA target in `math.pow` #8218

CUDA: Numerical differences between CPython / CPU target and the CUDA target in `math.pow` #8218

gmarkall commented Jul 4, 2022

gmarkall commented Jul 4, 2022

gmarkall commented Jul 4, 2022

gmarkall commented Jul 4, 2022

gmarkall commented Jul 4, 2022

bdice commented Jul 5, 2022 •

edited

gmarkall commented Jul 5, 2022

CUDA: Numerical differences between CPython / CPU target and the CUDA target in math.pow #8218

CUDA: Numerical differences between CPython / CPU target and the CUDA target in math.pow #8218

Comments

gmarkall commented Jul 4, 2022

gmarkall commented Jul 4, 2022

gmarkall commented Jul 4, 2022

gmarkall commented Jul 4, 2022

gmarkall commented Jul 4, 2022

bdice commented Jul 5, 2022 • edited

gmarkall commented Jul 5, 2022

CUDA: Numerical differences between CPython / CPU target and the CUDA target in `math.pow` #8218

CUDA: Numerical differences between CPython / CPU target and the CUDA target in `math.pow` #8218

bdice commented Jul 5, 2022 •

edited