Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUDA: Numerical differences between CPython / CPU target and the CUDA target in math.pow #8218

Open
gmarkall opened this issue Jul 4, 2022 · 6 comments
Labels
bug - numerically incorrect Bugs: numerical behaviour is incorrect CUDA CUDA related issue/PR

Comments

@gmarkall
Copy link
Member

gmarkall commented Jul 4, 2022

Originally noticed on Discourse.

The following:

from numba import cuda, njit
import numpy as np
import math


def inner(x, y):
    return int(math.pow(x, y))


cpu_jitted = njit(inner)


def cuda_jitted(x, y):
    inner_device = cuda.jit(device=True)(inner)

    @cuda.jit
    def outer(r, x, y):
        r[()] = inner_device(x[()], y[()])

    r_arr = np.array(0, dtype=np.int64)
    x_arr = np.array(0, dtype=np.int64)
    y_arr = np.array(0, dtype=np.int64)
    x_arr[()] = x
    y_arr[()] = y
    outer[1, 1](r_arr, x_arr, y_arr)

    return r_arr[()]


x = 3
y = 1

print(inner(x, y))
print(cpu_jitted(x, y))
print(cuda_jitted(x, y))

prints

3
3
2

This can cause some surprise.

@gmarkall gmarkall added needtriage CUDA CUDA related issue/PR labels Jul 4, 2022
@gmarkall
Copy link
Member Author

gmarkall commented Jul 4, 2022

Similar occurs in CUDA C / C++:

// Compile with:
// nvcc -gencode arch=compute_75,code=sm_75 -std c++17 mini.cu -o mini
//
// produces:
//
// 3
// 2

#include <iostream>
#include <cstdint>

template<typename T>
__global__ void kernel(T* r, T* x, T* y)
{
  r[0] = std::pow(x[0], y[0]);
}

template <typename T>
T host(T x, T y)
{
  return std::pow(x, y);
}

__managed__ uint32_t r_32;
__managed__ uint32_t x_32;
__managed__ uint32_t y_32;

int main(int argc, char **argv)
{
  x_32 = 3;
  y_32 = 1;
  kernel<<<1, 1>>>(&r_32, &x_32, &y_32);
  cudaDeviceSynchronize();
  std::cout << host(x_32, y_32) << std::endl;
  std::cout << r_32 << std::endl;
}

@gmarkall
Copy link
Member Author

gmarkall commented Jul 4, 2022

The CUDA target doesn't end up using int_power_impl, but the CPU target does. This will likely need some adjustments in the typing / lowering for math.pow for CUDA, but because of the mix of implementations (those from the CPU target, impl_pow_int and impl_binary in mathimpl.py) means that it will need a bit of careful thinking about / testing.

@gmarkall
Copy link
Member Author

gmarkall commented Jul 4, 2022

I think I'm going to label this as an actual bug because I think it should be fixable to be less surprising, even though the same occurs in CUDA C/C++.

@gmarkall gmarkall added bug - numerically incorrect Bugs: numerical behaviour is incorrect and removed needtriage labels Jul 4, 2022
@gmarkall gmarkall added this to the Numba 0.57 RC milestone Jul 4, 2022
@gmarkall
Copy link
Member Author

gmarkall commented Jul 4, 2022

IIRC a similar issue affected cuDF: rapidsai/cudf#10178

@bdice
Copy link
Contributor

bdice commented Jul 5, 2022

If the result is expected to be integral, it is possible to solve this in the same way as CuPy with an exponentiation-by-squaring algorithm, as I noted in the cuDF issue you linked. I think that might be the best solution for Numba as well as cuDF. rapidsai/cudf#10178 (comment)

@gmarkall
Copy link
Member Author

gmarkall commented Jul 5, 2022

Thanks @bdice - this is indeed what should be happening - it should be picking up this implementation: https://github.com/numba/numba/blob/main/numba/cpython/numbers.py#L206 but it isn't.

There's a wider question of why CUDA even needs to redefine the typing / lowering of things that already have implementations, which also needs looking at, possibly at the same time as this - having unnecessary redefinitions increases the chances of issues like this occurring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug - numerically incorrect Bugs: numerical behaviour is incorrect CUDA CUDA related issue/PR
Projects
Development

No branches or pull requests

2 participants