Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-empty compiler output #756

Closed
stavoltafunzia opened this issue May 15, 2024 · 9 comments
Closed

Non-empty compiler output #756

stavoltafunzia opened this issue May 15, 2024 · 9 comments
Labels

Comments

@stavoltafunzia
Copy link

I always get a non-empty compiler output for any kernel I compile. The complete warning message says:

CompilerWarning: Built kernel retrieved from cache. Original from-source build had warnings:
Build on <pyopencl.Device 'NVIDIA GeForce RTX 4070 SUPER' on 'NVIDIA CUDA' at 0x41e0140> succeeded, but said:

(): Warning: Function simple_mult is a kernel, so overriding noinline attribute. The function may be inlined when called.

To Reproduce
Sample code to reproduce:

import pyopencl as cl

src = r"""
void __kernel simple_mult(__global const int *A, __global int *B) 
{
    B[get_global_id(0)] = A[get_global_id(0)] * 3;
}
"""

device = cl.get_platforms()[0].get_devices()[0]
cl_ctx =  cl.Context(devices=[device])
queue = cl.CommandQueue(cl_ctx)
prg = cl.Program(cl_ctx, src).build()

Expected behavior
The compiler output should be empty. When I use my OpenCL from C/C++ code, clGetProgramBuildInfo returns empty messages.

Environment (please complete the following information):

  • OS: Debian Bookworm
  • ICD Loader and version: libnvidia-opencl from Cuda 12.4
  • ICD and version: libnvidia-opencl from Cuda 12.4
  • CPU/GPU: Nvidia RTX 4000 series
  • Python version: 3.11
  • PyOpenCL version: 2024.2.2
@inducer
Copy link
Owner

inducer commented May 15, 2024

I'm a bit puzzled why this behavior should be different between PyOpenCL and a C++ program calling OpenCL directly. One possible reason that these messages got cached from an old version of the driver. You can check for this by deleting PyOpenCL's build cache:

# Careful! Double check this command before running it, to ensure it does what you intend.
rm -Rf $HOME/.cache/pyopencl  

and then rerunning.

@matthiasdiener
Copy link
Contributor

FWIW, I was not able to reproduce this with CUDA 12.2 on Debian unstable building for a TITAN X.

@stavoltafunzia
Copy link
Author

stavoltafunzia commented May 16, 2024

I'm a bit puzzled why this behavior should be different between PyOpenCL and a C++ program calling OpenCL directly. One possible reason that these messages got cached from an old version of the driver. You can check for this by deleting PyOpenCL's build cache:

# Careful! Double check this command before running it, to ensure it does what you intend.
rm -Rf $HOME/.cache/pyopencl  

and then rerunning.

Thanks, tried it, but unfortunately didn't work for me.

I also verified that compiled C code and pyopencl are indeed using the same opencl library. With strace I see that both programs open the following library:

openat(AT_FDCWD, "glibc-hwcaps/x86-64-v3/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "glibc-hwcaps/x86-64-v2/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/x86_64/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/x86_64/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/x86_64/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "x86_64/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=84758, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 84758, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f35713dd000
close(3)                                = 0
openat(AT_FDCWD, "/usr/local/cuda/targets/x86_64-linux/lib/libOpenCL.so.1", O_RDONLY|O_CLOEXEC) = 3

I really have no idea why this message is originated.

@stavoltafunzia
Copy link
Author

stavoltafunzia commented May 16, 2024

Finally I found it. After inserting print statements to the pyopencl C++ code, I noticed that the following line was added to the kernel source:

__constant int pyopencl_defeat_cache_14d61c4d6ee748c9a9cef2d50121f8ef = 0;

If I remove such line (modifying pyopencl C++ code) I don't get anymore the compiler warning. Adding such line to my C++ opencl kernel, makes me getting the same build log. So that's the reason in the end, and pyopencl is consistent with C/C++ opencl interface.

Update: this is not the real cause.

@inducer
Copy link
Owner

inducer commented May 16, 2024

Interesting! Thanks for tracking this down, I had forgotten about that. :) I still kind of don't understand why having this triggers the warning it does; the warning seems entirely unrelated to that variable definition?

@stavoltafunzia
Copy link
Author

Yea, the compiler message seems totally unrelated to that variable, yet it’s anyway triggered by it. Don’t know what nvidia is doing here; we all know OpenCL is not bvidia top priority (to say an euphemism).
Btw, for curiosity, why that constant variable is added to the kernel source code? Looks like it’s related to pyopencl caching system. Is there an easy way to disable it?

@inducer
Copy link
Owner

inducer commented May 16, 2024

The reason the variable is there is to defeat broken vendor caches. I don't remember specifics, but in PyOpenCL's early days, I spent a long time tracking down what ended up being a bug in an ICD compiler cache. The ICD compiler did not notice that a header file included by the source was changed, and insisted on using a (stale) cached binary. That variable definition was there to help "convince" ICDs that they're looking at new source code every time, while PyOpenCL's own caching system is (hopefully) less broken than the ones built into the ICD. That said, for some specific ICDs that (competently) do their own caching, PyOpenCL's caching system imposes unnecessary overhead, which we're now thinking of (selectively) removing. See #738 for some discussion.

@stavoltafunzia
Copy link
Author

I now realized I gave a wrong explanation. I've been get confused by (I think) some build caching mechanism that the nvidia ICD compiler is apparently using. From C interface, the ICD compiler builds the kernel, and get non empty build info, only the first time I execute the program, while in all subsequent program executions clBuildProgram (I think that) uses some cache and clGetProgramBuildInfo returns an empty message only because (I think that) the build info are not cached.
In conclusion, the line below is not triggering the non-empty build log. I do get a non-empty build log even from the C interface, though only the first time I compile a kernel (I suppose due to the caching mechanism mentioned above).

__constant int pyopencl_defeat_cache_14d61c4d6ee748c9a9cef2d50121f8ef = 0;

In the end, nothing is due to pyopencl.

@inducer
Copy link
Owner

inducer commented May 17, 2024

Glad to hear everything got resolved. I'll go ahead and close this issue, LMK if anything else comes up.

@inducer inducer closed this as completed May 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants