how to generate C for host and OpenCL for device? #2797

ronghongbo · 2018-03-07T01:04:42Z

Hello,

I would like to generate OpenCL code for a device (from a parallel loop in a loop nest) and C code for the host (from the rest of the loop nest). Using "compile_jit", I can see the generated OpenCL code, and the host code is executed automatically. How can I dump the host code as C so that I can work with the C and OpenCL code manually (e.g. modify and compile them from a command shell)?

thanks!
hongbo

zvookin · 2018-03-07T08:23:08Z

You will need to use the "compile_to_c" method to generate C code. (I believe this works with GPU backends now, but it may not be very robust; it is a little-used path.)

-Z-

ronghongbo · 2018-03-07T17:19:39Z

Thanks! I tried that and it does generate C code, but the host and device code are not separated. The code looks like this:

    halide_copy_to_device(...)
    #pragma omp parallel for
    for (....) {
        // the code that should be a device kernel
    }
    halide_device_free(...)

I would like the parallel loop (from the "#pragma omp parallel for" line through the closing brace, shown in bold in my original post) to be separated out as a device kernel, with the launch of that kernel shown explicitly in the host code.
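For illustration only, the separation described above could look roughly like the plain C++ sketch below. This is not Halide output and does not use any Halide or OpenCL API: "scale_kernel", "launch", and "run_host" are hypothetical names, the kernel body (scaling each element) is an arbitrary assumption, and the copies to and from the device are simulated with ordinary vectors.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical device kernel: one call processes one work item (gid).
// In a real OpenCL build this body would live in a separate .cl source.
void scale_kernel(std::size_t gid, const float *in, float *out, float factor) {
    out[gid] = in[gid] * factor;
}

// Hypothetical stand-in for an explicit kernel launch.
// It simply runs the kernel once per work item on the host.
template <typename Kernel, typename... Args>
void launch(std::size_t global_size, Kernel kernel, Args... args) {
    for (std::size_t gid = 0; gid < global_size; ++gid) {
        kernel(gid, args...);
    }
}

// Host code: buffer setup, an explicit launch, then returning the result.
std::vector<float> run_host(const std::vector<float> &input, float factor) {
    std::vector<float> device_in(input);           // simulated copy to device
    std::vector<float> device_out(input.size());   // simulated device allocation
    launch(input.size(), scale_kernel,
           device_in.data(), device_out.data(), factor);
    return device_out;                             // simulated copy back to host
}
```

Here the parallel loop from the generated C has been split into a per-item kernel and a visible launch call, which is the shape the question asks for; a real version would replace "launch" with the corresponding OpenCL enqueue calls.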

zvookin · 2018-03-07T18:27:38Z

Halide is not designed to generate separate kernels for GPUs that run independently of the Halide generated host code (in whatever system the host code is running).

-Z-

…on't build them if not enabled (#5776)

* Remove unused vertex buffer parameters.
* Offload GPU code in a lowering pass instead of via CodeGen_GPU_Host. Fixes #5650, fixes #2797, fixes #2084, now #1971 is more relevant.
* clang-format.
* clang-format sorting is case sensitive!?
* clang-tidy
* Move codegen backends into anonymous namespaces in source files.
* clang-format
* Pass type arguments correctly.
* Update OffloadGPULoops.cpp
* trigger buildbots
* trigger buildbots
* Hack around tests that rely on the IR for offloaded GPU loops.
* Fix missing include.
* Remove unused include.
* clang-tidy
* Use custom lowering pass to see code before GPU offloading
* Speculative fix for segfault
* Fix const correctness
* Fix error on unused variables in generated code.

Co-authored-by: Steven Johnson <srj@google.com>

dsharletg added a commit that referenced this issue Feb 26, 2021

Offload GPU code in a lowering pass instead of via CodeGen_GPU_Host. Fixes #5650, fixes #2797, fixes #2084, now #1971 is more relevant.

041b9fe

dsharletg mentioned this issue Feb 27, 2021

Move CodeGen_GPU_Host to a lowering pass #5775

Merged

dsharletg closed this as completed in 7493c09 Mar 2, 2021
