how to generate C for host and OpenCL for device? #2797

Closed
ronghongbo opened this issue Mar 7, 2018 · 3 comments

@ronghongbo

Hello,

I would like to generate OpenCL code for a device (from a parallel loop in a loop nest) and C code for the host (from the rest of the loop nest). With `compile_jit`, I can see the generated OpenCL code, but the host code is executed automatically. How can I dump the host code as C so that I can work with the C and OpenCL code manually (e.g., modify them and compile them from a command shell)?

thanks!
hongbo

@zvookin
Member

zvookin commented Mar 7, 2018 via email

@ronghongbo
Author

Thanks! I tried that. It generates C code, but the host and device code are not separated. The generated code looks like this:
```c
halide_copy_to_device(...)
#pragma omp parallel for
for (....) {
    // the code that should be a device kernel
}

halide_device_free(...)
```
I would like the parallel loop above to be separated out as a device kernel, with the kernel launch shown explicitly in the host code.
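The separation being asked for can be sketched in plain C. This is a hypothetical illustration, not Halide output: the loop body becomes a kernel-style function that handles one work item given its index (as an OpenCL kernel would obtain it from `get_global_id(0)`), and the host launches it explicitly over the iteration range. The names `scale_kernel` and `launch_scale` and the scaling computation are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: the body of the former "#pragma omp parallel for"
   loop, rewritten to handle a single element given its work-item index,
   the way an OpenCL kernel would obtain its index from get_global_id(0). */
static void scale_kernel(size_t global_id, const float *in, float *out, float k) {
    out[global_id] = in[global_id] * k;
}

/* Hypothetical host-side launch: invokes the kernel once per work item over
   the global range. Here the work items run sequentially on the CPU; an
   OpenCL host program would instead enqueue the kernel on a device queue. */
static void launch_scale(size_t global_size, const float *in, float *out, float k) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < global_size; i++) {
        scale_kernel(i, in, out, k);
    }
}
```

In a real OpenCL host program, the loop in `launch_scale` would be replaced by setting the kernel arguments with `clSetKernelArg` and enqueuing the kernel with `clEnqueueNDRangeKernel`, with device buffers prepared beforehand and released afterwards (roughly the roles `halide_copy_to_device` and `halide_device_free` play in the generated snippet).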

@zvookin
Member

zvookin commented Mar 7, 2018 via email

dsharletg added a commit that referenced this issue Feb 26, 2021
dsharletg added a commit that referenced this issue Mar 4, 2021
…on't build them if not enabled (#5776)

* Remove unused vertex buffer parameters.

* Offload GPU code in a lowering pass instead of via CodeGen_GPU_Host. Fixes #5650, fixes #2797, fixes #2084, now #1971 is more relevant.

* clang-format.

* clang-format sorting is case sensitive!?

* clang-tidy

* Move codegen backends into anonymous namespaces in source files.

* clang-format

* Pass type arguments correctly.

* Update OffloadGPULoops.cpp

* trigger buildbots

* trigger buildbots

* Hack around tests that rely on the IR for offloaded GPU loops.

* Fix missing include.

* Remove unused include.

* clang-tidy

* Use custom lowering pass to see code before GPU offloading

* Speculative fix for segfault

* Fix const correctness

* Fix error on unused variables in generated code.

Co-authored-by: Steven Johnson <srj@google.com>