OpenCL backend plans? #42

Closed
LibRaw opened this issue Oct 5, 2012 · 16 comments

Comments

@LibRaw

LibRaw commented Oct 5, 2012

Is there any chance that Halide will generate OpenCL kernels for use on GPUs at some point in the future?

I want to use Halide in my desktop computer graphics (photo processing) app, but many users have AMD cards, not NVIDIA.

@mikeseven

Definitely needed!

@jrk
Member

jrk commented Nov 8, 2012

This is in progress.

@mikeseven

Any progress on OpenCL support? I would be happy to beta test it.

@pvila89

pvila89 commented Feb 22, 2013

+1

@okigan

okigan commented Feb 22, 2013

+2

@oscarbg

oscarbg commented Apr 3, 2013

+3

@mikeseven

As discussed at GTC, it might be difficult to produce any portable IR for OpenCL. So an ideal solution may be to use clang to generate OpenCL kernel source.

@oscarbg

oscarbg commented Apr 4, 2013

Wait, there is OpenCL SPIR, which is pretty similar to LLVM IR. Even clang can generate SPIR via:
clang -x cl -fno-builtin -target spir -c -emit-llvm
and it seems to be coming soon for testing, at least in AMD's OpenCL drivers.

@mikeseven

Yes, but SPIR is not (yet) supported by any GPU vendor and, as far as I know, none have plans for it in the next year.

@mikeseven

SPIR has been released recently.
Has Halide made progress on an OpenCL backend, with or without SPIR?

@jrk
Member

jrk commented Dec 11, 2013

@dsharlet-intel has been making steady progress on both the SPIR and OpenCL C-based backends. I believe they're starting to pass most/all of the tests, as of (the very recent) 2e3222b commit.

@jrk closed this as completed Dec 11, 2013

@mikeseven

I tried the basic apps on OS X 10.9 and they all segfault with OpenCL.

-- Mike

@jrk
Member

jrk commented Dec 11, 2013

Do you know which OpenCL device you’re targeting? One really annoying gotcha of Apple’s implementations is that their x86 OpenCL backend only supports 1D kernel launches.

Do the tests pass?
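
On a device that only accepts 1D launches, one common workaround is to flatten the 2D grid into a single global index and recover the coordinates inside the kernel. The following is a minimal sketch of that index arithmetic in plain C++. The names `Coord2D`, `flatten` and `unflatten` are hypothetical and exist only for this illustration; they are not part of Halide or OpenCL, and this is not necessarily how Halide's backend deals with the limitation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type, for illustration only (not Halide or OpenCL API).
struct Coord2D {
    std::size_t x;
    std::size_t y;
};

// Map a 2D work-item coordinate to a 1D global index, row-major.
inline std::size_t flatten(Coord2D c, std::size_t width) {
    assert(width > 0 && c.x < width);
    return c.y * width + c.x;
}

// Recover the 2D coordinate from a 1D global index.
inline Coord2D unflatten(std::size_t gid, std::size_t width) {
    assert(width > 0);
    return Coord2D{gid % width, gid / width};
}
```

Under this scheme a width x height launch would be issued as a single 1D launch of width * height work-items, with each work-item applying the equivalent of `unflatten(get_global_id(0), width)`.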

@mikeseven

I'm mainly targeting embedded GPUs, but before porting there I test on a
desktop machine: a MacBook Pro Retina running OS X 10.9 with an NVIDIA GPU.
There the CUDA/PTX backend works perfectly.
I'll run the tests and try to pinpoint the issue.

-- Mike

@dsharlet-intel
Contributor

I appreciate any information you can share from running the tests, but you should know that I've only just started looking at the apps since the commit jrk cited. The apps use a slightly different mechanism to run the generated code, which I didn't realize until recently.

In addition to the issue jrk mentioned regarding Apple's x86 OpenCL implementation, Apple also expects slightly different behavior when creating the OpenCL context. I will need to get an Apple machine to make sure this still works on Apple platforms as well as Linux/Windows.

@mikeseven

Running make run_tests with HL_TARGET=opencl:

clang++ -O3 test/correctness/argmax.cpp -Iinclude -Lbin -lHalide -lpthread -ldl -o bin/test_argmax
cd tmp ; DYLD_LIBRARY_PATH=../bin LD_LIBRARY_PATH=../bin ../bin/test_argmax
OpenCL device codegen init_module
Error: Failed to build program executable! err = -11
Build Log:

No kernels or only kernel prototypes found.

Error: err == CL_SUCCESS
make: *** [test_argmax] Error 1

Same error with test_internal:
cd tmp ; DYLD_LIBRARY_PATH=../bin LD_LIBRARY_PATH=../bin ../bin/test_internal
IRPrinter test passed
CodeGen_C test passed
Simplify test passed
Bounds test passed
Lowering test passed
OpenCL device codegen init_module
Error: Failed to build program executable! err = -11
Build Log:

No kernels or only kernel prototypes found.

Error: err == CL_SUCCESS
make: *** [test_internal] Error 1

The problem is the kernel being generated:
/*OpenCL C*/
float nan_f32() { return NAN; }
float neg_inf_f32() { return -INFINITY; }
float inf_f32() { return INFINITY; }
float sqrt_f32(float x) { return sqrt(x); }
float sin_f32(float x) { return sin(x); }
float cos_f32(float x) { return cos(x); }
float exp_f32(float x) { return exp(x); }
float log_f32(float x) { return log(x); }
float abs_f32(float x) { return x < 0.0f ? -x : x; }
float floor_f32(float x) { return floor(x); }
float ceil_f32(float x) { return ceil(x); }
float round_f32(float x) { return round(x); }
float pow_f32(float x, float y) { return pow(x, y); }
float asin_f32(float x) { return asin(x); }
float acos_f32(float x) { return acos(x); }
float tan_f32(float x) { return tan(x); }
float atan_f32(float x) { return atan(x); }
float atan2_f32(float y, float x) { return atan2(y, x); }
float sinh_f32(float x) { return sinh(x); }
float asinh_f32(float x) { return asinh(x); }
float cosh_f32(float x) { return cosh(x); }
float acosh_f32(float x) { return acosh(x); }
float tanh_f32(float x) { return tanh(x); }
float atanh_f32(float x) { return atanh(x); }

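There is no __kernel function in the generated source, which matches the "No kernels or only kernel prototypes found" build log.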
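
For reference, err = -11 is the value of CL_BUILD_PROGRAM_FAILURE in the OpenCL headers. A failure like this could be caught earlier with a crude check on the generated source before it is handed to clBuildProgram. The sketch below is only an illustration of that idea: `has_kernel_entry_point` and `require_kernel_entry_point` are hypothetical names, not Halide functions, and the check is a plain substring search that does not parse OpenCL C.

```cpp
#include <cassert>
#include <string>

// Numeric value of CL_BUILD_PROGRAM_FAILURE, as reported in the log above.
constexpr int kBuildProgramFailure = -11;

// Hypothetical pre-flight check (not part of Halide): does the source text
// contain the __kernel qualifier anywhere? This is a substring search, so it
// also matches inside comments, and it ignores the alternate `kernel` spelling.
inline bool has_kernel_entry_point(const std::string& source) {
    return source.find("__kernel") != std::string::npos;
}

// Guard intended to run before passing the source to clBuildProgram.
inline void require_kernel_entry_point(const std::string& source) {
    assert(has_kernel_entry_point(source) &&
           "no __kernel function in generated source");
}
```

Applied to the listing above, which contains only helper functions such as sqrt_f32, the check would report that no entry point is present.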

Misc stuff:
On OS X, CXX should default to clang++ rather than g++. The default g++ is way too old.

With HL_TARGET=host or cuda, all tests pass.

-- Mike

