Add GPU schedule for Blur (targeted GTX 970) #1690

darkbuck · 2016-12-14T19:44:57Z

Add GPU schedules for blur based on a different strategy.

abadams · 2016-12-14T23:43:13Z

I'm a fan of improvements to the blur schedule, but I disagree with the new gpu_tile overloads. I think they do too much in one call. I'd prefer people just say .tile(x, y, xi, yi, 2, 2).gpu_tile(x, y, 8, 8) to get that kind of nested tiling. I also think that you typically want the thread axis to be the innermost (apart from the vectorization) to get dense loads. So for a 16-bit type the sort of schedule I'd use is something like:

.vectorize(x, 2)
.tile(x, y, xi, yi, 32, 2)
.tile(x, y, xo, yo, 2, 8)
.gpu(x, y, xi, yo)

That says (or at least is intended to say) that each thread does a 2x2 block of work, the work done by one thread is adjacent in y, but separated by 32x2 elements (one full cache line) in x. E.g. thread tx, ty in block 0, 0 takes care of four pairs of elements:

([2tx, 2tx+1], 2ty)
([2tx+64, 2tx+65], 2ty)
([2tx, 2tx+1], 2ty+1)
([2tx+64, 2tx+65], 2ty+1)

I think that makes all the loads and stores dense across the warp.
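To make the mapping described above easier to check, here is a minimal sketch in plain C++ (no Halide involved) that enumerates the elements one thread would touch and tests whether a warp's accesses are contiguous in x. The names `Coord`, `elements_for_thread`, and `warp_access_is_dense` are hypothetical, and the 32x8 thread-block shape is an assumption read off the tile factors in the comment, not something taken from the merged code.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper type: an (x, y) element coordinate inside block (0, 0).
struct Coord {
    int x;
    int y;
};

// Elements handled by thread (tx, ty), following the mapping in the comment:
// two adjacent rows, two column groups 64 elements apart, two vector lanes.
// Assumes a 32x8 thread block (an assumption, see lead-in).
std::vector<Coord> elements_for_thread(int tx, int ty) {
    assert(tx >= 0 && tx < 32 && ty >= 0 && ty < 8);
    std::vector<Coord> out;
    for (int yi = 0; yi < 2; ++yi) {          // adjacent rows 2ty, 2ty+1
        for (int xo = 0; xo < 2; ++xo) {      // column groups at +0 and +64
            for (int v = 0; v < 2; ++v) {     // the two vectorized lanes
                out.push_back({2 * tx + 64 * xo + v, 2 * ty + yi});
            }
        }
    }
    return out;
}

// For one step (row offset yi, column group xo), collect the x coordinates
// touched by the 32 threads sharing ty, and check they form one contiguous
// run of 64 elements.
bool warp_access_is_dense(int ty, int yi, int xo) {
    std::set<int> xs;
    const int row = 2 * ty + yi;
    for (int tx = 0; tx < 32; ++tx) {
        for (const Coord &c : elements_for_thread(tx, ty)) {
            if (c.y == row && c.x / 64 == xo) {
                xs.insert(c.x);
            }
        }
    }
    return xs.size() == 64 && (*xs.rbegin() - *xs.begin()) == 63;
}
```

Calling `elements_for_thread(tx, ty)` returns the eight elements in the same order as the four pairs listed above. With 16-bit elements, a contiguous run of 64 elements is 128 bytes, which is the "one full cache line" figure the comment refers to.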

darkbuck · 2016-12-15T05:44:37Z

I see your point about scheduling from the innermost loop outwards, whereas most CPU schedules are written from the outermost loop inwards. That makes the current GPU interface more straightforward. That's fine, as that's just a shortcut and won't matter much as long as people understand the difference between optimal GPU and CPU schedules.

I rewrote the schedule following your suggestion. It generates the same code and achieves the same performance. Shall we merge the GPU schedule example for blur?

abadams · 2016-12-15T21:44:47Z

apps/blur/halide_blur_generator.cpp

+                //   blur_x calculation is re-used implicitly. This achieves
+                //   the similar schedule of sliding window.
+                Var yi("yi");
+                blur_y.split(y, y, yi, tile_y);


Our usual convention for chaining scheduling calls across multiple lines is to not repeat the Func name:

blur_y.split(...)
    .reorder(...)
    .unroll(...)
    .gpu_tile(...);
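As an illustration of why the chained form works, here is a minimal sketch using a hypothetical `MockFunc` class. It is a stand-in written for this example, not Halide's `Func`, and its argument-less methods only record which call was made. The point is that each scheduling call returns a reference to the same object, so the name only needs to appear once.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a schedulable function. NOT Halide's Func:
// the methods take no arguments and simply log their own names.
class MockFunc {
public:
    MockFunc &split() { calls_.push_back("split"); return *this; }
    MockFunc &reorder() { calls_.push_back("reorder"); return *this; }
    MockFunc &unroll() { calls_.push_back("unroll"); return *this; }
    MockFunc &gpu_tile() { calls_.push_back("gpu_tile"); return *this; }

    const std::vector<std::string> &calls() const { return calls_; }

private:
    std::vector<std::string> calls_;
};

// Chained style: the object is named once, later calls start with a dot.
inline void schedule_chained(MockFunc &f) {
    f.split()
        .reorder()
        .unroll()
        .gpu_tile();
}

// Repeated style: same effect, but the name is restated on every line.
inline void schedule_repeated(MockFunc &f) {
    f.split();
    f.reorder();
    f.unroll();
    f.gpu_tile();
}
```

Both functions apply the same calls in the same order, so the choice between them is purely stylistic.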

abadams · 2016-12-15T21:45:11Z

Yeah, looks good. Just one style nit.

darkbuck · 2016-12-16T08:32:33Z

Sure, style is revised.

darkbuck mentioned this pull request Dec 14, 2016

Are OpenCL/GPU schedule for blur of apps #1568

Open

abadams reviewed Dec 15, 2016


Blur: Add GPU schedule tuned on GTX970

dcde95f

darkbuck changed the title ~~Add GPU nested tiling support~~ Add GPU schedule for Blur (targeted GTX 970) Dec 17, 2016

abadams merged commit 16e771d into halide:master Dec 18, 2016

darkbuck deleted the darkbuck/master/gpu-blur branch January 25, 2017 21:49
