How to use a single GPU thread to handle a tile #3114
For a mobile GPU, I want to set the tile size to 8x8 and have a single thread process each tile. How can I do this with Halide? Or, how can I use just one work-group that contains multiple GPU threads? Settings like these are very useful on mobile GPUs.
I know this is easy in plain OpenCL. In Halide, however, gpu_tile does not seem to meet my requirement, because it uses multiple GPU threads to handle each tile and makes each tile a work-group.
I'm not 100% sure what you're asking, because I'm used to CUDA terminology (a CUDA thread block is an OpenCL work-group, and a CUDA thread is an OpenCL work-item), but you should be able to get any combination of sizes by using Func::tile and then calling Func::gpu_blocks and Func::gpu_threads directly to mark which dimensions map to blocks and which map to threads. E.g. the following gives you one thread block per 8x8 tile, with one thread per block:
f.tile(x, y, xi, yi, 8, 8).gpu_blocks(x, y);
So that one thread will do a little serial 8x8 loop inside its lonely thread block.
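To make the first schedule concrete, here is a plain C++ model of the loop nest it describes. This is a sketch, not Halide code and not what Halide literally emits; it assumes the output width and height are multiples of 8, and the names `Visit` and `one_thread_per_tile` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Plain C++ model (not Halide) of the loop nest described by
//   f.tile(x, y, xi, yi, 8, 8).gpu_blocks(x, y);
// Each (block_x, block_y) pair stands for one GPU block / OpenCL work-group.
// With no gpu_threads call, each block has a single thread, which walks its
// 8x8 tile serially. Assumes width and height are multiples of 8.
struct Visit {
    int block_x, block_y;  // which block (work-group) computed the pixel
    int x, y;              // which output pixel was computed
};

std::vector<Visit> one_thread_per_tile(int width, int height) {
    const int tile = 8;
    std::vector<Visit> order;
    // Block loops: on a real GPU these iterations run concurrently.
    for (int by = 0; by < height / tile; ++by) {
        for (int bx = 0; bx < width / tile; ++bx) {
            // Serial loops run by the block's one thread.
            for (int yi = 0; yi < tile; ++yi) {
                for (int xi = 0; xi < tile; ++xi) {
                    order.push_back({bx, by, bx * tile + xi, by * tile + yi});
                }
            }
        }
    }
    return order;
}
```

For a 16x8 output this yields two blocks, and each block's 64 visits stay inside its own 8x8 tile.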
One thread block containing an 8x8 group of threads that iterates serially over the image's tiles would be something like:
f.tile(x, y, xi, yi, 8, 8).gpu_threads(xi, yi).gpu_blocks(Var::outermost());
Var::outermost() returns a synthetic variable representing a dummy outermost loop of extent 1. It can be useful for marking device transitions that don't actually have loops associated with them. You could equivalently do something like:
f.tile(x, y, xi, yi, width, height).tile(xi, yi, xii, yii, 8, 8).gpu_blocks(x, y).gpu_threads(xii, yii);
where "width" and "height" are Exprs that equal the output size.
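Both single-block schedules above describe the same loop structure, which can be modeled in plain C++ as follows. This is a sketch, not Halide-generated code; it assumes the output size is a multiple of 8, and `pixels_for_thread` is a hypothetical helper that lists, in serial order, the pixels computed by thread (tx, ty) of the single block.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Plain C++ model (not Halide) of
//   f.tile(x, y, xi, yi, 8, 8).gpu_threads(xi, yi).gpu_blocks(Var::outermost());
// There is exactly one block. Its 8x8 threads run concurrently, and thread
// (tx, ty) walks the tile loops (y outer, x inner) serially, computing pixel
// (x * 8 + tx, y * 8 + ty) of every tile. Assumes sizes are multiples of 8.
std::vector<std::pair<int, int>> pixels_for_thread(int width, int height,
                                                   int tx, int ty) {
    const int tile = 8;
    std::vector<std::pair<int, int>> pixels;
    for (int y = 0; y < height / tile; ++y) {     // serial loop over tile rows
        for (int x = 0; x < width / tile; ++x) {  // serial loop over tile columns
            pixels.emplace_back(x * tile + tx, y * tile + ty);
        }
    }
    return pixels;
}
```

Taken together, the 64 threads cover every output pixel exactly once, while each individual thread revisits the same (tx, ty) offset in every tile.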
I'd suggest setting the environment variable HL_DEBUG_CODEGEN=1 and inspecting the pseudocode Halide prints for schedules like the ones above.
Thanks, Andrew. Very detailed help.
On an ARM Mali GPU, the following works for me and gives the result I am looking for:
The other solution, "f.tile(x, y, xi, yi, 8, 8).gpu_threads(xi, yi).gpu_blocks(Var::outermost);", fails to compile with the error message " error: no matching function for call to ‘Halide::Func::gpu_blocks(Halide::Var (&)())"
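The error type `Halide::Var (&)()` is a reference to a function returning a `Var`: in Halide, `Var::outermost` is a static member function, so writing it without parentheses passes the function itself rather than the `Var` it returns. The fix is to call it, i.e. `gpu_blocks(Var::outermost())`. The sketch below reproduces the same C++ rule with hypothetical stand-in types `MockVar` and `MockFunc` (not Halide's real classes), so it compiles without Halide installed.

```cpp
#include <type_traits>

// Hypothetical stand-ins (not Halide's real classes) that mirror the shape of
// the API involved: a Var-like type with a static factory function, and a
// scheduling method that takes that type by value.
struct MockVar {
    static MockVar outermost() { return MockVar{}; }
};

struct MockFunc {
    MockFunc &gpu_blocks(MockVar) { return *this; }
};

// Naming the static function without calling it gives a function reference,
// MockVar (&)(), which does not convert to MockVar. This is the mismatch behind
// "no matching function for call to ... (Halide::Var (&)())".
static_assert(!std::is_convertible<MockVar (&)(), MockVar>::value,
              "a function reference is not a Var");

// Calling it produces a MockVar, which gpu_blocks accepts.
// In real Halide code: f.tile(x, y, xi, yi, 8, 8).gpu_threads(xi, yi).gpu_blocks(Var::outermost());
static_assert(std::is_same<decltype(MockVar::outermost()), MockVar>::value,
              "calling outermost() yields a Var");
```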