Make launch config construction more convenient with a builder class #311

eyalroz · 2022-04-03T22:38:20Z

I'm not satisfied with how we can construct launch configurations right now.

make_launch_config is a bit of a lame name.
It's easy to get the grid and the block dimensions mixed up.
You can't separate the specification of different aspects of the launch config.
No support for the overall dimensions + block dimensions idiom from OpenCL (see Support making grids from OpenCL-style "global+local work sizes" #214).
Specifically, you can't create a linear launch config given a block size without doing your own division-rounding up. That's embarrassing in the example programs!

I'm giving serious thought to adding a launch config builder class, to solve all of the above.

codecircuit · 2022-04-05T19:07:34Z

make_launch_config is a bit of a lame name.

I think the function name is pretty good. It does exactly what it says and keeps naming conventions from the STL like make_tuple or make_pair. Why do you think it is a lame name?

It's easy to get the grid and the block dimensions mixed up.

Didn't happen to me, but what would be a solution for this? Different types for grid and block dimensions, such that they cannot be cast into each other implicitly?
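For illustration, the "different types" idea could look roughly like the following. This is only a sketch: `dims3`, `grid_dims_t`, `block_dims_t` and `toy_launch_config` are hypothetical names made up here, not types from the library.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical illustration types; not the library's actual API.
struct dims3 { std::size_t x, y, z; };

// Each role gets its own wrapper, constructible only explicitly from dims3.
struct grid_dims_t {
    dims3 value;
    explicit grid_dims_t(dims3 d) : value(d) {}
};

struct block_dims_t {
    dims3 value;
    explicit block_dims_t(dims3 d) : value(d) {}
};

struct toy_launch_config {
    grid_dims_t grid;
    block_dims_t block;
    toy_launch_config(grid_dims_t g, block_dims_t b) : grid(g), block(b) {}
};

// Passing the arguments in the wrong order does not compile,
// because neither wrapper converts into the other.
static_assert(
    !std::is_constructible<toy_launch_config, block_dims_t, grid_dims_t>::value,
    "swapped grid/block arguments must be rejected");
```

The cost of this approach is extra verbosity at every call site, which is part of why named setter methods are an alternative.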

Specifically, you can't create a linear launch config given a block size without doing your own division-rounding up. That's embarrassing in the example programs!

This is a good point. Very often some kind of ceil_divide functionality is currently required when the grid dimension is defined.
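As a minimal sketch of the rounding-up division that user code currently has to write by hand (the names `ceil_divide` and `linear_grid_size` are assumptions for illustration, not necessarily library functions):

```cpp
#include <cstddef>

// Integer division rounding up; assumes divisor > 0.
constexpr std::size_t ceil_divide(std::size_t dividend, std::size_t divisor)
{
    return dividend / divisor + (dividend % divisor != 0 ? 1 : 0);
}

// Number of blocks needed to cover num_elements with one thread per element.
constexpr std::size_t linear_grid_size(std::size_t num_elements, std::size_t block_size)
{
    return ceil_divide(num_elements, block_size);
}
```

For example, 50000 elements with a block size of 256 need 196 blocks, the last of which is only partially used.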

--> Especially because of the last point this abstraction seems to be a good idea. I am looking forward to your design.

eyalroz · 2022-04-05T22:18:51Z

like make_tuple or make_pair.

Those were introduced because we didn't have template deduction guides... we can now write std::pair { foo, bar }. Although, to be pair - this is a C++11 library, not C++17.

but what would be a solution for this?

auto lc = cuda::launch_config::builder(my_kernel).block_dims(my_dims).grid_dims(my_other_dims);

I am looking forward to your design.

Maybe

auto lc = cuda::launch_config::builder(my_kernel).overall_dims(yet_other_dims).block_dims(my_other_dims);

which would work both for dims3 and for integral types, and

auto lc = cuda::launch_config::builder(my_kernel).overall_size(sz1).block_size(sz2);

or

auto lc = cuda::launch_config::builder(my_kernel).overall_size(sz1).block_size(sz2).dimensionality(1);

or

auto lc = cuda::launch_config::builder(my_kernel).overall_size(sz1).block_size(sz2).linear();
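To make the chaining idea concrete, here is a self-contained toy version of such a builder for the linear case only. Everything in namespace `toy` is hypothetical and written just for this sketch; it is not the design that ended up in the library, and it ignores the kernel, the device and multi-dimensional grids.

```cpp
#include <cstddef>
#include <stdexcept>

namespace toy {

struct launch_config {
    std::size_t grid_size;   // number of blocks
    std::size_t block_size;  // threads per block
};

class launch_config_builder {
public:
    // Each setter returns *this so that calls can be chained in any order.
    launch_config_builder& overall_size(std::size_t n) { overall_ = n; has_overall_ = true; return *this; }
    launch_config_builder& block_size(std::size_t n)   { block_ = n; return *this; }
    launch_config_builder& grid_size(std::size_t n)    { grid_ = n; has_grid_ = true; return *this; }

    launch_config build() const
    {
        if (block_ == 0) { throw std::logic_error("block size not specified"); }
        if (has_grid_) { return launch_config{grid_, block_}; }
        if (has_overall_) {
            // Round up so that the grid covers the overall size.
            std::size_t blocks = overall_ / block_ + (overall_ % block_ != 0 ? 1 : 0);
            return launch_config{blocks, block_};
        }
        throw std::logic_error("neither grid size nor overall size specified");
    }

private:
    std::size_t overall_ = 0, block_ = 0, grid_ = 0;
    bool has_overall_ = false, has_grid_ = false;
};

} // namespace toy
```

With this toy, `toy::launch_config_builder().overall_size(1000).block_size(256).build()` yields a grid of 4 blocks of 256 threads, and the rounding up happens inside the builder rather than in user code.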

eyalroz · 2022-04-05T22:21:26Z

Also, a builder would be a place to apply all those occupancy functions, e.g. if we're linear, and have specified the overall size, then telling the builder "use the max-occupancy block size" rather than obtaining it with a separate function call and applying it to the launch config.

codecircuit · 2022-04-10T08:13:19Z

Why do we need .linear() or .dimensionality(1)? Shouldn't the dimensionality be derived from the dimensionality of sz1 and sz2?

Integrating the max-occupancy or max-active-blocks calculation into the launch config seems to be a convenient design. I guess my_kernel is of type kernel_t, which also saves the corresponding device for the kernel launch. In that case, we can also integrate a check of whether the kernel launch config exceeds the device limits when the kernel would be launched. Currently, an exception is thrown after the kernel was launched, which does not state explicitly the device limits and the required limits (e.g., registers, shared memory, static shared memory). I think there are two points:

Should the launch config builder throw an exception if the kernel launch config is invalid for the associated device of the kernel?
Whether the launch config builder or the kernel launch itself throws an error if the device limits are exceeded, it would be helpful to increase the verbosity of the exception such that the device limits and the required limits are shown.

eyalroz · 2022-04-10T10:14:07Z

Why do we need .linear() or .dimensionality(1)? Shouldn't the dimensionality be derived from the dimensionality of sz1 and sz2?

Well, we might not, I'm not sure. You see, we could theoretically constrain the total size without constraining the distribution of this size among the dimensions.

Should the launch config builder throw an exception if the kernel launch config is invalid for the associated device of the kernel?

Well, I'd say yes, but that would only work for kernels which are associated with a device to begin with (i.e. not apriori-compiled ones). Actually, the more important question is when to throw - immediately, or when the configuration is finalized?

Whether the launch config builder or the kernel launch itself throws an error if the device limits are exceeded, it would be helpful to increase the verbosity of the exception such that the device limits and the required limits are shown.

I'd say the builder could throw an exception. If the builder can know about the kernel it might as well.
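A rough sketch of what such a more verbose exception might look like, stating both the requested value and the device limit. The structs and the `validate` function are hypothetical stand-ins for this illustration; a real implementation would obtain the limits from the device and kernel attributes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for limits queried from the device.
struct toy_device_limits {
    std::size_t max_threads_per_block;
    std::size_t max_shared_memory_per_block; // in bytes
};

// Hypothetical stand-in for the parts of a launch config being checked.
struct toy_launch_request {
    std::size_t threads_per_block;
    std::size_t dynamic_shared_memory; // in bytes
};

// Throws with a message naming both the requested value and the limit.
inline void validate(const toy_launch_request& req, const toy_device_limits& lim)
{
    if (req.threads_per_block > lim.max_threads_per_block) {
        throw std::invalid_argument(
            "requested " + std::to_string(req.threads_per_block) +
            " threads per block, but the device allows at most " +
            std::to_string(lim.max_threads_per_block));
    }
    if (req.dynamic_shared_memory > lim.max_shared_memory_per_block) {
        throw std::invalid_argument(
            "requested " + std::to_string(req.dynamic_shared_memory) +
            " bytes of shared memory per block, but the device allows at most " +
            std::to_string(lim.max_shared_memory_per_block));
    }
}
```

Whether this runs immediately in each setter or only when the configuration is finalized is exactly the open question raised above; the check itself is the same either way.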

…_config_builder_t` (which you can create using `cuda::launch_config_builder()`). It makes building launch configurations easier:

* Easy to build linear launch configurations.
* Can specify the overall dimensions and the block or grid dims instead of always having to compute block and grid dims yourself.
* Checks compatibility of dimensions and shared memory size with the kernel or device associated with the builder.

Remains to be implemented:

* More/stricter validity checks.
* Integration of optimal block size / launch grid functions from the API with this builder.

eyalroz · 2022-04-11T20:27:45Z

@codecircuit : Have a look at the effects of the new launch config build in the vectorAdd.cu example.

eyalroz · 2022-04-11T20:28:48Z

I'm still mulling over whether to run all those checks though. Perhaps I should only run them when building in debug mode?
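One conventional way to restrict such checks to debug builds is to key them on `NDEBUG`, the same macro that disables `assert`. A minimal sketch, in which `validity_checks_enabled` and `check_block_size` are names invented for this example:

```cpp
#include <cstddef>
#include <stdexcept>

// Checks are compiled in only when NDEBUG is not defined (i.e. debug builds).
#ifdef NDEBUG
constexpr bool validity_checks_enabled = false;
#else
constexpr bool validity_checks_enabled = true;
#endif

// Hypothetical check: throws in debug builds, does nothing in release builds.
inline void check_block_size(std::size_t block_size, std::size_t max_block_size)
{
    if (validity_checks_enabled && block_size > max_block_size) {
        throw std::invalid_argument("block size exceeds the device limit");
    }
}
```

The trade-off is that a release build then only learns about an invalid configuration when the launch itself fails.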

…_config_builder_t` (which you can create using `cuda::launch_config_builder()`). It makes building launch configurations easier:

* Easy to build linear launch configurations.
* Can specify the overall dimensions and the block or grid dims instead of always having to compute block and grid dims yourself.
* When compiling in Debug mode (i.e. with `NDEBUG` undefined), checks compatibility of dimensions and shared memory size with the kernel or device associated with the builder.

Remains to be implemented:

* Integration of optimal block size / launch grid functions from the API with this builder.

…der generate a linear grid, given the block dimensions, kernel and device, which will saturate the device with active blocks.

…fying no dynamic shared memory is used

eyalroz added enhancement task labels Apr 3, 2022

eyalroz self-assigned this Apr 3, 2022

eyalroz changed the title ~~Make launch config construction more convenient~~ Make launch config construction more convenient with a builder class Apr 11, 2022

eyalroz added the resolved-on-development label Apr 11, 2022

eyalroz mentioned this issue Apr 16, 2022

Support making grids from OpenCL-style "global+local work sizes" #214

Closed

eyalroz added a commit that referenced this issue Apr 25, 2022

Regards #311: Added the ability to have the launch configuration buil…

bb5d51f

…der generate a linear grid, given the block dimensions, kernel and device, which will saturate the device with active blocks.

eyalroz added a commit that referenced this issue Apr 25, 2022

Regards #311: Added the ability to have the launch configuration buil…

d18eb10

…der generate a linear grid, given the block dimensions, kernel and device, which will saturate the device with active blocks.

eyalroz added a commit that referenced this issue May 9, 2022

Regards #311: Added the ability to have the launch configuration buil…

6982b63

…der generate a linear grid, given the block dimensions, kernel and device, which will saturate the device with active blocks.

eyalroz closed this as completed in a3bdbad May 9, 2022

eyalroz added a commit that referenced this issue Jun 20, 2022

Regards #311: Added the ability to have the launch configuration buil…

1783ae0

…der generate a linear grid, given the block dimensions, kernel and device, which will saturate the device with active blocks.

eyalroz added a commit that referenced this issue Mar 17, 2023

Regards #311 : Adding a method to the launch config builder for speci…

2bfb1c2

…fying no dynamic shared memory is used

eyalroz added a commit that referenced this issue Mar 18, 2023

Regards #311 : Adding a method to the launch config builder for speci…

f736fcc

…fying no dynamic shared memory is used

eyalroz mentioned this issue Jan 29, 2024

Use the launch config builder in more examples #578

Closed
