Make launch config construction more convenient with a builder class #311

Closed
eyalroz opened this issue Apr 3, 2022 · 7 comments

@eyalroz
Owner

eyalroz commented Apr 3, 2022

I'm not satisfied with how we can construct launch configurations right now.

  • make_launch_config is a bit of a lame name.
  • It's easy to get the grid and the block dimensions mixed up.
  • You can't separate the specification of different aspects of the launch config.
  • No support for the overall dimensions + block dimensions idiom from OpenCL (see Support making grids from OpenCL-style "global+local work sizes" #214).
  • Specifically, you can't create a linear launch config given a block size without doing your own division-rounding up. That's embarrassing in the example programs!

I'm giving serious thought to adding a launch config builder class, to solve all of the above.

@codecircuit
Contributor

codecircuit commented Apr 5, 2022

  • make_launch_config is a bit of a lame name.

I think the function name is pretty good. It does exactly what it says and keeps naming conventions from the STL like make_tuple or make_pair. Why do you think it is a lame name?

  • It's easy to get the grid and the block dimensions mixed up.

Didn't happen to me, but what would be a solution for this? Different types for grid and block dimensions, such that they cannot be cast into each other implicitly?
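To make the "different types" idea concrete, here is a minimal sketch. All names in it (`dims3`, `grid_dims_t`, `block_dims_t`, `launch_config_t`, `make_config`) are hypothetical and chosen for illustration; they are not taken from the library. The point is only that explicit constructors on two unrelated wrapper types stop grid and block dimensions from being swapped silently.

```cpp
#include <cassert>
#include <type_traits>

// Plain 3-component dimensions, standing in for whatever type the library uses.
struct dims3 { unsigned x, y, z; };

// Hypothetical wrappers: same payload, but distinct and unrelated types.
struct grid_dims_t {
    dims3 value;
    explicit grid_dims_t(dims3 d) : value(d) {}
};

struct block_dims_t {
    dims3 value;
    explicit block_dims_t(dims3 d) : value(d) {}
};

struct launch_config_t {
    grid_dims_t  grid;
    block_dims_t block;
};

// Passing the two arguments in the wrong order no longer compiles.
inline launch_config_t make_config(grid_dims_t grid, block_dims_t block)
{
    assert(block.value.x > 0 && block.value.y > 0 && block.value.z > 0);
    return launch_config_t{ grid, block };
}

// Neither wrapper converts implicitly into the other, nor from raw dimensions.
static_assert(!std::is_convertible<grid_dims_t, block_dims_t>::value,
              "grid dims must not convert to block dims");
static_assert(!std::is_convertible<dims3, grid_dims_t>::value,
              "raw dims must be wrapped explicitly");
```

The cost of this approach is extra verbosity at every call site, which is one reason a builder with named setters may be the more comfortable alternative.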

  • Specifically, you can't create a linear launch config given a block size without doing your own division-rounding up. That's embarrassing in the example programs!

This is a good point. Very often some kind of ceil_divide functionality is currently required when the grid dimension is defined.
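For reference, the manual computation being discussed looks roughly like the sketch below. The helper names `ceil_divide` and `linear_grid_size` are assumptions made for this example, not functions quoted from the library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: integer division rounding up, for a positive divisor.
// Written without `dividend + divisor - 1` so that it cannot overflow.
constexpr std::size_t ceil_divide(std::size_t dividend, std::size_t divisor)
{
    return dividend / divisor + (dividend % divisor != 0 ? 1 : 0);
}

// Number of blocks needed so that grid_size * block_size >= overall_size.
inline std::size_t linear_grid_size(std::size_t overall_size, std::size_t block_size)
{
    assert(block_size > 0 && "block size must be positive");
    return ceil_divide(overall_size, block_size);
}
```

With 1000 elements and 256 threads per block this gives 4 blocks; the last block is only partially used, so the kernel still needs its own bounds check.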

--> Especially because of the last point this abstraction seems to be a good idea. I am looking forward to your design.

@eyalroz
Owner Author

eyalroz commented Apr 5, 2022

like make_tuple or make_pair.

Those were introduced because we didn't have template deduction guides... we can now write std::pair { foo, bar }. Although, to be pair - this is a C++11 library, not C++17.

but what would be a solution for this?

auto lc = cuda::launch_config::builder(my_kernel).block_dims(my_dims).grid_dims(my_other_dims);

I am looking forward to your design.

Maybe

auto lc = cuda::launch_config::builder(my_kernel).overall_dims(yet_other_dims).block_dims(my_other_dims);

which would work both for dims3 and for integral types, and

auto lc = cuda::launch_config::builder(my_kernel).overall_size(sz1).block_size(sz2);

or

auto lc = cuda::launch_config::builder(my_kernel).overall_size(sz1).block_size(sz2).dimensionality(1);

or

auto lc = cuda::launch_config::builder(my_kernel).overall_size(sz1).block_size(sz2).linear();
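To illustrate how such chained setters could fit together, here is a deliberately small, linear-only sketch. It is not the library's builder: `linear_launch_config`, `linear_config_builder` and its member names are hypothetical, and it ignores the kernel, the device and multi-dimensional grids entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical result type: a one-dimensional launch configuration.
struct linear_launch_config {
    std::size_t grid_size;   // number of blocks
    std::size_t block_size;  // threads per block
};

// Hypothetical builder: each setter records one aspect and returns *this,
// so aspects can be specified separately and in any order.
class linear_config_builder {
public:
    linear_config_builder& overall_size(std::size_t n) { overall_ = n; have_overall_ = true; return *this; }
    linear_config_builder& block_size(std::size_t n)   { block_ = n;   have_block_ = true;   return *this; }
    linear_config_builder& grid_size(std::size_t n)    { grid_ = n;    have_grid_ = true;    return *this; }

    linear_launch_config build() const
    {
        if (!have_block_ || block_ == 0) { throw std::logic_error("block size not set"); }
        if (have_grid_) { return { grid_, block_ }; }
        if (!have_overall_) { throw std::logic_error("neither grid size nor overall size set"); }
        // Round up so that the grid covers the overall size.
        return { overall_ / block_ + (overall_ % block_ != 0 ? 1 : 0), block_ };
    }

private:
    std::size_t overall_ = 0, block_ = 0, grid_ = 0;
    bool have_overall_ = false, have_block_ = false, have_grid_ = false;
};

// Usage sketch: the rounding-up happens inside the builder.
inline void builder_example()
{
    auto lc = linear_config_builder{}.overall_size(1000).block_size(256).build();
    assert(lc.grid_size == 4 && lc.block_size == 256);
}
```

Deferring all computation to `build()` is one possible design choice; it lets the setters be called in any order, at the price of reporting missing information only at the end.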

@eyalroz
Owner Author

eyalroz commented Apr 5, 2022

Also, a builder would be a place to apply all those occupancy functions, e.g. if we're linear, and have specified the overall size, then telling the builder "use the max-occupancy block size" rather than obtaining it with a separate function call and applying it to the launch config.

@codecircuit
Contributor

Why do we need .linear() or .dimensionality(1)? Shouldn't the dimensionality be derived from the dimensionality of sz1 and sz2?

Integrating the max-occupancy or max-active-blocks calculation into the launch config seems to be a convenient design. I guess my_kernel is of type kernel_t, which also saves the corresponding device for the kernel launch. In that case, we can also integrate a check of whether the launch config exceeds the device limits before the kernel is launched. Currently, an exception is thrown after the kernel was launched, and it does not explicitly state the device limits and the required limits (e.g., registers, shared memory, static shared memory). I think there are two points:

  • Should the launch config builder throw an exception if the kernel launch config is invalid for the associated device of the kernel?
  • Whether the launch config builder or the kernel launch itself throws an error if the device limits are exceeded, it would be helpful to increase the verbosity of the exception such that the device limits and the required limits are shown.

@eyalroz
Owner Author

eyalroz commented Apr 10, 2022

Why do we need .linear() or .dimensionality(1)? Shouldn't the dimensionality be derived from the dimensionality of sz1 and sz2?

Well, we might not, I'm not sure. You see, we could theoretically constrain the total size without constraining the distribution of this size among the dimensions.

Should the launch config builder throw an exception if the kernel launch config is invalid for the associated device of the kernel?

Well, I'd say yes, but that would only work for kernels which are associated with a device to begin with (i.e. not apriori-compiled ones). Actually, the more important question is when to throw - immediately, or when the configuration is finalized?

Whether the launch config builder or the kernel launch itself throws an error if the device limits are exceeded, it would be helpful to increase the verbosity of the exception such that the device limits and the required limits are shown.

I'd say the builder could throw an exception. If the builder can know about the kernel it might as well.
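As a rough illustration of the more verbose exception being asked for, the sketch below reports both the requested value and the device limit in the message. The structs `device_limits` and `launch_request` and the function `validate` are hypothetical; in real code the limits would be queried from the device rather than filled in by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical description of what the device allows per block.
struct device_limits {
    std::size_t max_threads_per_block;
    std::size_t max_shared_memory_per_block;  // in bytes
};

// Hypothetical description of what the launch configuration asks for.
struct launch_request {
    std::size_t threads_per_block;
    std::size_t shared_memory_bytes;
};

// Throws before any launch is attempted, naming both the request and the limit.
inline void validate(const launch_request& req, const device_limits& lim)
{
    assert(lim.max_threads_per_block > 0 && "limits must be populated first");
    if (req.threads_per_block > lim.max_threads_per_block) {
        throw std::invalid_argument(
            "Requested " + std::to_string(req.threads_per_block) +
            " threads per block, but the device limit is " +
            std::to_string(lim.max_threads_per_block));
    }
    if (req.shared_memory_bytes > lim.max_shared_memory_per_block) {
        throw std::invalid_argument(
            "Requested " + std::to_string(req.shared_memory_bytes) +
            " bytes of shared memory per block, but the device limit is " +
            std::to_string(lim.max_shared_memory_per_block));
    }
}
```

Whether such a check runs at each setter call or only when the configuration is finalized is exactly the open question raised above; this sketch takes no position on it.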

eyalroz added a commit that referenced this issue Apr 11, 2022
…_config_builder_t` (which you can create using `cuda::launch_config_builder()`). It makes building launch configurations easier...

* Easy to build linear launch configurations.
* Can specify the overall dimensions and the block or grid dims instead of always having to compute block and grid dims yourself.
* Checks compatibility of dimensions and shared memory size with the kernel or device associated with the builder.

Remains to be implemented:

* More/stricter validity checks.
* Integration of optimal block size / launch grid functions from the API with this builder.
eyalroz added a commit that referenced this issue Apr 11, 2022
@eyalroz
Owner Author

eyalroz commented Apr 11, 2022

@codecircuit : Have a look at the effects of the new launch config builder in the vectorAdd.cu example.

@eyalroz
Owner Author

eyalroz commented Apr 11, 2022

I'm still mulling over whether to run all those checks though. Perhaps I should only run them when building in debug mode?
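One way the debug-only option could look is sketched below, keyed on the standard `NDEBUG` macro. The names `validation_enabled` and `check_block_size` are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Validation is compiled in only when NDEBUG is not defined,
// i.e. under the same condition that keeps assert() active.
#ifndef NDEBUG
constexpr bool validation_enabled = true;
#else
constexpr bool validation_enabled = false;
#endif

// Hypothetical check: throws in debug builds, does nothing in release builds.
inline void check_block_size(std::size_t block_size, std::size_t max_block_size)
{
    assert(max_block_size > 0 && "device limit must be known");
    if (validation_enabled && block_size > max_block_size) {
        throw std::invalid_argument("block size exceeds the device limit");
    }
}
```

The trade-off is that release builds then fall back to whatever error the launch itself produces, which is the less informative behavior discussed earlier in the thread.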

@eyalroz eyalroz changed the title Make launch config construction more convenient Make launch config construction more convenient with a builder class Apr 11, 2022
eyalroz added a commit that referenced this issue Apr 11, 2022
eyalroz added a commit that referenced this issue Apr 11, 2022
…_config_builder_t` (which you can create using `cuda::launch_config_builder()`). It makes building launch configurations easier...

* Easy to build linear launch configurations.
* Can specify the overall dimensions and the block or grid dims instead of always having to compute block and grid dims yourself.
* When compiling in Debug mode (i.e. with `NDEBUG` undefined), checks compatibility of dimensions and shared memory size with the kernel or device associated with the builder.

Remains to be implemented:

* Integration of optimal block size / launch grid functions from the API with this builder.
eyalroz added a commit that referenced this issue Apr 13, 2022
eyalroz added a commit that referenced this issue Apr 16, 2022
eyalroz added a commit that referenced this issue Apr 25, 2022
…der generate a linear grid, given the block dimensions, kernel and device, which will saturate the device with active blocks.
eyalroz added a commit that referenced this issue Apr 25, 2022
eyalroz added a commit that referenced this issue May 9, 2022
@eyalroz eyalroz closed this as completed in a3bdbad May 9, 2022
eyalroz added a commit that referenced this issue Jun 20, 2022
eyalroz added a commit that referenced this issue Jun 20, 2022
eyalroz added a commit that referenced this issue Mar 17, 2023
eyalroz added a commit that referenced this issue Mar 18, 2023