
Version 0.6.8: CUDA 12.x features, launch config builder improvements, etc.

@eyalroz released this 26 Feb 22:45

Changes since v0.6.7:

Build process & build configuration changes

  • #583 Exported targets now make sure apps link against some system libraries CUDA depends on (to circumvent CMake bug 25665).
  • #567 The runtime-and-driver target now has driver-and-runtime as an alias.
  • #590 Avoid compiler warnings about narrowing conversions and shadowing (when those warning flags are turned on).

Launch configuration & launch config builder changes

  • #564 Can now launch kernels with the full range of CUDA 12.x launch attributes, including remote memory sync domain, programmatic launch dependence and programmatic completion events (see descriptions in the CUDA Driver API documentation).
  • #484 Support for setting block cluster dimensions (as part of the support of CUDA 12.x launch attributes).
  • #577, #582 More extensive validation of launch configurations when building them with the launch config builder gadget.
  • #581 More robust comparison operators for dimension structures
  • #580 Launch config builder can now be told to "use the maximum number of active blocks per multiprocessor".
  • #579 Users can now set a target device in a launch config builder without setting a contextualized kernel.
  • #578 Now using the launch config builder in more of the example programs
  • #569 Took care of unused validation function which was triggering a warning with newer compilers
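To make the kind of checks behind #577, #582 and #484 concrete, here is a minimal, host-only sketch of launch configuration validation. It is not the library's code: `dims3`, `device_limits`, `launch_config` and `first_problem()` are hypothetical names invented for this illustration, and the actual builder may check more, or different, conditions. The cluster rule shown (each grid dimension must be a multiple of the corresponding cluster dimension) is a CUDA requirement for block clusters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a dimensions structure (NOT the library's type).
struct dims3 {
    unsigned x, y, z;
    constexpr std::size_t volume() const {
        return static_cast<std::size_t>(x) * y * z;
    }
};

// Component-wise comparison, in the spirit of #581.
constexpr bool operator==(dims3 a, dims3 b) {
    return a.x == b.x && a.y == b.y && a.z == b.z;
}
constexpr bool operator!=(dims3 a, dims3 b) { return !(a == b); }

// Hypothetical subset of the device properties relevant to a launch.
struct device_limits {
    std::size_t max_threads_per_block;
    dims3 max_block_dims;
    dims3 max_grid_dims;
};

// Hypothetical launch configuration; cluster {1,1,1} means "no clustering".
struct launch_config {
    dims3 grid;     // in blocks
    dims3 block;    // in threads
    dims3 cluster;  // in blocks
};

// True if every component of a is within the corresponding component of b.
constexpr bool fits_within(dims3 a, dims3 b) {
    return a.x <= b.x && a.y <= b.y && a.z <= b.z;
}

// Returns a description of the first problem found, or an empty string
// if the configuration passes all of the checks in this sketch.
inline std::string first_problem(const launch_config& c, const device_limits& lim) {
    if (c.grid.volume() == 0)    return "grid has a zero dimension";
    if (c.block.volume() == 0)   return "block has a zero dimension";
    if (c.cluster.volume() == 0) return "cluster has a zero dimension";
    if (!fits_within(c.block, lim.max_block_dims))
        return "block dimensions exceed the device limit";
    if (c.block.volume() > lim.max_threads_per_block)
        return "too many threads per block";
    if (!fits_within(c.grid, lim.max_grid_dims))
        return "grid dimensions exceed the device limit";
    if (c.grid.x % c.cluster.x != 0 || c.grid.y % c.cluster.y != 0 ||
        c.grid.z % c.cluster.z != 0)
        return "grid dimensions are not a multiple of the cluster dimensions";
    return "";
}
```

With limits typical of recent GPUs (1024 threads per block), a grid of 8 blocks of 256 threads in clusters of 2 passes these checks, while a 32x32x2 block or a cluster size of 3 on the same grid does not.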

CUDA libraries and in-library, non-associated kernel support

  • #565 Now supporting "CUDA libraries": files or blocks of data in memory containing compiled kernels, which are not immediately loaded into modules within contexts. The kernels they contain are device-and-context-independent. Both the libraries and these kernels can now be represented and worked with.
  • #576 A module no longer holds the link options it was created with; those are not essential to its use, and at times are impossible to (re)create when obtaining a module from a library (which also doesn't hold its link options).

Refactoring

  • #586 The poor man's span class now has its own file (detail/span.hpp)
  • #588 Some under-the-hood refactoring of host memory allocation functions
  • #589 Factored the cuda::memory::region_t class into its own file
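For readers unfamiliar with the two classes mentioned in #586 and #589, the following is a rough sketch of what a "poor man's span" (a stand-in for `std::span` when compiling as pre-C++20) and an untyped memory region might look like. It is an assumption-laden illustration, not the contents of detail/span.hpp or of the region_t header: the names `span`, `region` and `as_span` below are hypothetical, and the library's actual interfaces may differ.

```cpp
#include <cstddef>

// Hypothetical minimal span: a non-owning (pointer, length) pair.
template <typename T>
struct span {
    T* data_;
    std::size_t size_;

    constexpr T* data() const noexcept { return data_; }
    constexpr std::size_t size() const noexcept { return size_; }
    constexpr T* begin() const noexcept { return data_; }
    constexpr T* end() const noexcept { return data_ + size_; }
    constexpr T& operator[](std::size_t i) const noexcept { return data_[i]; }
};

// Hypothetical untyped region: a start address and a size in bytes.
struct region {
    void* start;
    std::size_t size_in_bytes;
};

// View a region as a span of T; trailing bytes that do not make up
// a whole T are left out of the span.
template <typename T>
span<T> as_span(region r) noexcept {
    return span<T>{static_cast<T*>(r.start), r.size_in_bytes / sizeof(T)};
}
```

The point of keeping the region untyped is that allocation and copy functions can pass around an address and a size without committing to an element type; a typed span is obtained only where elements are actually accessed.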

Other changes

  • #593 Some work on the cuda::memory::copy_parameters_t structure
  • #592 Dropped memory::managed::region_t and const_region_t; memory::region_t and const_region_t are now used everywhere.
  • #591 Memory copy functions for spans and other work on memory copy functions
  • #587 Added a missing variant of memory-zero'ing
  • #585 You can now write cuda::memory::make_unique() - and it's assumed you mean device memory (you have to specify a device or device context though)
  • #575 Moved apriori_compiled_kernel_t into the kernel namespace, yielding kernel::apriori_compiled_t
  • #570, #573 Removed some redundant inclusions and definitions
  • #568 Fixed some breakage of kernel_t::set_attribute()
  • #566 Can now properly get and set properties on kernels (raw functions and handles)

Compatibility

  • #572 Fixed broken CUDA 9.x compatibility