This code generates 5 kernels for a single GPU target. It seems that each kernel lambda generates one kernel with actual content plus one extra empty kernel (and one additional empty kernel is generated regardless of how many kernels there are). Even more are generated when I use "named" kernels.
There is no guarantee that one SYCL kernel will translate to one actual kernel. E.g. the implementation might always decide to multiversion kernels based on some argument properties.
The empty kernels that you are seeing are dummy kernels that we need in order to generate host-side visible kernel names. This is necessary due to the restrictions that clang places on its __builtin_get_device_side_mangled_name() builtin, which only works on __global__ functions. Since SYCL kernels are not __global__ during parsing and semantic analysis, we cannot use this builtin on them directly. So we have to generate dummy __global__ functions to which we can apply the builtin, and then borrow the generated name (hence __hipsycl_kernel_name_template).
These dummy kernels have no negative impact on the generated code.
Only the two kernels whose names start with _Z16 contain actual code.
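The count reported above can be checked with a little arithmetic. The sketch below is only a model of the pattern observed in this issue (one real kernel and one dummy kernel per kernel lambda, plus one additional empty kernel). It is not a guarantee made by SYCL or by the implementation, and the names KernelCount and observed_kernel_count are hypothetical, introduced purely for illustration. It does not cover the "named" kernel case, for which no numbers were reported.

```cpp
#include <cstddef>

// Hypothetical breakdown of the kernels seen in the device binary.
struct KernelCount {
  std::size_t real;   // kernels that contain actual code
  std::size_t dummy;  // empty kernels, one per kernel lambda
  std::size_t extra;  // the one additional empty kernel

  constexpr std::size_t total() const { return real + dummy + extra; }
};

// Assumption: this mirrors the pattern reported in this issue only.
// An implementation is free to emit a different number of kernels.
constexpr KernelCount observed_kernel_count(std::size_t kernel_lambdas) {
  return KernelCount{kernel_lambdas, kernel_lambdas, 1};
}

// Two kernel lambdas: 2 real + 2 dummy + 1 extra = 5 kernels.
static_assert(observed_kernel_count(2).total() == 5,
              "matches the 5 kernels reported for two kernel lambdas");
```

Under this model, the two kernels with actual code correspond to the real field, and the remaining three are the empty dummy kernels.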