
Kernel awareness of hardware thread index #75

Closed
BenjaminPelletier opened this issue Mar 27, 2020 · 3 comments

@BenjaminPelletier

Each Accelerator has a MaxNumThreads property giving the number of threads that can run simultaneously. Can a kernel invocation determine which of these threads it is running on, perhaps by reading an integer between 0 and MaxNumThreads - 1?

One application of this is when a GPU algorithm requires indexable working memory. For instance, there is a wide class of algorithms that are typically implemented using recursion. Since GPU methods may not recurse, these algorithms can be rewritten as flat loops, but they then generally need a "stack" array in working memory. Because ILGPU does not support even fixed-size arrays as local variables in kernel methods (support for that would be better than the feature this issue requests), this array must be passed into the kernel as an ArrayView.

But without a way to determine which hardware thread an invocation is using, the provided ArrayView must be sized for all elements to be processed, even though only MaxNumThreads slices of it are in use at any given time. For instance, with 10e6 elements to process and a maximum stack depth of M, the "stack" ArrayView would need 10e6 * M items, even though they are just working memory and no more than MaxNumThreads (on the order of 1e3) * M items would ever be in use at once. If a kernel invocation could determine which hardware thread it was running on, it could index into a stack array of only MaxNumThreads * M items rather than 10e6 * M.
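The pattern described above can be sketched as follows. This is illustrative only: the kernel signature, the `maxDepth` parameter, and the per-element stack slicing are assumptions for the sake of the example, not ILGPU's actual API.

```csharp
// Sketch: recursion rewritten as a flat loop over an explicit stack.
// With only a per-element index available, each element needs its own
// maxDepth-sized slice of the shared stack buffer, so the buffer must
// hold data.Length * maxDepth entries -- the problem described above.
static void IterativeKernel(
    Index index,               // per-element index (ILGPU-style)
    ArrayView<int> data,
    ArrayView<int> stack,      // sized data.Length * maxDepth
    int maxDepth)
{
    int stackBase = index * maxDepth;  // this invocation's slice
    int stackTop = 0;

    // Push the initial "call".
    stack[stackBase + stackTop++] = data[index];

    // Flat loop replacing recursion: pop an item, process it,
    // and push any follow-up work (bounded by maxDepth).
    while (stackTop > 0)
    {
        int item = stack[stackBase + --stackTop];
        // ... process item; e.g.:
        // if (NeedsMoreWork(item) && stackTop < maxDepth)
        //     stack[stackBase + stackTop++] = NextItem(item);
    }
}
```

A hardware-thread index would let `stackBase` be derived from the thread rather than the element, shrinking the buffer to MaxNumThreads * maxDepth.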

@m4rs-mt (Owner) commented Mar 30, 2020

@BenjaminPelletier thank you for your question. It sounds like you want to calculate the grid stride of your kernel (in other words, Group.DimX * Grid.DimX). This corresponds to the number of threads started in the current kernel execution environment. Adding support for accessing the general maximum number of threads on an accelerator is a lot of work and can easily lead to incorrect memory accesses, since the actual launch size of a kernel is not known beforehand.

Please note that the next version will include 1D arrays in local memory 🔢
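The grid-stride idea above can be sketched like this. The `Grid`/`Group` property names are assumed from the comment and may differ between ILGPU versions, and `GetSubView` is used on the assumption that ArrayView supports sub-slicing:

```csharp
// Sketch of a grid-stride loop: the stack buffer only needs one
// maxDepth-sized slice per *launched* thread (stride * maxDepth
// entries), not one per element, because each thread reuses its
// slice while striding over the data.
static void GridStrideKernel(
    ArrayView<int> data,
    ArrayView<int> stack,      // sized stride * maxDepth
    int maxDepth)
{
    int stride = Group.DimX * Grid.DimX;            // threads launched
    int tid = Grid.IdxX * Group.DimX + Group.IdxX;  // flat id in [0, stride)

    // This thread's private working-memory slice.
    var myStack = stack.GetSubView(tid * maxDepth, maxDepth);

    for (int i = tid; i < data.Length; i += stride)
    {
        // ... process data[i] using myStack as the explicit stack
    }
}
```

The key point is that `stride` is known at launch time (it is chosen by the code launching the kernel), so the caller can allocate a matching stack buffer without any notion of the accelerator's hardware thread count.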

m4rs-mt self-assigned this Mar 30, 2020
@BenjaminPelletier (Author)

With 1D arrays in local memory, this question becomes obsolete for my use case so maybe it's not worth spending much time on.

But, in case it is worthwhile: if 1) no two simultaneously running threads ever share the same (Group.X, Grid.X) pair, and 2) Group.DimX * Grid.DimX <= Accelerator.MaxNumThreads, then it seems this question would be answered if I could read Group.X and Grid.X inside the kernel. I'm not sure how to do that, though. I'm currently using an Index3; I don't see the GroupedIndex* classes mentioned in the documentation, and I don't see a way to retrieve a more advanced index from my Index3.

@MoFtZ (Collaborator) commented Mar 30, 2020

hi @BenjaminPelletier, you are most likely using accelerator.LoadAutoGroupedStreamKernel to start your kernel.

ILGPU provides a simplified API to launch kernels without having to worry about grouping, which definitely helped me when I started GPGPU programming. The CUDA tutorials, by contrast, all require you to specify the grouping as part of launching the kernel.

If you want to take control of the grouping yourself, there are other LoadXXXKernel methods to explore - see Kernel Loading in the documentation.

You should probably also check out the ILGPU samples, which helped me get my head around the various API calls.
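As a hedged sketch of what an explicitly grouped launch might look like: the method and property names below (LoadStreamKernel, MaxNumThreadsPerGroup, the tuple-style kernel configuration) are approximations, so check the Kernel Loading documentation and the ILGPU samples for the exact API in your version.

```csharp
// Sketch: explicitly grouped launch, so the host code chooses the
// group size and group count and can size working memory to match.
using var context = new Context();
using var accelerator = new CudaAccelerator(context);

var kernel = accelerator
    .LoadStreamKernel<ArrayView<int>, ArrayView<int>, int>(GridStrideKernel);

int groupSize = accelerator.MaxNumThreadsPerGroup;
int numGroups = 64;        // illustrative choice
int maxDepth = 32;         // illustrative stack depth

using var data = accelerator.Allocate<int>(1_000_000);
// Stack buffer sized per launched thread, not per element.
using var stack = accelerator.Allocate<int>(numGroups * groupSize * maxDepth);

// Launch with an explicit (numGroups, groupSize) configuration.
kernel((numGroups, groupSize), data.View, stack.View, maxDepth);
accelerator.Synchronize();
```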
