Questions about the distribution of the threads over the tiles #74

timespacce · 2019-12-20T13:03:59Z

Hello,

I have some questions about the CUDA core SGEMM.

In each iteration, every thread block loads a 128x8 A-tile and an 8x128 B-tile from global memory into shared memory. With 256 threads per thread block, would each thread compute an 8x8 matrix multiplication?
How are the threads distributed over the A-tile * B-tile product?

Greetings,
James.

timespacce · 2019-12-20T21:17:16Z

https://devblogs.nvidia.com/cutlass-linear-algebra-cuda/
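For the numbers in the question, the arithmetic can be sketched as follows. A 128x8 A-tile times an 8x128 B-tile yields a 128x128 output tile, and 128 * 128 / 256 = 64 outputs per thread, which fits an 8x8 sub-tile. The row-major mapping below (`thread_subtile`, `covered_outputs`, and the constants) is a hypothetical illustration only and is not the layout CUTLASS uses; the post linked above describes the real scheme as a hierarchy of thread block, warp, and thread tiles.

```python
# Illustrative only: a simple row-major thread-to-tile mapping,
# NOT the layout CUTLASS actually uses.
BLOCK_M, BLOCK_N, BLOCK_K = 128, 128, 8   # tile sizes taken from the question
THREADS_PER_BLOCK = 256

# 128 * 128 outputs shared among 256 threads -> 64 outputs per thread.
OUTPUTS_PER_THREAD = (BLOCK_M * BLOCK_N) // THREADS_PER_BLOCK

# Assumption: the 64 outputs are arranged as a square 8x8 sub-tile.
THREAD_M = THREAD_N = 8
THREADS_ALONG_N = BLOCK_N // THREAD_N     # 16 threads span the tile's columns


def thread_subtile(tid):
    """Return (row0, col0) of the 8x8 sub-tile owned by thread `tid` (hypothetical mapping)."""
    row0 = (tid // THREADS_ALONG_N) * THREAD_M
    col0 = (tid % THREADS_ALONG_N) * THREAD_N
    return row0, col0


def covered_outputs():
    """Collect every (row, col) written by any thread, to check full coverage of the tile."""
    seen = set()
    for tid in range(THREADS_PER_BLOCK):
        row0, col0 = thread_subtile(tid)
        for i in range(THREAD_M):
            for j in range(THREAD_N):
                seen.add((row0 + i, col0 + j))
    return seen


if __name__ == "__main__":
    print(OUTPUTS_PER_THREAD)
    print(thread_subtile(0), thread_subtile(17), thread_subtile(255))
    print(len(covered_outputs()) == BLOCK_M * BLOCK_N)
```

Under this mapping, each thread's work per iteration is an 8x8 accumulator updated by 8 rank-1 (outer product) steps, one per column of the A-tile, rather than a standalone 8x8 by 8x8 product.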

timespacce changed the title ~~Questions about the thread distribution over the tiles~~ Questions about the distribution of the threads over the tiles Dec 20, 2019

timespacce closed this as completed Dec 20, 2019
