Questions about the distribution of the threads over the tiles #74

Closed
timespacce opened this issue Dec 20, 2019 · 1 comment

timespacce commented Dec 20, 2019

Hello,

I have two questions about the CUDA core SGEMM.

  1. In each iteration, every thread block loads a 128x8 A-tile and an 8x128 B-tile from global memory into shared memory. With 256 threads per thread block, would each thread compute an 8x8 matrix multiplication?

  2. How are the threads distributed over the A-tile * B-tile?
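
To make the arithmetic behind question 1 concrete, here is a minimal sketch in Python. The tile sizes and thread count come from the question. Everything about the thread layout is an assumption for illustration only: the 16x16 thread grid, the contiguous 8x8 sub-tiles, and the helper names `thread_subtile` and `fmas_per_thread_per_iteration` are hypothetical and are not taken from the kernel, which may well use a different (for example interleaved) mapping.

```python
# Tile sizes and thread count as stated in the question.
BLOCK_M, BLOCK_N, BLOCK_K = 128, 128, 8
THREADS_PER_BLOCK = 256

# A 128x8 A-tile times an 8x128 B-tile updates a 128x128 tile of C,
# so each thread is responsible for 128*128/256 = 64 outputs (8x8).
outputs_per_thread = (BLOCK_M * BLOCK_N) // THREADS_PER_BLOCK

# ASSUMPTION (hypothetical layout, not confirmed by the source):
# threads form a 16x16 grid and each owns one contiguous sub-tile of C.
THREADS_X = 16
THREADS_Y = THREADS_PER_BLOCK // THREADS_X
SUB_M = BLOCK_M // THREADS_Y  # rows of C per thread
SUB_N = BLOCK_N // THREADS_X  # columns of C per thread


def thread_subtile(tid):
    """Return (row_start, col_start) of the C sub-tile owned by thread tid."""
    ty, tx = divmod(tid, THREADS_X)
    return ty * SUB_M, tx * SUB_N


def fmas_per_thread_per_iteration():
    """Multiply-adds one thread performs for one loaded pair of tiles.

    For each of the BLOCK_K values of k, the thread applies one rank-1
    update (SUB_M x 1 times 1 x SUB_N) to its accumulators.
    """
    return SUB_M * SUB_N * BLOCK_K
```

Under this assumed layout, thread 17 would own the sub-tile starting at row 8, column 8, and each thread would perform 8 * 8 * 8 = 512 multiply-adds per iteration. The count of 64 outputs per thread follows from the tile sizes alone; which 64 outputs a given thread owns depends on the actual kernel.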

Greetings,
James.

timespacce changed the title from "Questions about the thread distribution over the tiles" to "Questions about the distribution of the threads over the tiles" on Dec 20, 2019