Improve documentation in reference to CUDA local memory #7329
Conversation
Many thanks for the PR! (Also, many thanks for all the other contributions you're making on Discourse and the issue tracker - it's great to have these details examined, as they've gone somewhat unattended over the years.)
In my view the local memory is allocated - it's a static allocation done at compile time that happens to be backed by the same device memory as global memory. Therefore, I feel that avoiding the word "allocate" in the documentation makes the explanation of the concept less clear than it was before, so I'd prefer to keep the original wording.
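For context, here is a minimal sketch of how such a static allocation appears in a Numba kernel (the kernel and array shape are illustrative, not taken from the PR):

```python
from numba import cuda, float32

@cuda.jit
def kernel(out):
    # The shape and dtype must be compile-time constants: the local array
    # is allocated statically at compile time, per thread, in device memory
    # that is physically the same memory that backs global memory.
    tmp = cuda.local.array(shape=8, dtype=float32)
    i = cuda.grid(1)
    for j in range(8):
        tmp[j] = j * 2.0
    if i < out.size:
        out[i] = tmp[i % 8]
```

Launching it would look like `kernel[4, 32](d_out)` for a device array `d_out`.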
I do like the addition of the link to the explanation of local memory in the CUDA Programming guide though, and I think that is a useful addition. As it's a "See also"-type remark, I think it would be good to use the `seealso` RST directive to make it stand out (see the suggestion on the diff for an example of how this could look).
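For illustration, the resulting directive in the docs might look roughly like this (the link target and wording are assumptions, not the exact suggestion from the diff):

```rst
.. seealso::

   The `CUDA C Programming Guide <https://docs.nvidia.com/cuda/cuda-c-programming-guide/>`_
   for more details on how local memory behaves.
```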
So, to move forward, could you please:
- Undo the changes to the wording, or, if you feel that the original wording is inaccurate / unhelpful, explain a little more about your thinking behind the changes, or suggest another wording?
- Adjust the link to the CUDA Programming guide to use the `seealso` directive?
Many thanks so far and in advance!
Co-authored-by: Graham Markall <535640+gmarkall@users.noreply.github.com>
I think the wording changes are reverted now, and I've accepted the `seealso` edit. It's been very useful to see your suggestions on the PRs. Thank you! Let me know if there's anything else I need to do on this one.
Many thanks for the update, and your contribution to Numba! This looks good to me now.
See https://stackoverflow.com/questions/68869806/how-is-performance-affected-by-using-numba-cuda-local-array-compared-with-numb. As I am new to CUDA computing, I don't know whether my suggested edits are correct; I am merely passing on the implication in the linked Stack Exchange post. In the CUDA docs' section on local memory, I didn't see the word "allocate" used, so perhaps this suggestion is correct. If the original usage is fine, please reject this pull request.
Reference an existing issue
Fixes #7328.