-
Notifications
You must be signed in to change notification settings - Fork 936
decide CONVERTOR_CUDA flag based on first byte of buffer #8597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
You don't need this. Instead you should check the range between pBaseBuf + pData->true_lb and pBaseBuf + pData->true_ub. |
|
bot:ibm:gnu:retest |
|
Thanks, that seems much cleaner, I'll change it. About using the true_ub, currently the cuda check uses cuPointerGetAttributes() on one address and I don't think there's an API for checking a range. I wasn't planning on checking the UB, partly because I'm not sure the top is that much more important than the middle and nobody wants the cudacheck walking through the entire datatype. Plus I don't know what I'd even have the cudacheck do if it saw different results for different parts of the userbuf. So I'm leaning toward just using LB to change the current check, I don't know why I didn't do that initially instead of pElem.disp |
|
Don't use LB, it point to nothing important. true_lb is the displacement of the first used byte from the buffer, so if you want to test something simple this is what you need. For the memory range I was under the impression that CUDA devices are mapped by full GB, and that it is possible to use the MTRR registers to figure that out. Checking the first byte should be good enough for now. |
82b1aac to
250451e
Compare
|
I meant true_lb. My earlier "find the first byte" wouldn't necessarily equal true_lb, since it was "first" in the sense of data ordering, but in the context of picking an address for a cuda query I think true_lb makes as much sense as anything. So I repushed, and went up the chain to look at the callers. It seems like some callers have a convertor available and for them it makes sense to pass in a userbuf, but other callers don't have a convertor so I'm changing them to add the true_lb themselves so they pass in the actual final address they want to have cuda-checked. |
There's a cuda check in the converter that checks one address and caches a flag for CONVERTOR_CUDA. But it was just checking pUserBuf which isn't necessarily anywhere near the data, and doesn't even need to be a legal address for that matter. MPI datatype displacements could put the actual data anywhere. Since opal_cuda_check_one_buf() was already using a convertor, I left its buffer meaning as userbuf and inside it uses the convertor's datatype for the offset. Then for opal_cuda_check_bufs() since it's just passing in two addresses with no convertor/datatype information, it's expected to handle the offset itself and pass in the final address to check Signed-off-by: Mark Allen <markalle@us.ibm.com>
250451e to
d2d90c4
Compare
|
rebased I think this should be ready to go in now |
|
bot:aws:retest |
|
@markalle can you rebase? These files may have gotten moved around a bit. |
There's a cuda check in the converter that checks one address and caches a flag for CONVERTOR_CUDA. But it was just checking pUserBuf which isn't necessarily anywhere near the data, and doesn't even need to be a legal address for that matter. MPI datatype displacements could put the actual data anywhere.
I added a lookup of the first non-loop pElem in the datatype description to use its .disp to find the first byte of data in the user's buf.
The code is probably being overly cautious, when it could just be coded as
But I was reluctant to write code that would run away in that while loop and fail ungracefully if the description array was bad for some reason or let it select a wrong first displacement if degenerate loops were a possibility in the description.
I'm not sure I feel about only looking at the first byte, when I'd think an MPI datatype could span cuda and non cuda memory, displacements can be anything. But this is still an improvement over just looking at pUserBuf which can be anything