Fix matrixVectorOp to verify promoted pointer type is still aligned to vectorized load boundary #325
Conversation
@viclafargue, RMM should be aligning allocations by default, which is why this has not been an issue for cuml algorithms. Before RMM, we did manually align allocations, but since RMM does this for us, we shouldn't have to explicitly check this in the algorithms (since the number of places this would need to be done could be exhausting to cover). Hopefully the solution proposed here solves the problem for the user.
Thanks for finding a solution. That would solve the problem on the Python side. Maybe these should even be done under the hood by default when cuML is imported? However, do we have the guarantee that pointers provided to the native API are always memory aligned?
The assumption here is that users will be using RAPIDS, and so we should be able to expect that the RAPIDS memory manager is being used to allocate any pointers that are handed to these prims.
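For reference, a minimal host-side sketch of the guarantee being relied on here (illustrative only, not raft or RMM code): base pointers coming from the CUDA allocator that RMM's default resource wraps are aligned well past the 16-byte boundary the vectorized path needs.

```cpp
// Illustrative only -- not raft or RMM code. Host-side check that a freshly
// allocated device pointer lands on the 16-byte boundary the vectorized
// load path assumes.
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

static bool is_aligned_16(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

int main() {
  float* d_ptr = nullptr;
  // cudaMalloc (which RMM's default CUDA memory resource ultimately wraps)
  // returns allocations aligned to at least 256 bytes, so base pointers
  // comfortably satisfy a 16-byte requirement.
  cudaMalloc(reinterpret_cast<void**>(&d_ptr), 1024 * sizeof(float));
  std::printf("base pointer 16B-aligned: %d\n",
              static_cast<int>(is_aligned_16(d_ptr)));
  cudaFree(d_ptr);
  return 0;
}
```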
This PR is not necessary anymore.
I've reopened this pull request after chatting w/ @viclafargue and now that I have a better understanding of how this could be causing the issue in rapidsai/cuml#4199. Specifically, the …
```diff
@@ -93,17 +93,23 @@ void matrixVectorOp(Type *out, const Type *matrix, const Type *vec, IdxType D,
                     IdxType N, bool rowMajor, bool bcastAlongRows, Lambda op,
                     cudaStream_t stream) {
   IdxType stride = rowMajor ? D : N;
-  size_t bytes = stride * sizeof(Type);
-  if (16 / sizeof(Type) && bytes % 16 == 0) {
+  size_t stride_bytes = stride * sizeof(Type);
```
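In spirit, the check this diff is building toward looks something like the sketch below (the function name and layout are illustrative, not the exact raft code): the vectorized path is only safe when both the row stride in bytes and the pointers themselves land on the 16-byte boundary assumed by the promoted loads.

```cpp
// Sketch of the alignment guard implied by the diff above; names are
// illustrative and this is not the actual raft implementation.
#include <cstddef>
#include <cstdint>

template <typename Type, typename IdxType>
bool use_vectorized_path(const Type* out, const Type* matrix, const Type* vec,
                         IdxType stride) {
  auto aligned16 = [](const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
  };
  std::size_t stride_bytes = static_cast<std::size_t>(stride) * sizeof(Type);
  // Every row must start on a 16-byte boundary *and* the buffers themselves
  // must be 16-byte aligned; otherwise fall back to the scalar kernel.
  return stride_bytes % 16 == 0 && aligned16(out) && aligned16(matrix) &&
         aligned16(vec);
}
```

In the actual launch code, the result of a check like this would simply select between the vectorized and scalar kernels.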
I think this PR looks good to merge for 21.10, but we should add a nice comment here which summarizes these low-level details and justifies the need for the additional check on pointer alignment. Before knowing myself that the primitive was promoting the type to optimize the reads, I was under the impression the alignment issue was coming from the inputs. It would also be helpful to add some additional documentation to the TxN_t struct. It currently states: "Obviously, it's caller's responsibility to take care of pointer alignment!" but it would help to add a sentence or two justifying why (the description of the vectorized op itself isn't bad, but it still doesn't make particularly clear why the alignment might sometimes be needed).
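To make the "why" concrete, here is a toy example of the kind of vectorized access such a struct performs (not the actual TxN_t implementation): a 128-bit float4 load maps to a single hardware transaction whose address must be a multiple of 16 bytes, so an unaligned pointer faults outright rather than falling back to scalar loads.

```cpp
// Toy CUDA kernel, not the actual TxN_t code: reinterpreting a float* as
// float4* issues single 16-byte loads/stores, and the hardware requires the
// address of each such access to be a multiple of 16 bytes.
#include <cuda_runtime.h>

__global__ void scale_by_two_vec4(float* data, int n_vec4) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n_vec4) {
    // If `data` is not 16-byte aligned (e.g. it is an offset into a larger
    // buffer), this access faults with a misaligned-address error instead of
    // silently taking a slow path.
    float4 v = reinterpret_cast<float4*>(data)[i];
    v.x *= 2.f; v.y *= 2.f; v.z *= 2.f; v.w *= 2.f;
    reinterpret_cast<float4*>(data)[i] = v;
  }
}
```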
Changed the title from "Fix matrixVectorOp to verify pointer is aligned to L1 load boundary" to "Fix matrixVectorOp to verify promoted pointer type is still aligned to vectorized load boundary".
LGTM
@gpucibot merge
Fix for rapidsai/cuml#3965
The function did not check for the memory alignment of the pointer provided.
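As a hypothetical illustration of how such a pointer can arise even with an aligned allocator (not taken from the linked issue verbatim): a pointer offset into an otherwise aligned buffer can land off the 16-byte boundary, which is exactly the case the added check now detects and routes to the non-vectorized path.

```cpp
// Hypothetical reproduction sketch: the allocation itself is aligned, but a
// pointer offset into it is not, so a 16-byte vectorized load through it
// would fail without the alignment check added in this PR.
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  float* d_base = nullptr;
  cudaMalloc(reinterpret_cast<void**>(&d_base), 1024 * sizeof(float));

  // Skipping a single float moves the address by 4 bytes, which is no longer
  // a multiple of 16 -- handing a pointer like this to the prim is the kind
  // of input that used to trigger the misaligned-address error.
  const float* d_view = d_base + 1;
  std::printf("offset pointer 16B-aligned: %d\n",
              static_cast<int>(
                  reinterpret_cast<std::uintptr_t>(d_view) % 16 == 0));

  cudaFree(d_base);
  return 0;
}
```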