
Fix matrixVectorOp to verify promoted pointer type is still aligned to vectorized load boundary #325

Merged · 3 commits merged into rapidsai:branch-21.10 on Sep 24, 2021

Conversation

viclafargue
Contributor

Fix for rapidsai/cuml#3965
The function did not check for the memory alignment of the pointer provided.

@viclafargue viclafargue requested review from a team as code owners September 9, 2021 13:48
@github-actions github-actions bot added the cpp label Sep 9, 2021
@viclafargue viclafargue added 3 - Ready for Review bug Something isn't working non-breaking Non-breaking change and removed cpp labels Sep 9, 2021
@github-actions github-actions bot added the cpp label Sep 9, 2021
@viclafargue viclafargue changed the title to "Fix matrixVectorOp" Sep 9, 2021
@cjnolet
Member

cjnolet commented Sep 9, 2021

@viclafargue, RMM should be aligning allocations by default, which is why this has not been an issue for cuml algorithms. Before RMM, we did manually align allocations, but since RMM does this for us, we shouldn't have to explicitly check this in the algorithms (since the number of places this might need to be done could be exhaustive).

Hopefully the solution proposed here solves the problem for the user.

@viclafargue
Contributor Author

Thanks for finding a solution. That would solve the problem on the Python side. Maybe these settings should even be applied under the hood by default when cuML is imported? However, do we have a guarantee that pointers provided to the native API are always memory-aligned?

@cjnolet
Member

cjnolet commented Sep 15, 2021

However, do we have the guarantee that pointers provided to the native API are always memory aligned?

The assumption here is that users will be using RAPIDS and so we should be able to expect that the RAPIDS memory manager is being used to allocate any pointers that are handed to these prims.
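A minimal host-side sketch of this assumption (the helper name and the 16-byte threshold are illustrative; 16 bytes is the widest CUDA vectorized load, and allocators such as RMM typically align allocations far more coarsely, e.g. to 256 bytes, which is an assumption here rather than something this PR verifies):

```cpp
#include <cstdint>

// Returns true when `p` meets the 16-byte boundary required by the widest
// vectorized load. Pointers from an aligning allocator (e.g. RMM) would
// satisfy this by construction; arbitrary native-API pointers may not.
inline bool satisfies_vectorized_load(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```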

@viclafargue
Contributor Author

This PR is not necessary anymore.

@cjnolet cjnolet changed the title from "Fix matrixVectorOp" to "Fix matrixVectorOp to verify pointer is aligned to L1 load boundary" Sep 23, 2021
@cjnolet cjnolet reopened this Sep 23, 2021
@cjnolet
Member

cjnolet commented Sep 23, 2021

I've reopened this pull request after chatting w/ @viclafargue and now that I have a better understanding of how this could be causing the issue in rapidsai/cuml#4199.

Specifically, the matrixVectorOp attempts to make the application of vector elements to the matrix more efficient by loading multiple vector elements into a single CUDA thread. This alone won't cause an issue, but the kernel further optimizes the reads by loading the vector elements through the read-only L1 cache and, in so doing, might promote the datatype to minimize the number of reads (e.g. when 2x 8-byte elements need to be read, it will use 1x 16-byte datatype so that it can perform a single read). It's this promotion of the datatype that assumes the pointer is aligned to the promoted type. Thus, the pointer alignment needs to be explicitly checked when this promotion occurs. This is an internal assumption, specific to matrixVectorOp.
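The selection logic described above can be sketched on the host side as follows (a hypothetical helper, not the raft code: `choose_veclen`, `kMaxLoadBytes`, and the halving loop are illustrative assumptions about how a widest safe load could be picked):

```cpp
#include <cstddef>
#include <cstdint>

// Pick the widest vectorized load width (in elements of Type) such that
// both the row stride in bytes and the input pointers stay aligned to the
// promoted load type. Falls back toward scalar loads when alignment fails.
template <typename Type>
std::size_t choose_veclen(std::size_t stride_bytes, const void* matrix, const void* vec) {
  constexpr std::size_t kMaxLoadBytes = 16;  // widest CUDA vector load (e.g. float4)
  auto aligned = [](const void* p, std::size_t a) {
    return reinterpret_cast<std::uintptr_t>(p) % a == 0;
  };
  std::size_t load_bytes = kMaxLoadBytes;
  // Halve the load width until stride and both pointers are aligned to it.
  while (load_bytes > sizeof(Type) &&
         (stride_bytes % load_bytes != 0 || !aligned(matrix, load_bytes) ||
          !aligned(vec, load_bytes))) {
    load_bytes /= 2;
  }
  return load_bytes / sizeof(Type);
}
```

With a 16-byte-aligned pointer and a stride divisible by 16, four floats can be read per load; a pointer aligned only to 8 bytes drops the width to two.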

@@ -93,17 +93,23 @@ void matrixVectorOp(Type *out, const Type *matrix, const Type *vec, IdxType D,
                     IdxType N, bool rowMajor, bool bcastAlongRows, Lambda op,
                     cudaStream_t stream) {
   IdxType stride = rowMajor ? D : N;
-  size_t bytes = stride * sizeof(Type);
-  if (16 / sizeof(Type) && bytes % 16 == 0) {
+  size_t stride_bytes = stride * sizeof(Type);
Member


I think this PR looks good to merge for 21.10, but we should add a comment here that summarizes these low-level details and justifies the need for the additional check on pointer alignment. Before learning that the primitive was promoting the type to optimize the reads, I was under the impression the alignment issue was coming from the inputs. It would also be helpful to add some documentation to the TxN_t struct. It currently states: "Obviously, it's caller's responsibility to take care of pointer alignment!" but it would help to add a sentence or two justifying why (the description of the vectorized op itself isn't bad, but it still doesn't make particularly clear why the alignment is sometimes needed).
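A host-side analogy of why the caller must guarantee alignment for a TxN_t-style promotion (a hedged sketch; `Float4` and `can_promote_to_float4` are hypothetical names, and the real hazard occurs in device code where misaligned vector loads are undefined):

```cpp
#include <cstdint>

// Promotion treats 4 floats as one 16-byte chunk. Reading through a pointer
// reinterpreted to a wider type is only valid when the address is aligned
// to that wider type; this guard is the caller's responsibility.
struct alignas(16) Float4 { float x, y, z, w; };

inline bool can_promote_to_float4(const float* p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(Float4) == 0;
}
```

An offset of even one float (4 bytes) from an aligned base breaks the 16-byte boundary, which is exactly the case the new check in this PR catches.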

@cjnolet cjnolet changed the title from "Fix matrixVectorOp to verify pointer is aligned to L1 load boundary" to "Fix matrixVectorOp to verify promoted pointer type is still aligned to vectorized load boundary" Sep 23, 2021
Member

@cjnolet cjnolet left a comment


LGTM

@cjnolet
Member

cjnolet commented Sep 24, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 6d7f897 into rapidsai:branch-21.10 Sep 24, 2021