Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NVPTX] Lower 16xi8 and 8xi8 stores efficiently #73646

Merged
merged 1 commit into from
Dec 1, 2023

Conversation

bondhugula
Copy link
Contributor

Lower 16xi8 vector stores in NVPTX ISel efficiently using
st.v4.b32 instead of multiple st.v4.u8 along the lines of vector loads
and 8xf16. Similarly, 8xi8 using st.v2.u32.

Copy link
Contributor

@ldrumm ldrumm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits. LGTM

llvm/test/CodeGen/NVPTX/vector-stores.ll Outdated Show resolved Hide resolved
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp Outdated Show resolved Hide resolved
@bondhugula bondhugula force-pushed the uday/nvptx_v16i8_vector_store branch 2 times, most recently from 6ab9db7 to 298e563 Compare November 29, 2023 06:14
// i32 results instead of letting ReplaceLoadVector split it into smaller stores
// during legalization. This is done at dag-combine time, so that vector
// operations with i8 elements can be optimised away instead of being needlessly
// split during legalization, which involves storing to the stack and loading it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Legalizer assuming that stack loads/stores are cheap is indeed a rather bad misoptimization for NVPTX.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this comment might be out of date, as it looks copied from PerformLOADCombine and that was written before stack optimizations were done

Lower 16xi8 vector stores in NVPTX ISel efficiently using
st.v4.b32 instead of multiple st.v4.u8 along the lines of vector loads
and 8xf16. Similarly, 8xi8 using st.v2.u32.
@bondhugula bondhugula merged commit 173fcf7 into llvm:main Dec 1, 2023
3 checks passed
@steven-johnson
Copy link

This seems to have injected failures into Halide codegen; we are now getting runtime errors of the form CUDA_ERROR_MISALIGNED_ADDRESS for cuMemcpyDtoH() where we didn't before. It appears we are now emitting an aligned store instruction where we previous emitted an unaligned one. Can we get a revert of this pending further investigation, please?

; CHECK-LABEL: .visible .func v8i8_store
define void @v8i8_store(ptr %a, <8 x i8> %v) {
; CHECK: st.v2.u32
store <8 x i8> %v, ptr %a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only correct if the pointer is aligned to a 4-byte-boundary (IIUC), but AFAIK nothing in the IR to this point promises that alignment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Using larger types for loads/stores must be aligned appropriately.

We do use allowsMemoryAccessForAlignment in other places.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we should revert it if a fix-forward is not imminent (this is breaking all of Halide's Cuda tests).

Artem-B added a commit to Artem-B/llvm-project that referenced this pull request Dec 5, 2023
This reverts commit 173fcf7.

Needs to constrain the optimization to properly aligned loads/stores only.
llvm#73646 (comment)
Artem-B added a commit that referenced this pull request Dec 6, 2023
…4518)

This reverts commit 173fcf7.

We need to constrain the optimization to properly aligned loads/stores
only.
#73646 (comment)
Copy link
Contributor

@pasaulais pasaulais left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the alignment issue is addressed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants