
[LangRef] Clarify semantics of masked vector load/store #82469

Open: wants to merge 1 commit into main
Conversation

RalfJung (Contributor)

This is based on what I think has to follow from the statement about preventing exceptions. But I don't actually know what LLVM IR passes will do with these intrinsics, so this requires careful review by someone who does. :)

@nikic do you know these passes / know who knows these passes to do the review?

Also, there's an open question that remains: for the purpose of noalias, do these operations access the masked-off lanes or not? I sure hope they don't, but I realized that while data races are mentioned, noalias is not.

llvmbot (Collaborator) commented Feb 21, 2024

@llvm/pr-subscribers-llvm-ir

Author: Ralf Jung (RalfJung)

Changes: same as the PR description above.


Full diff: https://github.com/llvm/llvm-project/pull/82469.diff

1 file affected:

  • (modified) llvm/docs/LangRef.rst (+2)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index fd2e3aacd0169c..496773c4d3d386 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -23752,6 +23752,7 @@ Semantics:
 
 The '``llvm.masked.load``' intrinsic is designed for conditional reading of selected vector elements in a single IR operation. It is useful for targets that support vector masked loads and allows vectorizing predicated basic blocks on these targets. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar load operations.
 The result of this operation is equivalent to a regular vector load instruction followed by a 'select' between the loaded and the passthru values, predicated on the same mask. However, using this intrinsic prevents exceptions on memory access to masked-off lanes.
+In particular, this means that only the masked-on lanes of the vector need to be inbounds of an allocation (but all these lanes need to be inbounds of the same allocation).
 
 
 ::
@@ -23794,6 +23795,7 @@ Semantics:
 
 The '``llvm.masked.store``' intrinsics is designed for conditional writing of selected vector elements in a single IR operation. It is useful for targets that support vector masked store and allows vectorizing predicated basic blocks on these targets. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar store operations.
 The result of this operation is equivalent to a load-modify-store sequence. However, using this intrinsic prevents exceptions and data races on memory access to masked-off lanes.
+In particular, this means that only the masked-on lanes of the vector need to be inbounds of an allocation (but all these lanes need to be inbounds of the same allocation).
 
 ::
 
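As a reading aid (an editorial sketch, not part of the patch): the IR below shows how the two intrinsics are typically called. The function names, the v4i32 element type, and the alignment of 4 are illustrative assumptions. When every lane of the pointer is accessible, the masked load matches the plain load-plus-select shown second; the intrinsic additionally guarantees that masked-off lanes are not accessed.

declare <4 x i32> @llvm.masked.load.v4i32.p0(ptr, i32 immarg, <4 x i1>, <4 x i32>)
declare void @llvm.masked.store.v4i32.p0(<4 x i32>, ptr, i32 immarg, <4 x i1>)

define <4 x i32> @masked_load_sketch(ptr %p, <4 x i1> %mask, <4 x i32> %passthru) {
  ; Loads only the enabled lanes; disabled lanes of the result come from %passthru.
  %v = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr %p, i32 4, <4 x i1> %mask, <4 x i32> %passthru)
  ret <4 x i32> %v
}

define void @masked_store_sketch(ptr %p, <4 x i32> %val, <4 x i1> %mask) {
  ; Stores only the enabled lanes; memory behind masked-off lanes is untouched,
  ; which is what rules out exceptions and data races on those lanes.
  call void @llvm.masked.store.v4i32.p0(<4 x i32> %val, ptr %p, i32 4, <4 x i1> %mask)
  ret void
}

; Only valid as a substitute when all four lanes at %p may be accessed:
define <4 x i32> @load_then_select_sketch(ptr %p, <4 x i1> %mask, <4 x i32> %passthru) {
  %wide = load <4 x i32>, ptr %p, align 4
  %v = select <4 x i1> %mask, <4 x i32> %wide, <4 x i32> %passthru
  ret <4 x i32> %v
}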

RalfJung (Contributor, Author) commented on the added line in llvm/docs/LangRef.rst:

In particular, this means that only the masked-on lanes of the vector need to be inbounds of an allocation (but all these lanes need to be inbounds of the same allocation).

Is "masked-on" the opposite of "masked-off"? Or is there some other term I could use?

nikic requested a review from topperc (February 21, 2024 08:07)
nikic (Contributor) commented Feb 21, 2024

I would rephrase this in terms of something like this:

However, these intrinsics behave as if the masked-off lanes are not accessed.

That should tell us everything necessary about their semantics. We can then go on to clarify that this means no exceptions / data races / etc.

RalfJung (Contributor, Author)

That doesn't quite say everything -- there's the question of whether this Rust PR should say offset (aka getelementptr inbounds) or wrapping_offset (aka getelementptr) when describing how the pointers to the individual elements being loaded are computed.

programmerjake (Contributor)

That doesn't quite say everything -- there's the question of whether this Rust PR should say offset (aka getelementptr inbounds)

There is the additional caveat that LLVM is allowed to create a poison value without UB (which is what happens with getelementptr inbounds and out-of-bounds indices), whereas Rust defines an out-of-bounds offset to be immediate UB rather than deferring it to the load/store.

A major difference between the two choices is that doing a masked load on a pointer before the beginning of its allocation is disallowed with inbounds, but allowed without inbounds as long as vector elements are masked off until the offset is big enough to be within the allocation's bounds.
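A minimal IR sketch of that distinction (the function name and the -8 offset are illustrative): the inbounds form yields poison when the computed pointer leaves the allocation, while the plain form always produces a defined pointer value; Rust's offset, by contrast, makes the out-of-bounds computation itself immediate UB.

define void @gep_flavours(ptr %p) {
  ; If %p - 8 does not stay within %p's allocated object, this inbounds GEP
  ; yields poison; UB only arises later, if the poison pointer is actually
  ; dereferenced or otherwise used in a way that requires a valid value.
  %a = getelementptr inbounds i8, ptr %p, i64 -8
  ; The plain GEP always produces a well-defined pointer value, even when it
  ; points outside the allocation; it just may not be dereferenceable.
  %b = getelementptr i8, ptr %p, i64 -8
  ret void
}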

RalfJung (Contributor, Author) commented Feb 21, 2024

A major difference between the two choices is that doing a masked load on a pointer before the beginning of its allocation is disallowed with inbounds, but allowed without inbounds as long as vector elements are masked off until the offset is big enough to be within the allocation's bounds.

Yes, that is indeed the key point: if the first half of the vector is masked off, and that first half is actually out-of-bounds, then the pointer itself is conceptually out-of-bounds, and "computing the pointer to the actually loaded element" would be a non-inbounds pointer computation. I expect this use case to be allowed, which is why I added the following in this PR:

Only the masked-on lanes of the vector need to be inbounds of an allocation (but all these lanes need to be inbounds of the same allocation).
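To make that concrete, here is a sketch of the use case under the proposed wording (the allocation size, mask, and names are illustrative assumptions): the base pointer sits 8 bytes before a 2-element i32 allocation, the two leading lanes are masked off, and only the two trailing lanes, which land inside the allocation, need to be inbounds.

declare <4 x i32> @llvm.masked.load.v4i32.p0(ptr, i32 immarg, <4 x i1>, <4 x i32>)

define <4 x i32> @load_with_leading_lanes_masked_off() {
  %buf = alloca [2 x i32], align 4
  ; Plain (non-inbounds) GEP: the result points 8 bytes before %buf.
  %base = getelementptr i8, ptr %buf, i64 -8
  ; Lanes 0 and 1 would fall before %buf but are masked off; lanes 2 and 3 land
  ; inside %buf, so under the clarified semantics only they must be inbounds,
  ; and no exception is raised for the masked-off lanes.
  ; (%buf is uninitialized here; the sketch only illustrates the bounds rule.)
  %v = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr %base, i32 4, <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x i32> zeroinitializer)
  ret <4 x i32> %v
}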

RalfJung (Contributor, Author) commented May 2, 2024

@nikic I have updated the wording to

The result of this operation is equivalent to a regular vector load instruction followed by a 'select' between the loaded and the passthru values, predicated on the same mask, except that the masked-off lanes are not accessed.

This is followed by clarifications regarding exceptions, noalias, and data races. Does that work for you?

nikic (Contributor) left a review comment

LGTM, but a second opinion wouldn't hurt.

Two earlier review threads on llvm/docs/LangRef.rst were marked outdated and resolved.
nikic changed the title from "clarify semantics of masked vector load/store" to "[LangRef] Clarify semantics of masked vector load/store" (May 3, 2024)
nikic requested a review from preames (May 3, 2024 03:32)