This patch updates the mbarrier.arrive.* family of Ops
to include all features added up to Blackwell.
* Update the `mbarrier.arrive` Op to include
shared_cluster memory space, cta/cluster scope and
an option to lower using relaxed semantics.
* An `arrive_drop` variant is added for both the `arrive`
and `arrive.nocomplete` operations.
* Verifier checks are added wherever appropriate.
* lit tests are added to verify the lowering to the intrinsics.
TODO:
* Updates for the remaining mbarrier family will be done in
subsequent PRs (mainly expect/complete-tx, arrive.expect-tx,
and {test/try}waits).
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
 This operation causes the executing thread to signal its arrival at the barrier.
-The operation returns an opaque value that captures the phase of the
-*mbarrier object* prior to the arrive-on operation. The contents of this state
-value are implementation-specific.

-The operation takes the following operand:
+- `res`: When the `space` is not shared_cluster, this operation returns an
+opaque 64-bit value capturing the phase of the *mbarrier object* prior to
+the arrive-on operation. The contents of this return value are
+implementation-specific. An *mbarrier object* located in the shared_cluster
+space cannot return a value.
+
+The operation takes the following operands:
 - `addr`: A pointer to the memory location of the *mbarrier object*. The `addr`
-must be a pointer to generic or shared::cta memory. When it is generic, the
-underlying address must be within the shared::cta memory space; otherwise
-the behavior is undefined.
+must be a pointer to generic or shared_cta or shared_cluster memory. When it
+is generic, the underlying address must be within the shared_cta memory space;
+otherwise the behavior is undefined.
+- `count`: This specifies the amount by which the pending arrival count is
+decremented. If the `count` argument is not specified, the pending arrival
+count is decremented by 1.
+- `scope`: This specifies the set of threads that directly observe the memory
+synchronizing effect of the `mbarrier.arrive` operation.
+- `space`: This indicates the memory space where the mbarrier object resides.
+- `relaxed`: When set to true, the `arrive` operation has relaxed memory semantics
+and does not provide any ordering or visibility guarantees.

 [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive)
 }];
-let assemblyFormat = "$addr attr-dict `:` type($addr) `->` type($res)";
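
The `count` and `res` behavior described in this hunk can be illustrated with a small Python model. This is only a sketch under stated assumptions: `ToyMBarrier`, its fields, and the `arrive` helper are hypothetical names invented for illustration, the returned phase index stands in for the opaque 64-bit state, and rejecting an out-of-range `count` is a choice of this model rather than something the patch specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMBarrier:
    """Hypothetical stand-in for an mbarrier object; not the hardware layout."""
    expected: int   # arrivals required to complete each phase
    pending: int    # arrivals still outstanding in the current phase
    phase: int = 0  # index of the current phase


def arrive(bar: ToyMBarrier, count: int = 1,
           space: str = "shared_cta") -> Optional[int]:
    """Model an arrive-on operation and return the phase seen before it."""
    # Assumption of this toy model: reject counts that would underflow.
    if count < 1 or count > bar.pending:
        raise ValueError("count must be between 1 and the pending count")
    prior_phase = bar.phase
    bar.pending -= count
    if bar.pending == 0:
        # The phase completes: advance and re-arm from the expected count.
        bar.phase += 1
        bar.pending = bar.expected
    # Per the description, no value is returned for shared_cluster.
    return None if space == "shared_cluster" else prior_phase


bar = ToyMBarrier(expected=3, pending=3)
first = arrive(bar)                           # count defaults to 1
second = arrive(bar, count=2)                 # completes phase 0
remote = arrive(bar, space="shared_cluster")  # no result in this space
```

In this model both `first` and `second` capture phase 0, because the value reflects the phase prior to the arrive-on operation, while `remote` is `None`.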
+The `nvvm.mbarrier.arrive_drop` operation decrements the expected arrival
+count of the *mbarrier object* by `count` and then performs an arrive-on
+operation. When `count` is not specified, it defaults to 1. The decrement
+of the expected arrival count applies to all the subsequent phases of the
+*mbarrier object*. The remaining semantics are identical to those of the
+`nvvm.mbarrier.arrive` operation.
+
+[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive-drop)
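
To make the difference between an arrive and an arrive-drop concrete, the following Python sketch models both on a toy barrier. It assumes a simplified barrier with only an expected count, a pending count, and a phase index; `ToyMBarrier`, `_arrive_on`, `arrive`, and `arrive_drop` are hypothetical names for illustration and do not correspond to any API in the patch.

```python
from dataclasses import dataclass


@dataclass
class ToyMBarrier:
    """Hypothetical stand-in for an mbarrier object; not the hardware layout."""
    expected: int   # arrivals required to complete each phase
    pending: int    # arrivals still outstanding in the current phase
    phase: int = 0  # index of the current phase


def _arrive_on(bar: ToyMBarrier, count: int) -> int:
    """Shared arrive-on step: decrement pending, roll the phase at zero."""
    # Assumption of this toy model: reject counts that would underflow.
    if count < 1 or count > bar.pending:
        raise ValueError("count must be between 1 and the pending count")
    prior_phase = bar.phase
    bar.pending -= count
    if bar.pending == 0:
        bar.phase += 1
        bar.pending = bar.expected  # re-arm using the current expected count
    return prior_phase


def arrive(bar: ToyMBarrier, count: int = 1) -> int:
    return _arrive_on(bar, count)


def arrive_drop(bar: ToyMBarrier, count: int = 1) -> int:
    # First lower the expected count, which affects all later phases,
    # then perform an ordinary arrive-on operation.
    bar.expected -= count
    return _arrive_on(bar, count)


bar = ToyMBarrier(expected=4, pending=4)
arrive_drop(bar)      # one participant leaves: expected 4 -> 3
arrive(bar, count=3)  # the remaining three arrive and complete phase 0
```

After the phase completes, the toy barrier re-arms with a pending count of 3 instead of 4, which is the "applies to all the subsequent phases" behavior the description mentions.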
+let summary = "MBarrier Arrive-Drop No-Complete Operation";
+let description = [{
+The `nvvm.mbarrier.arrive_drop.nocomplete` operation decrements the expected
+arrival count of the *mbarrier object* by the amount `count` and then performs
+an arrive-on operation on the *mbarrier object* with the guarantee that it
+will not cause the barrier to complete its current phase.
+
+[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive-drop)
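
The no-complete constraint can be sketched as a precondition on `count`. The Python function below is a hypothetical illustration, not part of the patch: it tracks only the expected and pending counts, and it raises an error where the real operation would presumably leave the behavior to the caller to avoid.

```python
from typing import Tuple


def arrive_drop_nocomplete(expected: int, pending: int,
                           count: int) -> Tuple[int, int]:
    """Return the (expected, pending) counts after the operation.

    Toy model: the phase never changes here, because an arrival that
    would bring the pending count to zero is rejected up front.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if count >= pending:
        raise ValueError("arrival would complete the current phase")
    return expected - count, pending - count


# One of four participants drops out without completing the phase.
expected, pending = arrive_drop_nocomplete(expected=4, pending=4, count=1)
```

Here both counts drop from 4 to 3, and three more arrivals are still needed before the current phase can complete.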