This patch updates the mbarrier.arrive.* family of Ops
to include all features added up to Blackwell.
* Update the `mbarrier.arrive` Op to support the
shared_cluster memory space, cta/cluster scopes, and
an option to lower with relaxed semantics.
* An `arrive_drop` variant is added for both the `arrive`
and `arrive.nocomplete` operations.
* Verifier checks are added wherever appropriate.
* lit tests are added to verify the lowering to the intrinsics.
TODO:
* Updates for the remaining mbarrier family will be done in
subsequent PRs (mainly expect_tx/complete_tx, arrive.expect_tx,
and test_wait/try_wait).
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This operation causes the executing thread to signal its arrival at the barrier.

The operation has the following result:
- `res`: When the `space` is not shared_cluster, this operation returns an
opaque 64-bit value capturing the phase of the *mbarrier object* prior to
the arrive-on operation. The contents of this return value are
implementation-specific. When the *mbarrier object* is located in the
shared_cluster space, the operation does not return a value.

The operation takes the following operands:
- `addr`: A pointer to the memory location of the *mbarrier object*. The `addr`
must be a pointer to generic, shared_cta, or shared_cluster memory. When it
is generic, the underlying address must be within the shared_cta memory space;
otherwise the behavior is undefined.
- `count`: This specifies the amount by which the pending arrival count is
decremented. If the `count` argument is not specified, the pending arrival
count is decremented by 1.
- `scope`: This specifies the set of threads (cta or cluster) that directly
observe the memory synchronizing effect of the `mbarrier.arrive` operation.
- `space`: This indicates the memory space where the *mbarrier object* resides.
- `relaxed`: When set to true, the `arrive` operation has relaxed memory semantics
and does not provide any ordering or visibility guarantees.

[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive)
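To make the counting rules above concrete, here is a minimal Python sketch of the arrive-on bookkeeping. It is a hypothetical single-threaded behavioral model, not the NVVM API and not how the hardware implements the barrier: the class `MBarrierModel`, its `arrive` method and the `space` strings are illustrative names, the returned "state" is modeled as the phase parity (the real 64-bit value is opaque), the tx-count is ignored, and `scope`/`relaxed` are left out because they only affect memory ordering.

```python
from typing import Optional


class MBarrierModel:
    """Hypothetical single-threaded model of an mbarrier object.

    Tracks only the arrival counts and the phase; the tx-count and the
    memory-ordering effects of `scope` and `relaxed` are not modeled.
    """

    def __init__(self, expected: int) -> None:
        self.expected = expected  # expected arrival count per phase
        self.pending = expected   # arrivals still missing in the current phase
        self.phase = 0            # number of phases completed so far

    def arrive(self, count: int = 1, space: str = "shared_cta") -> Optional[int]:
        """Arrive-on: decrement the pending arrival count by `count` (default 1).

        Returns the parity of the phase observed *before* the arrive as a
        stand-in for the opaque state, or None for shared_cluster, mirroring
        the rule that no value is returned in that space.
        """
        if not 1 <= count <= self.pending:
            # Model choice (assumption): reject over-arrival outright.
            raise ValueError("count must be in [1, pending arrival count]")
        state = self.phase & 1
        self.pending -= count
        if self.pending == 0:
            # Every expected arrival was seen: the phase completes and the
            # pending count is reloaded from the expected count.
            self.phase += 1
            self.pending = self.expected
        return None if space == "shared_cluster" else state


bar = MBarrierModel(expected=3)
first = bar.arrive()          # pending 3 -> 2, phase 0 still in progress
second = bar.arrive(count=2)  # pending 2 -> 0, phase 0 completes
remote = bar.arrive(space="shared_cluster")  # pending 3 -> 2, no state returned
```

Both `first` and `second` observe phase 0 because the state is captured before the decrement; only the arrive that drives the pending count to zero completes the phase.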
}];
The `nvvm.mbarrier.arrive_drop` operation decrements the expected arrival
count of the *mbarrier object* by `count` and then performs an arrive-on
operation. When `count` is not specified, it defaults to 1. The decrement
of the expected arrival count applies to all subsequent phases of the
*mbarrier object*. The remaining semantics are identical to those of the
`nvvm.mbarrier.arrive` operation.

[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive-drop)
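The difference between a plain arrive and an arrive-drop is easiest to see over several phases. The following Python sketch is a hypothetical behavioral model under the same assumptions as a simple counter model (illustrative class and method names, phase parity standing in for the opaque state, no tx-count, no memory-ordering effects); it shows a thread opting out so that later phases wait for fewer arrivals.

```python
class MBarrierModel:
    """Hypothetical single-threaded model of an mbarrier object (no tx-count)."""

    def __init__(self, expected: int) -> None:
        self.expected = expected  # expected arrival count per phase
        self.pending = expected   # arrivals still missing in the current phase
        self.phase = 0            # number of phases completed so far

    def _check(self, count: int) -> None:
        # Model choice (assumption): reject over-arrival outright.
        if not 1 <= count <= self.pending:
            raise ValueError("count must be in [1, pending arrival count]")

    def arrive(self, count: int = 1) -> int:
        """Plain arrive-on; returns the pre-arrive phase parity."""
        self._check(count)
        state = self.phase & 1
        self.pending -= count
        if self.pending == 0:
            # Phase completes; the next phase waits for `expected` arrivals.
            self.phase += 1
            self.pending = self.expected
        return state

    def arrive_drop(self, count: int = 1) -> int:
        """Lower the expected count by `count`, then arrive-on with `count`."""
        self._check(count)
        # Applied first, so a phase completed by this very arrive already
        # reloads the reduced expected count; later phases keep it too.
        self.expected -= count
        return self.arrive(count)


bar = MBarrierModel(expected=2)
bar.arrive_drop()  # thread A opts out: expected 2 -> 1, pending 2 -> 1
bar.arrive()       # thread B arrives: phase 0 completes, pending reloads to 1
bar.arrive()       # phase 1 now needs only thread B
```

Thread A still counts toward the phase in which it drops, which matches "decrements the expected arrival count ... and then performs an arrive-on operation"; from the next phase on, thread B alone completes each phase.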
let summary = "MBarrier Arrive-Drop No-Complete Operation";
let description = [{
The `nvvm.mbarrier.arrive_drop.nocomplete` operation decrements the expected
arrival count of the *mbarrier object* by the amount `count` and then performs
an arrive-on operation on the *mbarrier object* with the guarantee that it
will not cause the barrier to complete its current phase.

[For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive-drop)
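The no-complete guarantee can be read as a precondition on `count`: it must leave at least one pending arrival in the current phase. The Python sketch below is a hypothetical behavioral model (illustrative class and method names, phase parity as a stand-in for the opaque state, no tx-count) that rejects any call that would complete the phase; that rejection is a modeling choice, so consult the PTX ISA for what the hardware does if the guarantee is violated.

```python
class MBarrierModel:
    """Hypothetical single-threaded model of an mbarrier object (no tx-count)."""

    def __init__(self, expected: int) -> None:
        self.expected = expected  # expected arrival count per phase
        self.pending = expected   # arrivals still missing in the current phase
        self.phase = 0            # number of phases completed so far

    def arrive_drop_nocomplete(self, count: int) -> int:
        """Drop `count` from the expected count and arrive without completing."""
        if count < 1:
            raise ValueError("count must be positive")
        if count >= self.pending:
            # This arrive would finish the current phase, which the
            # no-complete variant must never do (modeled as an error).
            raise RuntimeError("arrive would complete the current phase")
        state = self.phase & 1  # stand-in for the opaque pre-arrive state
        self.expected -= count  # applies to all subsequent phases
        self.pending -= count   # current phase stays incomplete (pending > 0)
        return state


bar = MBarrierModel(expected=4)
state = bar.arrive_drop_nocomplete(2)  # expected 4 -> 2, pending 4 -> 2
try:
    bar.arrive_drop_nocomplete(2)      # would complete phase 0: rejected
    rejected = False
except RuntimeError:
    rejected = True
```

The check runs before any state is mutated, so a rejected call leaves the counts untouched.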