[asm] Replace getOperand calls with named accessors via OpInterfaces #992

Merged
harsh-nod merged 1 commit into iree-org:main from harsh-nod:named_accessors
Mar 6, 2026

Conversation

@harsh-nod
Collaborator

Refactor waveasm dialect to use typed OpInterfaces instead of raw getOperand(N) calls, improving code readability and type safety.

Changes:

  • Convert MFMAOp from NativeOpTrait to OpInterface with typed accessors (getA, getB, getAcc, getDst, getTiedOperandIndex)
  • Add VMEMLoadOpInterface, VMEMStoreOpInterface for buffer/global ops
  • Add LDSLoadOpInterface, LDSStoreOpInterface for LDS operations
  • Add VMEMToLDSLoadOpInterface for buffer-to-LDS loads
  • Update ops (MFMAOp, VMEMLoadOp, VMEMStoreOp, LDSLoadOp, LDSStoreOp, VMEMToLDSLoadOp, VMEMAtomicOp) to implement appropriate interfaces
  • Refactor AssemblyEmitter.cpp to use interface accessors for all buffer/global/LDS load/store emitters
  • Refactor BufferLoadStrengthReduction.cpp to use VALU op accessors (getSrc, getSrc0, getSrc1, getSrc2) and memory interfaces
  • Refactor Peephole.cpp and LinearScanPass.cpp to use MFMAOpInterface

Comment on lines +233 to +235
int64_t dSrc = getDelta(lshlAddOp.getSrc0());
int64_t dShift = getDelta(lshlAddOp.getSrc1());
int64_t dAdd = getDelta(lshlAddOp.getSrc2());
Contributor

Is getSrc0 significantly better for readability than getOperand(0)?

Collaborator Author

The main advantages of getSrc0() are:

  • Type safety - only compiles if the op actually has that operand
  • Refactoring safety - if operand order changes in tablegen, code using named accessors still works

Contributor

This doesn't improve readability though, which was the question.

Collaborator Author

I think they are similar but I would lean towards this approach because of the advantages listed above.

// Trait for MFMA operations
def WaveASM_MFMAOp : NativeOpTrait<"MFMAOp">;
// Interface for MFMA operations - provides typed accessors for operands
def WaveASM_MFMAOp : OpInterface<"MFMAOpInterface"> {
Contributor

Why do we need the interface? It has a runtime cost, what is the trade-off with alternative methods of accessing the operands? Are there cases that may be better served by those methods?

Collaborator Author

@harsh-nod harsh-nod Feb 27, 2026

The interface provides the following benefits:

  1. Type Safety

// Interface: compile-time guarantee that op has accumulator semantics
Value acc = mfmaOp.getAcc();
// Raw index: no such guarantee
Value acc = op->getOperand(2);  // What if op isn't MFMA?

  2. Single Source of Truth
     The operand-to-semantic mapping is defined once in the interface. With scattered getOperand(2) calls, you're duplicating knowledge across the codebase - each call site "knows" that index 2 is the accumulator.
  3. Refactoring Safety
     If the operand layout changes, the interface's defaultImplementation is the only place to update. With raw indices, you'd need to grep for getOperand(2) and manually verify each is an MFMA accumulator access.
  4. Framework Integration
     OpInterfaceRewritePattern gives you typed dispatch - the pattern only runs on conforming ops. No manual dyn_cast + null check boilerplate.
  5. Discoverability
     "Find all usages of getAcc()" is trivial. "Find all places accessing the MFMA accumulator via getOperand(2)" requires semantic understanding the tooling doesn't have.
  6. Negligible Runtime Cost
     Virtual dispatch is ~1-2 nanoseconds. These passes process thousands of ops doing IR manipulation, memory allocation, and algorithm work. The interface overhead is unmeasurable noise - not worth compromising API design for.

Contributor

This doesn't answer the question though. Let me rephrase: why did you instruct your LLM to generate an interface here? What other approaches have you considered, and why did you decide against them? I can see at least 3 more, at least one of which may be a strawman.

Type Safety

This is false. An interface object can be constructed from an Operation * regardless of the operation actually implementing the interface. It will lead to UB later on, but nothing physically stops you.

The operand-to-semantic mapping is defined once in the interface

This is also false. The operand-to-name mapping is also defined in the op definition itself. This defines a second mapping that may diverge, which is exactly the opposite of the claimed benefit.

"Find all usages of getAcc()" is trivial. "Find all places accessing MFMA accumulator via getOperand(2)" requires semantic understanding the tooling doesn't have.

The tooling has just successfully updated all uses of getOperand(2), so it defies its own point? :)

Runtime Cost is Negligible
Virtual dispatch is ~1-2 nanoseconds. These passes process thousands of ops doing IR manipulation, memory allocation, and algorithm work. The interface overhead is unmeasurable noise - not worth compromising API design for.

Interfaces are not using virtual dispatch. They are actually slightly more efficient than that, but that's not the point. This is also demonstrably false: https://www.youtube.com/watch?v=7qvVMUSxqz4, around halfway in +/- 30 slides. Interface dispatch is ~10ns, and 4x slower than a hardcoded switch.

As a side note, the point of code review is to make sure we both share understanding of the code. I can ask a model myself, I can also prove it wrong in most cases.

Collaborator Author

Here are the options (with comments indicating what is desirable/undesirable about them inline):

  • Use a type switch

// In Peephole.cpp
struct MFMAZeroAccumPattern : public RewritePattern {
  MFMAZeroAccumPattern(MLIRContext *ctx)
      : RewritePattern(MatchAnyOpTypeTag{}, 1, ctx) {}

  LogicalResult matchAndRewrite(Operation *op,
                                PatternRewriter &rewriter) const override {
    // Use ODS-generated accessors via TypeSwitch.
    // Undesirable: need to duplicate this op list to support all MFMA ops.
    Value accum = TypeSwitch<Operation *, Value>(op)
      .Case<V_MFMA_F32_16X16X16_F16,
            V_MFMA_F32_32X32X8_F16,
            V_MFMA_F32_16X16X32_F16,
            V_MFMA_F32_32X32X16_F16,
            V_MFMA_I32_16X16X32_I8,
            V_MFMA_I32_32X32X16_I8,
            // ... ~20 more ops
            V_SMFMA_F32_16X16X64_F16,
            V_SMFMA_F32_32X32X32_F16>(
        [](auto mfma) { return mfma.getAcc(); })  // ODS accessor
      .Default([](Operation *) { return Value(); });

    if (!accum)
      return failure();
    // ... rest of pattern
  }
};

// In LinearScanPass.cpp
Value acc = TypeSwitch<Operation *, Value>(op)
  .Case<V_MFMA_F32_16X16X16_F16, V_MFMA_F32_32X32X8_F16, /*...*/>(
    [](auto mfma) { return mfma.getAcc(); })
  .Default([](Operation *) { return Value(); });

if (acc) {
  // ... handle MFMA
}
  • Use a Trait (original approach)

// In WaveASMInterfaces.td - just a marker trait
def WaveASM_MFMAOp : NativeOpTrait<"MFMAOp">;

// In WaveASMOps.td
class MFMAOp<string mnemonic, ...>
    : WAVEASMOp<mnemonic, !listconcat([Pure, WaveASM_MFMAOp], traits)> {
  // Operand layout: $a (0), $b (1), $acc (2)
  let arguments = (ins ...$a, ...$b, ...$acc);
}

// In Peephole.cpp
struct MFMAZeroAccumPattern : public RewritePattern {
  MFMAZeroAccumPattern(MLIRContext *ctx)
      : RewritePattern(MatchAnyOpTypeTag{}, 1, ctx) {}

  LogicalResult matchAndRewrite(Operation *op,
                                PatternRewriter &rewriter) const override {
    // Undesirable: need a redundant check.
    if (!op->hasTrait<OpTrait::MFMAOp>())
      return failure();

    // Undesirable: operand 2 requires knowledge of which entry maps to
    // what; getAcc is easier to read.
    // acc is operand 2 for all MFMA ops (a=0, b=1, acc=2)
    Value accum = op->getOperand(2);

    // ... rest of pattern
  }
};

// In LinearScanPass.cpp
if (op->hasTrait<OpTrait::MFMAOp>()) {
  Value acc = op->getOperand(2);  // acc is always index 2
  // ...
}

  • Use an interface

// In WaveASMInterfaces.td
def WaveASM_MFMAOp : OpInterface<"MFMAOpInterface"> {
  let cppNamespace = "::waveasm";
  let methods = [
    InterfaceMethod<"Get accumulator", "::mlir::Value", "getAcc", (ins),
      /*methodBody=*/"",
      /*defaultImplementation=*/[{ return $_op->getOperand(2); }]>,
    InterfaceMethod<"Get tied index", "unsigned", "getTiedOperandIndex", (ins),
      /*methodBody=*/"",
      /*defaultImplementation=*/[{ return 2; }]>
  ];
}

// In WaveASMOps.td
class MFMAOp<string mnemonic, ...> 
    : WAVEASMOp<mnemonic, !listconcat([Pure, WaveASM_MFMAOp], traits)> {
  let arguments = (ins ...$a, ...$b, ...$acc);
}

// In Peephole.cpp - clean, no op enumeration needed
struct MFMAZeroAccumPattern 
    : public OpInterfaceRewritePattern<MFMAOpInterface> {
  using OpInterfaceRewritePattern<MFMAOpInterface>::OpInterfaceRewritePattern;

  LogicalResult matchAndRewrite(MFMAOpInterface mfmaOp,
                                PatternRewriter &rewriter) const override {
    Value accum = mfmaOp.getAcc();  // works for any MFMA op
    // ... rest of pattern
  }
};

// In LinearScanPass.cpp
if (auto mfmaOp = dyn_cast<MFMAOpInterface>(op)) {
  Value acc = mfmaOp.getAcc();
  // ...
}

I agree there is a cost associated with it, but I think there are ways to get around the divergence (for example having ops implement ODS accessors) and it seems like a win from an ergonomics and extensibility perspective.

Contributor

This looks like yet another LLM spew that doesn't really answer the question and is wrong in many places. If you don't have or know the answer, say so -- a model is not programmed to say "I don't know" -- at which point we may be able to learn something here. The question is about trade-offs and design decisions that require system knowledge. I know it may look like me being pedantic about terminology, but this is really about the process: one shouldn't just accept whatever random code is generated by a tool without having a clear understanding and design intent in mind. That leads to unmaintainable and self-contradictory systems that nobody will want to use.

Use a type switch
Need to duplicate this to support all MFMA ops

And what exactly is duplicated? I see a list of ops that are supported, once. There are also better approaches in the language itself to support a similar behavior.

Use a Trait (original approach)

It isn't really using a trait. It was checking that a trait was present, but that's it. Traits are more than that.

Or an interface,

Technically it does work. But there is nothing that demonstrates that an interface is actually needed or justified here. It may be justified by some future extension that would make use of what interfaces are actually capable of. A different design of the interface may also be easier to justify.

I agree there is a cost associated with it, but I think there are ways to get around the divergence (for example having ops implement ODS accessors) and it seems like a win from an ergonomics and extensibility perspective.

There are always different costs associated with every option; this is why I am asking you to explain the trade-off between them. In particular, we are looking at a trade-off between runtime cost, compile-time cost, and code comprehensibility.

I don't understand what divergence we need to get around. I also don't understand what "implement ODS accessors" means: if the ops are defined in ODS (which they are,

let arguments = (ins WaveASM_VALUSrc:$a, WaveASM_VALUSrc:$b, WaveASM_MFMAAccSrc:$acc);

), the ODS backend already generates named accessors for them by default, so there's nothing the client should or can implement on top.

There are two other viable approaches I see here, and there is a "better interface" approach. I could just also let this slide because this is an improvement over the previous state, but the important part here is to fix the process. We need to get significantly more intentional about design and communicating high-level intent of the software, especially now that creating code costs nothing (tbh, creating code is like 10% of the job anyway).

Collaborator Author

And what exactly is duplicated?

In the LinearScanPass.cpp or in other places where there is a need to check if the op matches the MFMA interface, there will be a duplication.

The divergence was referring to your previous comment

The operand-to-name mapping is also defined in the op definition itself. This defines a second mapping that may diverge, which is exactly the opposite of the claimed benefit.

Contributor

I can certainly give you the ideas:

  • use templates, e.g., have the rewrite pattern be a class template;
  • use the interface in conjunction with ODS-generated methods for ops so it doesn't require methods to be implemented explicitly, à la linalg;
  • reconsider the design of mfma ops to use one/small number of operation and an enum for the exact kind.

The team will be most productive if it is the PR author who carries the burden of doing the trade-off analysis and justifying the choice. I can do the analysis, but it takes more time than being presented with one and checking that it makes sense, NP vs. P is a fitting analogy here. And we don't want everyone to be bottlenecked on me performing the analysis (let alone losing out on perspective because only one person does it), hence me pushing for you to come up with it in the first place.

Collaborator Author

@harsh-nod harsh-nod Mar 4, 2026

Great, thank you for sharing. I think this is a good starting point and I agree with your points.

Regarding the PR: going back to its motivation, we wanted to address your comment about uses of getOperand(). My understanding of the problem was that passes like Peephole and LinearScanPass work on Operation* and need to handle ~30 MFMA ops (including scaled variants) uniformly. The existing approach was op->hasTrait<OpTrait::MFMAOp>() + op->getOperand(2). The trait check works for dispatch, but the raw index is fragile.

To handle this fragile behavior, you proposed 3 options.

  1. templates: I assume here you mean a class template (template <typename OpTy> struct Pattern : OpRewritePattern<OpTy>) registered per op type - I can see how this works for patterns, but how does it help in non-pattern code like LinearScanPass where we walk ops and need to know "if this is any MFMA, get its acc"?

  2. interface: I think this is what is being proposed by this PR but not sure what you mean by linalg style?

  3. single op with enum: This could work for MFMA but needs some thought. We are also doing this for VMEM and LDS loads, though, and there I don't think it makes sense: buffer_load and tensor_load, while both VMEM ops, have very different semantics.

That being said, if this is strictly an improvement on the existing state, my recommendation would be to land this and then continue iterating. We can create a separate design doc where we go into these designs in more detail and iterate there rather than here.

Contributor

My point has always been going further than mere API preference. The reason to define MLIR dialects is to guarantee certain invariants for any logic on entities from this dialect. This is usually done through verifiers, either explicit or implicit. In particular, for ops defined in ODS with a fixed number of operands, passes can safely assume that, in the input IR, the ops have exactly that number of operands. So they never need the logic like op->getNumOperands() < 3 (as this PR shows, by the way) or the random messages about malformed IR from within a pass (

op->emitError() << "MFMA operation must have at least 3 operands "
<< "(A, B, accumulator), but found "
) that should be verifier failures instead.

How specifically those invariants are enforced is full of trade-offs and there are often multiple "good" solutions with no clear best. This is the point of advanced code review, which actually looks at those trade-offs. I usually look at things like consistency (is the trade-off taken here the same as elsewhere in this code base or similar code bases, and if not, is the difference justified) and reasoning locality (how much one needs to jump around the code base to understand the logic), but this is never dogmatic. A description of the trade-offs taken is what I usually expect to find in [commit messages](https://mlir.llvm.org/getting_started/Contributing/#commit-messages): why certain things are done and done in a specific way.

templates: I assume here you mean (template struct Pattern : OpRewritePattern registered per type) - I can see how this works for patterns but how does it help in non-pattern code like LinearScanPass where we walk ops and need to know "if this is any MFMA, get its acc" ?

You tell me :) If I have to come up with the entire reasoning and a code draft, I might as well commit it myself. This is what I expected to see when I asked about trade offs. It may also be justifiable to use different approaches in different passes, again, trade offs.

For the sake of the argument, I'll also say that a trait may be a plausible solution here provided that it has a verifier and that verifier checks that getOperand(2) == getAcc(). The logic may even be put in a static method of a trait class so getOperand(2) is contained to the trait to improve on the reasoning locality metric from above. MFMATrait<Operation *>::getAcc(op) is a bit ugly, but is more semantically loaded than indexed operand access. Note that traits are class templates themselves, so one could even argue that the current state is already sort of using templates.

interface: I think this is what is being proposed by this PR but not sure what you mean by linalg style?

No, it is not. The "in conjunction with ODS-generated methods" is load-bearing. You can see that linalg ops (transitively) implement this interface https://github.com/llvm/llvm-project/blob/8c206a26e408bcbf98b1cc6b0770b17c1e8d0320/mlir/include/mlir/Interfaces/IndexingMapOpInterface.td#L25-L32, but the selected method doesn't have a default implementation. Neither do linalg ops have explicit implementation of this method. How does this work then? Can it be leveraged here?
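A hedged sketch of what the "in conjunction with ODS-generated methods" variant could look like in ODS (names are taken from the snippets earlier in this thread and not verified against the actual PR): the interface method is declared without a defaultImplementation, and because the op names its operand $acc, the ODS-generated getAcc() accessor satisfies the interface method, leaving no second index-to-name mapping to diverge.

```tablegen
// Sketch only; identifiers assumed from this thread.
def WaveASM_MFMAOp : OpInterface<"MFMAOpInterface"> {
  let cppNamespace = "::waveasm";
  let methods = [
    // No methodBody and no defaultImplementation: each op must
    // provide getAcc() itself.
    InterfaceMethod<"Get accumulator", "::mlir::Value", "getAcc", (ins)>
  ];
}

// The op's $acc argument already makes ODS emit getAcc(), so
// attaching the interface is all that is needed.
class MFMAOp<string mnemonic, list<Trait> traits = []>
    : WAVEASMOp<mnemonic, !listconcat([Pure, WaveASM_MFMAOp], traits)> {
  let arguments = (ins WaveASM_VALUSrc:$a, WaveASM_VALUSrc:$b,
                       WaveASM_MFMAAccSrc:$acc);
}
```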

The usual rationale for interfaces, in OOP in general not just in MLIR, is polymorphism, e.g., to have certain behavior customized for different objects while having a common mechanism to invoke it. Here, only the default implementation of interface methods is ever used, meaning all objects have the exact same behavior, hence my question about the reason for using interfaces -- and paying the overhead price -- for something that doesn't require the polymorphism functionality that interfaces actually provide. Note the usage proposed here may be an okay application of interfaces if it is the best trade-off available (I don't think it is, but I am easily convinced).

single op with enum: This could work with MMA but needs some thought. But also we are doing this for VMEM and LDS loads. Here I dont think this makes sense because buffer_load and tensor_load while both VMEM ops, have very different semantics.

Perfect! More thought (and documenting its outcome) is exactly what we need to get to good design! It is fine to have a different approach for different ops, though it increases the overall complexity of the system and should be weighed against the cost we would incur elsewhere by staying consistent. The argument that we want consistency between mma and loads and that this is undesirable for loads is exactly what I'm looking for! We can table this specific discussion on decreasing the overall number of ops for now, and say we are consciously not pursuing this direction in this PR. Let's document that in the PR description though, so 6 months from now we can easily spot that it was a deliberate decision.

That being said, if this is strictly an improvement on the exising state, my recommendation would be to land this and then continue iterating.

We have made good progress. I think we are one step away from converging, so let's make that step and land.

Collaborator Author

Thanks, I have updated the PR and commit message. Please take a look.

@harsh-nod harsh-nod force-pushed the named_accessors branch 5 times, most recently from fd9a94a to 7886b88 on March 5, 2026 at 19:42
Replace raw getOperand(N) index-based access with OpInterfaces that
provide named accessors for operand groups shared across multiple ops.

Design trade-offs considered:
- Trait + getOperand(N): Zero cost but hardcodes operand indices at every
  call site. No compile-time coupling between the index and the op
  definition.
- TypeSwitch on concrete ops: Uses ODS accessors (single source of truth)
  but requires enumerating all ~30 MFMA ops at every use site. Adding a
  new op requires updating all switches.
- OpInterface (chosen): Adds ~10ns dispatch overhead per call, but
  provides named access through a common type (e.g., mfmaOp.getAcc())
  without enumerating ops. New ops automatically work by declaring the
  interface. Enables OpInterfaceRewritePattern for typed pattern dispatch.

To avoid the operand mapping divergence that defaultImplementation would
create (a second index-to-name mapping independent of the op definition),
interfaces declare methods with no defaults. Ops use
DeclareOpInterfaceMethods so ODS-generated accessors (from the op's $arg
names) serve as the interface implementations -- the same approach used
by linalg's IndexingMapOpInterface.

Consolidating MFMA variants into fewer ops with enum attributes was
considered but not pursued here: it would affect the full pipeline and
is better evaluated separately, especially since memory ops (buffer_load
vs tensor_load) have sufficiently different semantics to make a unified
op impractical.

Signed-off-by: harsh-nod <menonharsh@gmail.com>
"int", "getTiedOperandIndex", (ins),
/*methodBody=*/"",
/*defaultImplementation=*/[{
return 2;
Contributor

Nit: I think this can do something like $_op.getAccMutable().getOperandNumber(), but didn't double-check.

Collaborator Author

will leave for next patch, thanks

Comment on lines +150 to +157
InterfaceMethod<"Get the first matrix operand (A).",
"::mlir::Value", "getA", (ins)>,
InterfaceMethod<"Get the second matrix operand (B).",
"::mlir::Value", "getB", (ins)>,
InterfaceMethod<"Get the accumulator operand.",
"::mlir::Value", "getAcc", (ins)>,
InterfaceMethod<"Get the destination/result value.",
"::mlir::Value", "getDst", (ins)>,
Contributor

Note to self, it may be possible to use extraSharedClassDeclaration to defer statically with renaming and avoid the function reference indirection, something to check

@harsh-nod harsh-nod merged commit 074f8e0 into iree-org:main Mar 6, 2026
21 of 22 checks passed
xintin pushed a commit that referenced this pull request Mar 6, 2026