-
Notifications
You must be signed in to change notification settings - Fork 281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Pipeline] Add per-register clock gating #5489
Conversation
lib/Dialect/Pipeline/PipelineOps.cpp
Outdated
} | ||
odsState.addOperands(sortedClockGates); | ||
odsState.addAttribute( | ||
"operand_segment_sizes", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my understanding, what is operand_segment_sizes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
52ff231
to
8ca5e8e
Compare
Variadic<I1>:$clockGates, | ||
I64ArrayAttr:$clockGatesPerRegister); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets hairy fast. Are we absolutely sure there's no way to represent this with ops?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot do Variadic<Variadic<I1>>
which would be the obvious one. So something has to break down that variadic list of operands. I chose I64ArrayAttr
+ a bit more code due to ease of implementation - open for suggestions 🤷.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not necessarily suggesting this -- and certainly not for this PR -- but would it make sense to to break up this op into multiple ops? Like a "stall controller" which outputs per register stalls then clock gate ops could be inserted before this op... That's my only half-formed idea.
OpBuilder<(ins "Block*":$dest, "ValueRange":$registers, "ValueRange":$passthroughs)>, | ||
// Clock gate builder, which takes a mapping between the registers and | ||
// and their clock gate hierarchy. | ||
OpBuilder<(ins "Block*":$dest, "ValueRange":$registers, "ValueRange":$passthroughs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to add a builder which associates the registers with the clock gates with a struct? (i.e. struct {Value reg; vector clockGates;}
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could... but the highlighted builder isn't even being used (as your pointed out later in the review👀) - will remove this builder instead of adding more unused things.
stageTerminator->getLoc(), dataType, regIn, clock, dataValid, | ||
regName, reset, /*resetValue*/ Value(), /*sym_name*/ StringAttr()); | ||
// Use the clock gate instead of input muxing. | ||
Value currClockGate = notStalledClockGate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@blakep-msft does it make sense to also gate on valid
?
} | ||
dataReg = builder.create<seq::CompRegOp>( | ||
stageTerminator->getLoc(), dataType, regIn, currClockGate, regName, | ||
reset, /*resetValue*/ Value(), /*sym_name*/ StringAttr()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this was there previously, but (at least on FPGAs) it's generally preferable not to reset data registers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense - i'll remove it. Can always add an option for ASIC (resets) in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't see a reason to do it on ASICs, though I don't have any experience there.
auto dataRegNext = builder.create<comb::MuxOp>( | ||
stageTerminator->getLoc(), dataValid, regIn, dataRegBE); | ||
Value imuxSelect = stageValidAndNotStalled; | ||
if (auto clockGates = stageTerminator.getClockGatesForReg(regIdx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't have the clock gate regs enabled, why wouldn't we just ignore the clock gates? They're just power optimizations. If someone is using them for correctness, they're mis-using them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point - i guess we'll never end up with a more optimal design if including the "clock gating" in the input muxing hierarchy.
clockGates)); | ||
} | ||
|
||
Value dataRegNext = builder.create<comb::MuxOp>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this was pre-existing, but why not use compreg.ce
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A matter of representation. Opted for an input mux rather than a clock enable signal (this is how I was schooled in "proper" instantiation of registers). Both styles appear to synthesize to the same thing. For brevity, in a compiler, i'd prefer the compreg.ce
style - will change.
Also changes to that the `clock-gate-regs` options actually implements clock gates instead of a `seq.comp_reg.ce` operation. To cover all cases, i think there needs to be three kinds of gating implementations - clock gate, clock enable (`seq.compreg.ce`) and input muxing. The first and last are what we have now.
9331628
to
b1448c2
Compare
Also changes to that the `clock-gate-regs` options actually implements clock gates instead of a `seq.comp_reg.ce` operation. To cover all cases, i think there needs to be three kinds of gating implementations - clock gate, clock enable (`seq.compreg.ce`) and input muxing. The first and last are what we have now.
Also changes to that the
clock-gate-regs
options actually implements clock gates instead of aseq.comp_reg.ce
operation. To cover all cases, i think there needs to be three kinds of gating implementations - clock gate, clock enable (seq.compreg.ce
) and input muxing. The first and last are what we have now.