Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Pipeline] Add per-register clock gating #5489

Merged
merged 3 commits into from
Jul 3, 2023

Conversation

mortbopet
Copy link
Contributor

Also changes to that the clock-gate-regs options actually implements clock gates instead of a seq.comp_reg.ce operation. To cover all cases, i think there needs to be three kinds of gating implementations - clock gate, clock enable (seq.compreg.ce) and input muxing. The first and last are what we have now.

@mortbopet mortbopet requested a review from teqdruid June 27, 2023 14:22
}
odsState.addOperands(sortedClockGates);
odsState.addAttribute(
"operand_segment_sizes",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my understanding, what is operand_segment_sizes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mortbopet mortbopet force-pushed the dev/mpetersen/pipeline_clock_gates branch from 52ff231 to 8ca5e8e Compare June 28, 2023 08:01
Comment on lines +226 to +228
Variadic<I1>:$clockGates,
I64ArrayAttr:$clockGatesPerRegister);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets hairy fast. Are we absolutely sure there's no way to represent this with ops?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot do Variadic<Variadic<I1>> which would be the obvious one. So something has to break down that variadic list of operands. I chose I64ArrayAttr + a bit more code due to ease of implementation - open for suggestions 🤷.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not necessarily suggesting this -- and certainly not for this PR -- but would it make sense to to break up this op into multiple ops? Like a "stall controller" which outputs per register stalls then clock gate ops could be inserted before this op... That's my only half-formed idea.

include/circt/Dialect/Pipeline/PipelineOps.td Outdated Show resolved Hide resolved
OpBuilder<(ins "Block*":$dest, "ValueRange":$registers, "ValueRange":$passthroughs)>,
// Clock gate builder, which takes a mapping between the registers and
// and their clock gate hierarchy.
OpBuilder<(ins "Block*":$dest, "ValueRange":$registers, "ValueRange":$passthroughs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to add a builder which associates the registers with the clock gates with a struct? (i.e. struct {Value reg; vector clockGates;}.)

Copy link
Contributor Author

@mortbopet mortbopet Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could... but the highlighted builder isn't even being used (as your pointed out later in the review👀) - will remove this builder instead of adding more unused things.

stageTerminator->getLoc(), dataType, regIn, clock, dataValid,
regName, reset, /*resetValue*/ Value(), /*sym_name*/ StringAttr());
// Use the clock gate instead of input muxing.
Value currClockGate = notStalledClockGate;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blakep-msft does it make sense to also gate on valid?

}
dataReg = builder.create<seq::CompRegOp>(
stageTerminator->getLoc(), dataType, regIn, currClockGate, regName,
reset, /*resetValue*/ Value(), /*sym_name*/ StringAttr());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this was there previously, but (at least on FPGAs) it's generally preferable not to reset data registers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense - i'll remove it. Can always add an option for ASIC (resets) in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't see a reason to do it on ASICs, though I don't have any experience there.

auto dataRegNext = builder.create<comb::MuxOp>(
stageTerminator->getLoc(), dataValid, regIn, dataRegBE);
Value imuxSelect = stageValidAndNotStalled;
if (auto clockGates = stageTerminator.getClockGatesForReg(regIdx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't have the clock gate regs enabled, why wouldn't we just ignore the clock gates? They're just power optimizations. If someone is using them for correctness, they're mis-using them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point - i guess we'll never end up with a more optimal design if including the "clock gating" in the input muxing hierarchy.

clockGates));
}

Value dataRegNext = builder.create<comb::MuxOp>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this was pre-existing, but why not use compreg.ce?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A matter of representation. Opted for an input mux rather than a clock enable signal (this is how I was schooled in "proper" instantiation of registers). Both styles appear to synthesize to the same thing. For brevity, in a compiler, i'd prefer the compreg.ce style - will change.

lib/Dialect/Pipeline/PipelineOps.cpp Outdated Show resolved Hide resolved
lib/Dialect/Pipeline/PipelineOps.cpp Outdated Show resolved Hide resolved
Also changes to that the `clock-gate-regs` options actually implements clock gates instead of a `seq.comp_reg.ce` operation. To cover all cases, i think there needs to be three kinds of gating implementations - clock gate, clock enable (`seq.compreg.ce`) and input muxing. The first and last are what we have now.
@mortbopet mortbopet force-pushed the dev/mpetersen/pipeline_clock_gates branch from 9331628 to b1448c2 Compare June 30, 2023 09:38
@mortbopet mortbopet merged commit 5d69c08 into main Jul 3, 2023
5 checks passed
calebmkim pushed a commit to andrewb1999/circt that referenced this pull request Jul 12, 2023
Also changes to that the `clock-gate-regs` options actually implements clock gates instead of a `seq.comp_reg.ce` operation. To cover all cases, i think there needs to be three kinds of gating implementations - clock gate, clock enable (`seq.compreg.ce`) and input muxing. The first and last are what we have now.
@darthscsi darthscsi deleted the dev/mpetersen/pipeline_clock_gates branch June 4, 2024 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants