[Pipeline] Refactor pipeline dialect to be block-based #5332

Merged: 4 commits, Jun 12, 2023

@mortbopet mortbopet commented Jun 7, 2023

This commit refactors the pipeline dialect to be block-based. This brings a major representational change in the form of:

  1. The pipeline is no longer defined by a lexical ordering of operations and insertion of pipeline.stagesep operations to separate stages. Instead, pipeline stages are defined by blocks.
  2. Control flow between blocks is defined by pipeline.stage operations.
  3. Like in the current version, the pipeline can exist in register dematerialized and materialized forms. In the dematerialized form, stages (Blocks) have no arguments. In the materialized form, stages have arguments.
  4. It is the pipeline.stage operations which infer whether to register a value or pass it directly (i.e. as a wire) to the next stage.
  5. Two top-level operations exist:
  • pipeline.unscheduled: Unscheduled pipeline, essentially just a container of operations.
  • pipeline.scheduled: Scheduled pipeline, containing pipeline stages.

The motivation for this change is to improve the hierarchy of the IR rather than rely on lexical ordering. It also allows for more natural traversal of stages (Blocks), as well as dataflow analysis of the pipeline, which is now analogous to control flow analysis. The one drawback is that adding new pipeline stages becomes slightly more involved, since one has to explicitly update the control flow of the pipeline. This is minor: it is also how things work in the software world, and it is easily addressed by helper methods.

Likewise, this change also removes the (old) pipeline.stage operations, which were mainly introduced to facilitate lowering. They are no longer needed with the block-based pipeline, since stage inputs and outputs are clearly denoted by block arguments and pipeline.stage operations.
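To make the trade-off above concrete, here is a minimal, self-contained C++ sketch of stages chained by explicit successors. It is a toy model, not the CIRCT API: `Stage`, `ToyPipeline`, `addStage`, `insertStageAfter` and `orderedStageNames` are hypothetical names invented for illustration, and the entry stage is assumed to be the first one created.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a pipeline stage (a Block). `next` plays the role of the
// successor named by the stage's terminator; nullptr marks the final stage.
struct Stage {
  std::string name;
  Stage *next = nullptr;
};

// Toy stand-in for a scheduled pipeline. `stages` is storage (lexical) order
// and deliberately says nothing about pipeline order.
struct ToyPipeline {
  std::vector<std::unique_ptr<Stage>> stages;

  // Creates a stage without wiring it into the control flow.
  Stage *addStage(const std::string &name) {
    auto stage = std::make_unique<Stage>();
    stage->name = name;
    stages.push_back(std::move(stage));
    return stages.back().get();
  }

  // Hypothetical helper: splices a new stage in directly after `pred`,
  // updating both successor edges so the caller cannot forget one.
  Stage *insertStageAfter(Stage *pred, const std::string &name) {
    assert(pred && "predecessor stage must exist");
    Stage *fresh = addStage(name);
    fresh->next = pred->next;
    pred->next = fresh;
    return fresh;
  }

  // Walks the stages in pipeline order by following successor edges,
  // starting from the first stage created (assumed to be the entry stage).
  std::vector<std::string> orderedStageNames() const {
    std::vector<std::string> names;
    if (stages.empty())
      return names;
    for (const Stage *s = stages.front().get(); s; s = s->next)
      names.push_back(s->name);
    return names;
  }
};
```

In this sketch, inserting a stage between two existing ones leaves it last in storage order but second in pipeline order, which is the situation the description has in mind when it says helper methods can hide the extra control-flow bookkeeping.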

Old representation:

%out = pipeline.pipeline(%arg0, %arg1, %go) clock %clk reset %rst : (i32, i32, i1) -> (i32) {
  ^bb0(%a0 : i32, %a1: i32, %g : i1):
    %add0 = comb.add %a0, %a1 : i32
    %add1 = comb.add %add0, %a0 : i32
    %add2 = comb.add %add1, %add0 : i32
    pipeline.return %add2 valid %g : i32
}

// Schedules to
%out = pipeline.pipeline(%arg0, %arg1, %go) clock %clk reset %rst : (i32, i32, i1) -> (i32) {
^bb0(%a0 : i32, %a1: i32, %g : i1):
  %add0 = comb.add %a0, %a1 : i32

  %s0_valid = pipeline.stagesep enable %g
  %add1 = comb.add %add0, %a0 : i32 // %a0 is a block argument fed through a stage.

  %s1_valid = pipeline.stagesep enable %s0_valid
  %add2 = comb.add %add1, %add0 : i32 // %add0 crosses multiple stages.

  pipeline.return %add2 valid %s1_valid : i32
}

// materializes to
%0 = pipeline.pipeline(%arg0, %arg1, %go) clock %clk reset %rst : (i32, i32, i1) -> i32 {
^bb0(%a0: i32, %a1: i32, %g: i1):
  %1 = comb.add %a0, %a1 : i32

  %1_s0, %a0_s0, %valid = pipeline.stagesep.reg enable %g regs %1, %a0 : i32, i32
  %2 = comb.add %1_s0, %a0_s0 : i32

  %2_s1, %1_s1, %valid_3 = pipeline.stagesep.reg enable %valid regs %2, %1_s0 : i32, i32
  %3 = comb.add %2_s1, %1_s1 : i32 // %1 from the entry stage is chained through both stage 1 and 2.

  pipeline.return %3 valid %valid_3 : i32
}

// Lowers to
%0 = pipeline.pipeline(%arg0, %arg1, %go) clock %clk reset %rst : (i32, i32, i1) -> i32 {
^bb0(%a0: i32, %a1: i32, %arg2: i1):
  %outputs:2, %valid = pipeline.stage ins %a0, %a1 enable %arg2 : (i32, i32, i1) -> (i32, i32) {
  ^bb0(%arg3: i32, %arg4: i32, %arg6: i1):
    %2 = comb.add %arg3, %arg4 : i32
    pipeline.stage.return regs %2, %arg3 valid %arg6 : (i32, i32)
  }
  %outputs_2:2, %valid_3 = pipeline.stage ins %outputs#0, %outputs#1 enable %valid : (i32, i32) -> (i32, i32) {
  ^bb0(%arg3: i32, %arg4: i32, %arg5: i1):
    %2 = comb.add %arg3, %arg4 : i32
    pipeline.stage.return regs %2, %arg3 valid %arg5 : (i32, i32)
  }
  %1 = comb.add %outputs_2#0, %outputs_2#1 : i32
  pipeline.return %1 valid %valid_3 : i32
}

New representation:

%out = pipeline.unscheduled(%arg0, %arg1, %go) clock %clk reset %rst : (i32, i32, i1) -> (i32) {
  ^bb0(%a0 : i32, %a1: i32, %g : i1):
    %add0 = comb.add %a0, %a1 : i32
    %add1 = comb.add %add0, %a0 : i32
    %add2 = comb.add %add1, %add0 : i32
    pipeline.return %add2 valid %g : i32
}

// schedules to
%out = pipeline.scheduled(%arg0, %arg1, %go) clock %clk reset %rst : (i32, i32, i1) -> (i32) {
^bb0(%a0 : i32, %a1: i32, %go : i1):
  %add0 = comb.add %a0, %a1 : i32
  pipeline.stage ^bb1 enable %go

^bb1:
  %add1 = comb.add %add0, %a0 : i32 // %a0 is a block argument fed through a stage.
  pipeline.stage ^bb2 enable %go

^bb2:
  %add2 = comb.add %add1, %add0 : i32 // %add0 crosses multiple stages.
  pipeline.return %add2 enable %go : i32 // %go crosses multiple stages
}

// Materializes to
%0 = pipeline.scheduled(%arg0, %arg1, %go) clock %clk reset %rst : (i32, i32, i1) -> i32 {
^bb0(%a0: i32, %a1: i32, %go: i1):
  %1 = comb.add %a0, %a1 : i32
  pipeline.stage ^bb1 regs (%1, %a0, %go) pass () enable %go

^bb1(%1_s0 : i32, %a0_s0 : i32, %go_s0 : i1):
  %2 = comb.add %1_s0, %a0_s0 : i32
  pipeline.stage ^bb2 regs (%2, %1_s0, %go_s0) pass () enable %go_s0

^bb2(%2_s1 : i32, %1_s1 : i32, %go_s1 : i1):
  %3 = comb.add %2_s1, %1_s1 : i32 // %1 from the entry stage is chained through both stage 1 and 2.
  pipeline.return %3 valid %go_s1 : i32 // and likewise with %go
}

// which can be directly lowered to hardware
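The materialization step above can be approximated by a small liveness computation: a value must be carried across every stage boundary between the stage that defines it and the last stage that uses it. The following C++ sketch works on a toy description of the example and is not the ExplicitRegs pass itself; `ValueInfo` and `registersPerBoundary` are hypothetical names, and the sketch assumes every carried value is registered (it does not model `pass` operands).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy description of an SSA value in a scheduled pipeline: the stage that
// defines it (block arguments count as stage 0) and the stages that use it.
struct ValueInfo {
  std::string name;
  std::size_t defStage;
  std::vector<std::size_t> useStages;
};

// Returns, for each boundary b (between stage b and stage b + 1), the names
// of the values that have to be carried across it. A value defined in stage
// d and last used in stage u crosses boundaries d, d + 1, ..., u - 1.
std::vector<std::vector<std::string>>
registersPerBoundary(const std::vector<ValueInfo> &values,
                     std::size_t numStages) {
  std::vector<std::vector<std::string>> regs(numStages > 0 ? numStages - 1 : 0);
  for (const ValueInfo &v : values) {
    if (v.useStages.empty())
      continue;
    const std::size_t lastUse =
        *std::max_element(v.useStages.begin(), v.useStages.end());
    for (std::size_t b = v.defStage; b < lastUse && b < regs.size(); ++b)
      regs[b].push_back(v.name);
  }
  return regs;
}
```

Fed with the example above (`%a0` used in stages 0 and 1, `%add0` in stages 1 and 2, `%add1` in stage 2, `%go` in every stage), the first boundary carries `a0`, `add0` and `go`, and the second carries `add0`, `add1` and `go`, which matches the `regs` lists in the materialized IR up to ordering.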

@mortbopet mortbopet force-pushed the dev/mpetersen/refactor_pipeline branch from 7b88525 to d61fc0a Compare June 7, 2023 09:12
Contributor

@mikeurbach mikeurbach left a comment

I haven't reviewed the implementation but I think the representational change makes sense based on the previous discussions.

Contributor

@teqdruid teqdruid left a comment

I didn't make it to the stuff under 'lib', but had some high-level comments which I wanted you to see. May not get to the rest until tomorrow.

include/circt/Dialect/Pipeline/Pipeline.td
// Returns the last stage in the pipeline.
Block* getLastStage();

// Adds a new stage to this pipeline. It is the users responsibility to
Contributor

Where? "Adds" -> "Append" would be clearer if I assume correctly.

Contributor Author

I'd actually say 'add' is the correct term here; append would imply that this stage logically comes after all other stages, which is not true. Ordering/placement of the stage within the pipeline is up to the user.

%1 = comb.add %a0, %a1 : i32
pipeline.stage ^bb1 regs (%1, %a0, %go) pass () enable %go
Contributor

Is it critical to specify the next block? Can't we just use the lexical order? Would there ever be a use for anything but? I think not requiring that could be dangerous in that one could assume it (and use getStage(i) instead of getOrderedStage(i)) and be correct 99.9% of the time but that behavior is not guaranteed.

Contributor Author

@mortbopet mortbopet Jun 8, 2023

Personally, I'd prefer to use blocks with explicit terminator destinations, i.e. stages are a linked list. That makes it easier and safer to insert new stages into a pipeline and to query successor stages (Block::getSinglePredecessor, Block::getSuccessor, ...). I think lexical ordering is really only a benefit for human readability (which is not the goal of the IR). Stringing together stages with explicit next-stage destinations will be correct 100% of the time.

Contributor

Yeah, but it's just so unintuitive to have block 3 be the 5th pipeline stage. I'm fine with the successor blocks being explicit as long as the verifier checks that it's explicitly pointing to the next block. Again, I'm very concerned about quiet/sleeper bugs like the one I discuss above. I'm also concerned about the readability of the asm output.

Contributor

> stages are a linked list.

You realize that lists of blocks within a region are implemented as a literal linked list, yes? I know that's not your point, but I'm just sayin'...

> querying successor stages (Block::getSinglePredecessor, Block::getSuccessor,...)

I'm not entirely certain how those are implemented, but I would assume through some sort of OpInterface (BranchOpInterface?) on the terminators, which we should absolutely implement. I suspect there are some potentially useful dataflow analyses upstream which use it.

Plus, there's always Block::getNextNode() and Block::getPrevNode().

> Stringing together stages with explicit next-stage destinations will be correct 100% of the time.

Define "correct"... It's correct assuming the user strings it together correctly. If they do the intuitive thing and just add a block to the end, it won't be "correct". By the same token, using the numerical order is always "correct" assuming the user inserts the new block at the proper place. If they don't, it's easy to discover the mistake even after lowering to HW... In the case where a user just adds a block but doesn't modify the successors properly, I'd assume it just disappears when lowering to HW.
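The two failure modes argued about in this thread (successors that disagree with lexical order, and a block that was added but never wired in) can both be detected mechanically. Below is a rough C++ sketch of such a check on a toy encoding; it is not the actual CIRCT verifier. `StageOrderIssue` and `checkStageOrder` are hypothetical names, stage 0 is assumed to be the entry stage, and `successor[i]` holds the lexical index of stage i's successor, or -1 for the stage that returns.

```cpp
#include <cstddef>
#include <vector>

enum class StageOrderIssue { None, OutOfLexicalOrder, UnreachableStage };

// Follows the successor chain from stage 0 and reports the first kind of
// problem found: a stage that the chain never reaches, or (if every stage is
// reached) a successor edge that does not point at the lexically next stage.
StageOrderIssue checkStageOrder(const std::vector<int> &successor) {
  const std::size_t n = successor.size();
  std::vector<bool> visited(n, false);
  bool lexical = true;
  std::size_t cur = 0;
  while (cur < n && !visited[cur]) {
    visited[cur] = true;
    const int next = successor[cur];
    if (next < 0)
      break; // Final stage: no successor.
    if (static_cast<std::size_t>(next) != cur + 1)
      lexical = false;
    cur = static_cast<std::size_t>(next);
  }
  for (bool seen : visited)
    if (!seen)
      return StageOrderIssue::UnreachableStage;
  return lexical ? StageOrderIssue::None : StageOrderIssue::OutOfLexicalOrder;
}
```

With this encoding, `{1, 2, -1}` is a well-formed three-stage pipeline, `{2, -1, 1}` is fully connected but has lexical block 2 as its second stage, and `{1, -1, -1}` models a block appended to the end without updating the previous terminator, which is the case that would otherwise silently vanish during lowering.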

}];

let arguments = (ins Variadic<AnyType>:$registers, Variadic<AnyType>:$passthroughs, I1:$enable);
let successors = (successor AnySuccessor:$nextStage);
Contributor

See my comment in the rationale.

"bool",
"isLatencyInsensitive", (ins),
/*methodBody=*/"",
/*defaultImplementation=*/[{
Contributor

I know this is probably just the previous logic factored out, so this may not be relevant to this PR, but what if only some of the inputs/outputs are latency-insensitive (LI)? In other words, what if the interface mixes LI and non-LI ports? Would it make sense to return false for both this and isLatencySensitive below?

Contributor Author

That should in my mind be an illegal case - it's XOR for now. In general, all of this LI-interfaced stuff is extremely blurry to me and will have to be revised once we need it.

Contributor

It's blurry for me as well. We might consider ripping it out and adding it back later on with dc.value interfaces.

Contributor Author

I'd be in favor of that as well - for a follow-up PR, though!
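For reference, the XOR rule discussed in this thread, including the suggestion that a mixed interface should answer false to both queries, could look roughly like the C++ sketch below. It is a toy model rather than the interface method in Pipeline.td: the flat `std::vector<bool>` port description and `InterfaceKind` are hypothetical, and treating an empty port list as latency-sensitive is an arbitrary assumption.

```cpp
#include <vector>

enum class InterfaceKind { LatencySensitive, LatencyInsensitive, Mixed };

// portIsLI[i] is true when port i is latency-insensitive (hypothetical flat
// description of a pipeline's inputs and outputs).
InterfaceKind classifyInterface(const std::vector<bool> &portIsLI) {
  bool anyLI = false;
  bool anyNonLI = false;
  for (bool li : portIsLI) {
    if (li)
      anyLI = true;
    else
      anyNonLI = true;
  }
  if (anyLI && anyNonLI)
    return InterfaceKind::Mixed; // The case called illegal in this thread.
  return anyLI ? InterfaceKind::LatencyInsensitive
               : InterfaceKind::LatencySensitive;
}

// Both queries return false for a mixed interface, so a verifier could
// reject any pipeline for which neither holds.
bool isLatencyInsensitive(const std::vector<bool> &portIsLI) {
  return classifyInterface(portIsLI) == InterfaceKind::LatencyInsensitive;
}

bool isLatencySensitive(const std::vector<bool> &portIsLI) {
  return classifyInterface(portIsLI) == InterfaceKind::LatencySensitive;
}
```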

@mortbopet
Contributor Author

> I'm not entirely certain how those are implemented, but I would assume through some sort of OpInterface (BranchOpInterface?) on the terminators, which we should absolutely implement. I suspect there are some potentially useful dataflow analyses upstream which use it.

This comes "for free" when blocks are declared in the successors part of an ODS definition.

Fair, correctness might not be the proper term. Regardless, I don't think lexical ordering is a compelling alternative when the current implementation has a lot more in common with CFGs, and thus may share analysis, traversal, ...

Contributor

@teqdruid teqdruid left a comment

I only skimmed the PipelineToHW.cpp changes.

I still disagree on the block/stage ordering issue, but for the purposes of making forward progress I'll let it go.

@mortbopet mortbopet merged commit d57b640 into main Jun 12, 2023
@darthscsi darthscsi deleted the dev/mpetersen/refactor_pipeline branch June 4, 2024 14:48