Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Scheduling] Define problem to model operator chaining. #2233

Merged
merged 4 commits into from
Dec 20, 2021

Conversation

jopperm
Copy link
Contributor

@jopperm jopperm commented Nov 24, 2021

This PR defines a ChainingProblem, which models the accumulation of physical propagation delays on combinational paths along SSA dependences within a scheduling problem's abstract time steps/cycles.

For example, the modeling for the @multi_cycle test case from chaining-problems.mlir would look like this:

chaining

@jopperm
Copy link
Contributor Author

jopperm commented Nov 24, 2021

@stephenneuendorffer I'm curious to hear your thoughts on whether the modeling is sound and general enough. Also, is the terminology I chose OK?

@mikeurbach
Copy link
Contributor

I think the code looks nice, but haven't thought too hard about what this problem models. Is this intended to compose well with other Problems? Could we have a problem type that is both a ChainingProblem and a CyclicProblem?

@jopperm
Copy link
Contributor Author

jopperm commented Nov 24, 2021

Could we have a problem type that is both a ChainingProblem and a CyclicProblem?

Absolutely. I consider the currently available subclasses of Problem to be almost trait-like. A caveat is that composed problem classes will need to override the check() and verify() methods to include the property-specific validation methods from all superclasses (see ModuloProblem for an example) -- it would be super cool to automate/meta-program that, but I don't have a good idea to do so yet.

@mikeurbach
Copy link
Contributor

Thanks, that is what I figured, just wanted to be sure I understood.

Comment on lines +299 to +311
Optional<float> getIncomingDelay(OperatorType opr) {
return incomingDelay.lookup(opr);
}
void setIncomingDelay(OperatorType opr, float delay) {
incomingDelay[opr] = delay;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to model 2 things here:

  1. operations with different widths will have different delays. It's not clear to me how you intend to model this. will you have different operator types?
  2. operations with multiple inputs and outputs almost always have different delays and/or latencies. how do you plan to model this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes, in the current design of the infra, operator types are completely unaware of the (MLIR) operations/types of the (scheduling problem's) operations they are attached to. So the client code has to set up different operator types for the different bitwidths.
  2. The problem model (or more accurately, the Dependence class) is prepared for that, but the currently existing Problems don't use operand/result indices yet. My plan was to introduce new trait-like subproblems in which the operator types carry per-input/output latencies and delays in separate PRs.

include/circt/Scheduling/Problems.h Outdated Show resolved Hide resolved
Comment on lines 187 to 194
return getContainingOp()->emitError()
<< "Incoming delay (" << zInc << ") for operator type '" << opr
<< "' exceeds cycle time (" << ct << ")";

if (zOut > ct)
return getContainingOp()->emitError()
<< "Outgoing delay (" << zOut << ") for operator type '" << opr
<< "' exceeds cycle time (" << ct << ")";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be fatal errors? It might be useful for these to not be fatal errors, but instead to 'do the best you can'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think these particular ones (in checkDelays) should be errors, because there's no value in trying to fit 10ns operators in 5ns cycles, for example. I'd consider that a bug in the client code constructing the problem.

Also, I'd prefer to keep letting the tests in verifyPrecedenceInCycle produce errors, because these hint at bugs in the scheduler.

I think the interesting design decision is in verifyStartTimeInCycle. Yes, the cycle time constraint is currently strict (hence the error), but we could introduce a soft constraint to support a 'do the best you can' mode like this:

prob.setDesiredCycleTime(5.0f);
prob.setMaximumCycleTime(6.0f);

We could support instances where at least one of these constraints is set. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case, all setMaximumCycleTime() is doing is controlling the policy on emitting an error. I would decouple the issuing of an error from the scheduling problem. The critical thing here is that in many systems, the delays have some inaccuracy, so it is really up to the user of the infrastructure whether a failure to schedule is a fatal error. I would suggest having the infrastructure focus on collecting constraints that aren't satisfied and returning them to the caller and letting the caller figure out whether to issue an error.

I agree that this may be different if you're trying to catch errors in the scheduler. In those cases, you should indeed emit an error. There's a bit of a fine line here, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, makes sense. I wonder if I should remove the cycle time altogether from the ChainingProblem, and treat it solely as an input to the concrete scheduler (I plan to handle different objective functions in the same way). The problem's verifier would only guarantee the presence of start-times-in-cycle, and that these are consistent w.r.t. the problem's dependences.

This gives the client maximum flexibility in inspecting & accepting the solution as needed. The client controls what it gets (strict adherence to cycle time, or best-effort) through its choice of algorithm.

Copy link
Contributor Author

@jopperm jopperm Dec 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So before this PR goes totally stale 😉: I think the ChainingProblem still makes sense without the cycle time constraint, and I just pushed the corresponding changes.

@mikeurbach, @stephenneuendorffer: would you be ok with this for now? I would then look into implementing some scheduler support for this problem, and test everything out in @rachitnigam's use-case in Calyx.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good place to start off work for the Calyx use-case which would involve breaking up large calyx.comb_group blocks into separate calyx.group which perform combinational computations and store them into registers. It would be interesting to see how we can incorporate Problem instances directly into the transformation over Calyx programs and generate new groups.

Copy link
Contributor

@mikeurbach mikeurbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me to have something to start with, and we can revisit the topics in this initial PR as they come up.

@jopperm jopperm merged commit 5bfb9ae into llvm:main Dec 20, 2021
@jopperm jopperm deleted the sched-chaining-problem branch December 20, 2021 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants