
[FIRRTL][InferResets] Peel interesting operations in parallel, NFC #5629

Merged
merged 1 commit into from
Jul 20, 2023

Conversation

uenoku
Copy link
Member

@uenoku uenoku commented Jul 19, 2023

This commit reduces the execution time of InferResets by 30–40%. Since it is costly to walk the entire IR before Dedup, it is low-hanging fruit to peel the interesting operations (here, operations with uninferred resets) in parallel.

Before:
54.5713 ( 1.5%) 54.5713 ( 7.8%) InferResets
After:
34.2072 ( 0.9%) 34.2072 ( 5.1%) InferResets

op->getResultTypes(),
[](mlir::Type type) { return typeContainsReset(type); }) ||
llvm::any_of(op->getOperandTypes(), typeContainsReset))
e.second.push_back(op);
Contributor


Is it thread-safe to update the vector moduleToOps in parallel? If it is, then this is great.

Member Author


I think it's thread-safe because moduleToOps[i] is mutated only by the i-th iteration. That's why moduleToOps is constructed beforehand.

Contributor


This should be safe. Another option here, since the result set is just aggregated, is to use a parallel reduce. This is more efficient though, just slightly less clear (perhaps).

Contributor

@darthscsi darthscsi left a comment


LGTM. Two things for possible future work:

  1. Walking the IR and collecting ops into a worklist is a common pattern which we should factor out.
  2. traceResets might be parallelizable with a reduction over all the partial (per-module) results (thus parallelizing the main loop rather than the identification loop).

if (typeContainsReset(e.type))
return true;
return false;
static bool typeContainsReset(Type type) {
Contributor


Feel free to peel this off and commit it directly.

Contributor


Nice!


Contributor

@fabianschuiki fabianschuiki left a comment


LGTM! Very cool to see that significant speedup.

For some of these inference passes I'm wondering if there's a mode where we run the module-local part of the pass in parallel (e.g. figuring out the reset networks within a module), and then do a global pass across the modules afterwards. Not sure that will help much here because you're already getting a lot of mileage out of the parallelization 😎


@uenoku
Member Author

uenoku commented Jul 20, 2023

Yeah, I think it does make sense to accumulate module-scoped equivalence relations in parallel and apply reductions. I originally went in that direction, but changed to the current, simpler implementation because the number of interesting operations is very small (there are 2×10^5 interesting ops out of 10^8 ops), so I guess the performance benefit might not be that much.

@uenoku uenoku merged commit c0b0e6c into main Jul 20, 2023
@uenoku uenoku deleted the infer-reset-sped branch July 20, 2023 08:28