[FIRRTL][InferResets] Peel interesting operations parallelly, NFC #5629
Conversation
This commit reduces the execution time of InferResets by 30%~40%. Since it's costly to walk the entire IR before Dedup, it's low-hanging fruit to peel off the interesting operations (operations with uninferred resets, in this case) in parallel.

Before:
54.5713 ( 1.5%)  54.5713 ( 7.8%)  InferResets
After:
34.2072 ( 0.9%)  34.2072 ( 5.1%)  InferResets
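For readers following along, here is a minimal sketch of the pattern this commit uses, as reconstructed from the diff excerpts quoted in the review below. It is not the committed code verbatim; `context` and `modules` are placeholders, and standard MLIR/CIRCT includes are assumed:

```cpp
#include "mlir/IR/Threading.h"

// Pre-construct one (module, worklist) slot per module so that each
// parallel iteration mutates only its own entry -- this is what makes
// the loop below safe without any locking.
SmallVector<std::pair<FModuleOp, SmallVector<Operation *>>> moduleToOps;
for (auto module : modules) // `modules`: placeholder range of FModuleOps
  moduleToOps.push_back({module, {}});

// Walk every module in parallel, peeling off only the "interesting" ops
// (those whose operand or result types contain an uninferred reset).
mlir::parallelForEach(context, moduleToOps, [](auto &entry) {
  entry.first.walk([&](Operation *op) {
    if (llvm::any_of(op->getResultTypes(), typeContainsReset) ||
        llvm::any_of(op->getOperandTypes(), typeContainsReset))
      entry.second.push_back(op);
  });
});
```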
op->getResultTypes(),
    [](mlir::Type type) { return typeContainsReset(type); }) ||
    llvm::any_of(op->getOperandTypes(), typeContainsReset))
  e.second.push_back(op);
Is it thread-safe to update the vector moduleToOps in parallel? If it is, then this is great.
I think it's thread-safe because moduleToOps[i] is mutated only by the i-th iteration. That's why moduleToOps is constructed beforehand.
This should be safe. Another option here, since the result set is just aggregated, is to use a parallel reduce. This is more efficient, though, just slightly less clear (perhaps).
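For concreteness, here is a sketch of that parallel-reduce alternative using llvm::parallelTransformReduce from llvm/Support/Parallel.h. Names like `modules` and `isInteresting` are placeholders; note also that this utility schedules on LLVM's global thread pool rather than the MLIRContext's, which is one practical argument for the pre-allocated-slot approach inside a pass:

```cpp
#include "llvm/Support/Parallel.h"

using OpVec = SmallVector<Operation *, 8>;

// Transform: collect the interesting ops of one module.
// Reduce: concatenate the per-module partial results.
OpVec interesting = llvm::parallelTransformReduce(
    modules.begin(), modules.end(), OpVec{},
    [](OpVec lhs, OpVec rhs) {
      lhs.append(rhs.begin(), rhs.end());
      return lhs;
    },
    [](FModuleOp module) {
      OpVec ops;
      module.walk([&](Operation *op) {
        if (isInteresting(op)) // placeholder predicate
          ops.push_back(op);
      });
      return ops;
    });
```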
LGTM. Two things for possible future work:
- The walk-the-IR-and-collect-ops-into-a-worklist step is a common pattern that we should factor out (a hypothetical shape is sketched below).
- traceResets might be parallelizable with a reduction over all the partial (per-module) results (thus parallelizing the main loop rather than the identification loop).
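One possible shape of the factored-out helper from the first bullet; the name collectOpsParallel and its signature are invented here purely for illustration:

```cpp
#include "mlir/IR/Threading.h"

// Walk each module in parallel and collect the ops matching `pred` into a
// per-module worklist. Safe because each iteration writes only its own slot.
template <typename Pred>
static SmallVector<std::pair<FModuleOp, SmallVector<Operation *>>>
collectOpsParallel(MLIRContext *context, ArrayRef<FModuleOp> modules,
                   Pred pred) {
  SmallVector<std::pair<FModuleOp, SmallVector<Operation *>>> worklists;
  for (auto module : modules)
    worklists.push_back({module, {}});
  mlir::parallelForEach(context, worklists, [&](auto &entry) {
    entry.first.walk([&](Operation *op) {
      if (pred(op))
        entry.second.push_back(op);
    });
  });
  return worklists;
}
```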
if (typeContainsReset(e.type))
  return true;
return false;
static bool typeContainsReset(Type type) {
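For context, a plausible implementation of the helper being discussed, reconstructed from the excerpt above (a sketch, not necessarily the exact committed code):

```cpp
#include "circt/Dialect/FIRRTL/FIRRTLTypes.h"
#include "llvm/ADT/TypeSwitch.h"

using namespace mlir;
using namespace circt::firrtl;

// Recursively check whether a FIRRTL type contains an uninferred reset.
static bool typeContainsReset(Type type) {
  return TypeSwitch<Type, bool>(type)
      .Case<BundleType>([](BundleType bundle) {
        // A bundle contains a reset if any of its fields does.
        return llvm::any_of(bundle.getElements(), [](auto element) {
          return typeContainsReset(element.type);
        });
      })
      .Case<FVectorType>([](FVectorType vector) {
        // A vector contains a reset if its element type does.
        return typeContainsReset(vector.getElementType());
      })
      .Case<ResetType>([](auto) { return true; })
      .Default([](auto) { return false; });
}
```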
Feel free to peel this off and commit it directly.
Nice!
LGTM! Very cool to see that significant speedup.
For some of these inference passes I'm wondering if there's a mode where we run the module-local part of the pass in parallel (e.g. figuring out the reset networks within a module), and then do a global pass across the modules afterwards. Not sure that will help much here because you're already getting a lot of mileage out of the parallelization 😎
Yeah, I think it does make sense to accumulate the module-scope equivalence relation in parallel and apply reductions. I originally went in that direction, but I changed to the current, simpler implementation because the number of interesting operations is very small (there are 2*10^5 interesting ops out of 10^8 ops), so I guess the performance benefit might not be that much.
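For the record, a rough sketch of what merging per-module partial results could look like with llvm::EquivalenceClasses. The element type (numeric signal IDs) and the `partials` collection are assumptions for illustration; as noted above, the committed pass does not do this:

```cpp
#include "llvm/ADT/EquivalenceClasses.h"

// Merge per-module equivalence classes into one global relation by
// unioning every member of each class with that class's leader.
llvm::EquivalenceClasses<unsigned> merged;
for (auto &partial : partials) { // `partials`: hypothetical per-module results
  for (auto it = partial.begin(), e = partial.end(); it != e; ++it) {
    if (!it->isLeader())
      continue; // visit each equivalence class exactly once
    unsigned leader = it->getData();
    for (auto mi = partial.member_begin(it); mi != partial.member_end(); ++mi)
      merged.unionSets(leader, *mi); // unionSets inserts missing elements
  }
}
```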