Apply folders before canonicalize in heir-simd-vectorizer #601
Looks like the change added two extra rotations to the box_blur_64x64 IR, so I will investigate that on Monday.
Branch updated from 53fd239 to a3c90f4.
I didn't find the source of the extra rotations; instead, I added some tensor_ext canonicalization patterns that restore the original behavior. I also included some minor cleanup discovered along the way.
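The thread does not say which tensor_ext patterns were added. Purely as a hypothetical illustration of how a canonicalization can remove redundant rotations, the toy C++ below works on plain std::vector<int> values rather than MLIR ops: consecutive cyclic rotations of an n-slot vector compose into a single rotation by the summed shift (mod n), and a net shift of zero needs no rotation at all. The names rotateLeft, applyChain, and mergeRotations are illustrative assumptions, and this is not claimed to be one of the patterns in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Cyclically rotate v left by `shift` slots (negative shifts rotate right).
std::vector<int> rotateLeft(std::vector<int> v, int shift) {
  const int n = static_cast<int>(v.size());
  if (n == 0) return v;
  const int s = ((shift % n) + n) % n;  // normalize into [0, n)
  std::rotate(v.begin(), v.begin() + s, v.end());
  return v;
}

// Apply a chain of rotations one after another, as unsimplified IR would.
std::vector<int> applyChain(std::vector<int> v, const std::vector<int>& shifts) {
  for (int s : shifts) v = rotateLeft(std::move(v), s);
  return v;
}

// Hypothetical canonicalization: a chain of rotations of an n-slot vector
// collapses to one rotation by the summed shift, or to no rotation at all
// when the sum is a multiple of n.
std::vector<int> mergeRotations(const std::vector<int>& shifts, int n) {
  assert(n > 0);
  int total = 0;
  for (int s : shifts) total = (total + s % n) % n;
  total = (total + n) % n;  // normalize into [0, n)
  if (total == 0) return {};
  return {total};
}
```

The appeal of this kind of rewrite is that it is purely local: it only needs the shift amounts and the vector length, so it can run as part of canonicalization without any global analysis.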
Nice fix! Is there any downside to just running `(void)applyPatternsAndFoldGreedily(getOperation(), std::move(patterns));` with an empty pattern set in InsertRotate, before adding the other patterns and running the actual pass? This would avoid the pass boilerplate, and other passes that need this as a pre-pass could do the same thing.
The reason I didn't do this is that I wanted to try putting
This new pass gives an empty set of patterns to the greedy pattern rewrite engine, which then applies only each op's folding routine. That simplifies the IR enough to make a normal canonicalize pass fast. However, for some tests it makes the final IR less optimal by inserting unnecessary rotations, so I added a few canonicalization patterns to tensor_ext that restore the original behavior.
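One way to picture what a fold-only pre-pass does: a greedy driver given no patterns can still call each op's fold hook, and it keeps doing so until nothing changes. The toy C++ below is a hypothetical model of that idea, not MLIR or HEIR code; the Op struct, fold, and foldToFixedPoint are illustrative names. Folding evaluates ops whose operands are all constants and leaves everything else alone, so the IR shrinks before the heavier canonicalize pass sees it.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical toy IR (not MLIR): each op yields one integer value and is
// referred to by its index in the `ops` vector.
enum class Kind { Arg, Const, Add, Mul };

struct Op {
  Kind kind;
  int64_t value = 0;  // payload when kind == Const
  int lhs = -1;       // operand indices when kind is Add or Mul
  int rhs = -1;
};

// Per-op fold hook, loosely analogous to an MLIR op's fold(): returns a
// constant when the op can be evaluated from constant operands.
std::optional<int64_t> fold(const std::vector<Op>& ops, const Op& op) {
  if (op.kind == Kind::Arg || op.kind == Kind::Const) return std::nullopt;
  assert(op.lhs >= 0 && op.rhs >= 0);
  const Op& a = ops[op.lhs];
  const Op& b = ops[op.rhs];
  if (a.kind != Kind::Const || b.kind != Kind::Const) return std::nullopt;
  return op.kind == Kind::Add ? a.value + b.value : a.value * b.value;
}

// "Empty pattern set" driver: the only rewrite it applies is replacing an
// op with the constant its fold hook returns, repeated to a fixed point.
// Returns how many ops were folded.
int foldToFixedPoint(std::vector<Op>& ops) {
  int folded = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    for (Op& op : ops) {
      if (std::optional<int64_t> c = fold(ops, op)) {
        op = Op{Kind::Const, *c};
        ++folded;
        changed = true;
      }
    }
  }
  return folded;
}
```

This only mimics the fold-to-fixed-point behavior; MLIR's real greedy driver also manages a worklist, erases dead ops, and materializes constants through the dialect, none of which the toy models.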
Ohhh, I see. Nah, putting it at the end of loop unroll seems like it'd be a surprise; I'd expect it to be applied as a precondition in the pass where it's needed.
This new pass gives an empty set of patterns to the greedy pattern rewrite engine, which ends up applying each op's folding routine. That is enough to let us handle all examples in heir_simd_vectorizer.
Avoids the slowdown mentioned in #586.