Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply folders before canonicalize in heir-simd-vectorizer #601

Merged
merged 1 commit into from
Apr 11, 2024

Conversation

j2kun
Copy link
Collaborator

@j2kun j2kun commented Apr 6, 2024

This new pass gives an empty set of patterns to the greedy pattern rewrite engine, which ends up applying each op's folding routine, which is enough to enable us to handle all examples in heir_simd_vectorizer

Avoids the slowdown mentioned in #586

@j2kun
Copy link
Collaborator Author

j2kun commented Apr 6, 2024

Looks like the change resulted in two additional rotations being added to the box_blur_64x64 IR, so I will investigate that on Monday

@j2kun j2kun force-pushed the fold-only branch 4 times, most recently from 53fd239 to a3c90f4 Compare April 8, 2024 17:57
@j2kun j2kun changed the title Replace canonicalize with post-unroll-simplify Apply folders before canonicalize in heir-simd-vectorizer Apr 8, 2024
@j2kun
Copy link
Collaborator Author

j2kun commented Apr 8, 2024

I didn't find the source of the additional inserted rotations, but instead added some extra tensor_ext canonicalization patterns that restored the original behavior. Also included some minor cleanup discovered along the way.

Copy link
Collaborator

@asraa asraa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix! Is there any downside in just running (void)applyPatternsAndFoldGreedily(getOperation(), std::move(patterns)); with empty patterns in InsertRotate before adding the other patterns and running the actual pass? (This would avoid the pass boilerplate, and other passes that need this as a pre-pass can do the same thing)

@j2kun
Copy link
Collaborator Author

j2kun commented Apr 9, 2024

Nice fix! Is there any downside in just running (void)applyPatternsAndFoldGreedily(getOperation(), std::move(patterns)); with empty patterns in InsertRotate before adding the other patterns and running the actual pass? (This would avoid the pass boilerplate, and other passes that need this as a pre-pass can do the same thing)

The reason I didn't do this is because I wanted to try putting canonicalize in between this new pass and insert-rotate. I suppose I could put it at the end of the loop unroll pass? I just think having it more visible will make it less surprising.

This new pass gives an empty set of patterns to the greedy pattern
rewrite engine, which ends up applying each op's folding routine,
which simplifies the IR enough to make a normal canonicalize pass
fast.

However, this reduces some of the optimality of the final IR for some
tests, via inserting additional rotations that are not necessary.  So I
added a few additional canonicalization patterns to tensor_ext that
restore the original behavior.
@j2kun j2kun added the pull_ready Indicates whether a PR is ready to pull. The copybara worker will import for internal testing label Apr 10, 2024
@asraa
Copy link
Collaborator

asraa commented Apr 11, 2024

I wanted to try putting canonicalize in between this new pass and insert-rotate. I suppose I could put it at the end of the loop unroll pass?

Ohhh, I see. Nah, putting it at the end of loop unroll seems like it'd be a surprise, I'd expect it to be applied as a pre-condition in the pass it's needed.

@copybara-service copybara-service bot merged commit c9af3be into google:main Apr 11, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pull_ready Indicates whether a PR is ready to pull. The copybara worker will import for internal testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants