Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #479
The two main problems:
HoistPlaintextOps
usedstd::find_if
to scan the entire generic body, only to extract a single op, leading to a quadratic scan when there were a linear number of ops to hoist. When unrolling an affine.for, each iteration gets an affine.apply op, and they can all be lifted, so it nearly always causes quadratic behavior.extractOpBeforeGeneric
, the newly-created single-op generic was cloning all of the operands from the original generic op as its operands, even if it had zero secret operands. When lifting ~1k ops, this became a problem because the output of each single-op generic is added as an operand to the originalgeneric
it was lifted from, so we got into a situation like:After that, we have a quadratic number of unused args, so we apply a quadratic number of EraseUnusedGenericArg patterns. By more finely tracking the subset of operands needed in the new generic, we can reduce this to (# ops to lift * # operands per lifted op), which is minimal. The only tricky bit is that, if the op being extracted is a loop nest, you have to walk the op's subtree to find all the generic arguments it depends on.
The test I added ran in 15s with the bug still in place, and after these fixes it runs in 0.5s.