[Optimization] Specialized quantification instruction #577

Merged
merged 45 commits into from
Aug 3, 2022

@rctcwyvrn rctcwyvrn commented Jul 14, 2022

The idea is to add an optimized quantification instruction that fuses match + save + loop into a single instruction.

  • Fused instructions are usually good optimizations
  • This allows us to compact the save point representation of a quantified non-capturing single grapheme match into a range of indices to restore to
  • Moving the hottest paths out of Processor.cycle() should help with the upcoming instruction layout work. Instructions like splitSaving or condBranchIfZeroElseDecrement might become much less common if most quantifications end up as a single quantify instruction

The plan: Compile the most common quantification cases like .*, \w*, [a-z0-9]* and a* into this fused instruction
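
A minimal sketch of what "match + save + loop in one instruction" could look like, written against a plain `String` rather than the engine's real types. `QuantifyPayload`, `QuantifyResult` and `runQuantify` are hypothetical names invented for this illustration, not the code in this PR. The point is that a greedy single-grapheme loop only needs to remember a range of positions to restore to, not one save point per iteration.

```swift
/// Hypothetical payload for the fused instruction (illustration only).
struct QuantifyPayload {
    var matches: (Character) -> Bool  // the single-grapheme match being quantified
    var minTrips: Int
    var maxTrips: Int?                // nil means unbounded
}

/// Where the loop stopped, plus the positions backtracking may restore to.
struct QuantifyResult {
    var end: String.Index
    var savePoints: ClosedRange<String.Index>?
}

func runQuantify(
    _ input: String,
    from start: String.Index,
    _ payload: QuantifyPayload
) -> QuantifyResult? {
    var current = start
    var previous = start
    var lowest: String.Index? = nil   // position reached after exactly minTrips matches
    var trips = 0

    // Match and loop in one place, with no per-iteration instruction dispatch.
    while current < input.endIndex,
          trips < (payload.maxTrips ?? Int.max),
          payload.matches(input[current]) {
        if lowest == nil && trips >= payload.minTrips { lowest = current }
        previous = current
        current = input.index(after: current)
        trips += 1
    }
    guard trips >= payload.minTrips else { return nil }  // the caller would signal failure

    // Save: every position from `lowest` through `previous` is a valid restore point.
    return QuantifyResult(end: current, savePoints: lowest.map { $0...previous })
}
```

For `a*` on `"aaab"` this stops after three matches and reports the first three positions as a single range, instead of pushing three separate save points.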

Results:

Comparing against benchmark result file speedy_builtins.json
=== Regressions ======================================================================
- NumbersAll                              10.5ms	9.14ms	1.4ms		15.3%
=== Improvements =====================================================================
- EmailLookaheadAll                       37.3ms	86.2ms	-48.8ms		-56.7%
- EmailLookaheadNoMatchesAll              39ms	60.9ms	-21.9ms		-36.0%
- EmailLookaheadList                      9.2ms	23.6ms	-14.4ms		-61.0%
- CompilerMessagesAll                     114ms	125ms	-11.5ms		-9.2%
- InvertedCCC                             22.5ms	28.7ms	-6.19ms		-21.6%
- EagarQuantWithTerminalWhole             2.4ms	7.58ms	-5.19ms		-68.4%
- IPv6Address                             4.06ms	7.74ms	-3.68ms		-47.6%
- LinesAll                                3.09ms	6.57ms	-3.47ms		-52.9%
- GraphemeBreakNoCapAll                   6.74ms	10.1ms	-3.4ms		-33.5%
- WordsAll                                22.4ms	24.3ms	-1.94ms		-8.0%
- DiceRollsInTextAll                      60.6ms	62.2ms	-1.55ms		-2.5%
- EmailRFCAll                             49.8ms	51ms	-1.26ms		-2.5%
- CaseInsensitiveCCC                      11.2ms	12.4ms	-1.14ms		-9.2%
- symDiffCCC                              40.1ms	41.1ms	-951µs		-2.3%
- AnchoredNotFoundWhole                   8.66ms	9.44ms	-789µs		-8.4%
- BasicRangeCCC                           10.6ms	11.3ms	-723µs		-6.4%
- EmojiRegexAll                           70.7ms	71.4ms	-677µs		-0.9%
- DiceNotation                            6.93ms	7.55ms	-622µs		-8.2%
- BasicCCC                                10.1ms	10.7ms	-531µs		-5.0%
- MACAddress                              3.11ms	3.54ms	-431µs		-12.2%
- SubtractionCCC                          15.5ms	15.9ms	-397µs		-2.5%
- CssAll                                  3.87ms	4.19ms	-324µs		-7.7%
  • Large improvements in quantifications that consume more characters (LinesAll, EmailLookahead, EagerQuantWithTerminalWhole) as expected
  • Only small improvements from the compact save point representation (AnchoredNotFoundWhole), because the gains are swamped by ARC overhead in signalFailure()
  • Regression in quantifications that consume few characters/usually match nothing (NumbersAll) due to the increased overhead

Generally good improvements, but some of these benchmarks would be much faster if we could speed up signalFailure(). In many cases the quantification part of the regex got faster but signalFailure() got much slower, due to a combination of ARC and having to call index(before:).
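
To make the index(before:) cost concrete, here is a rough sketch of how a range-shaped save point could be consumed on failure. `RangeSavePoint` and `restore(in:)` are hypothetical names for this illustration, not the engine's actual types. Each restore has to step the upper bound back by one `Character`, which for `String` means finding the previous grapheme boundary.

```swift
/// Hypothetical compact save point: a range of positions still left to try.
struct RangeSavePoint {
    var remaining: ClosedRange<String.Index>?

    /// Returns the next position to retry (longest match first), or nil when exhausted.
    mutating func restore(in input: String) -> String.Index? {
        guard let range = remaining else { return nil }
        let position = range.upperBound
        if range.lowerBound == range.upperBound {
            remaining = nil
        } else {
            // Shrinking the range is where index(before:) is paid on every failure.
            remaining = range.lowerBound...input.index(before: range.upperBound)
        }
        return position
    }
}
```

A list of explicit save points would avoid the backwards step but give up the compact representation, so this is a trade-off rather than a clear win either way.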

Note: based on top of #547

@rctcwyvrn rctcwyvrn changed the title [WIP] [Optimization] Specialized quantification instruction [Optimization] Specialized quantification instruction Jul 20, 2022
@rctcwyvrn rctcwyvrn marked this pull request as ready for review July 20, 2022 00:40
@rctcwyvrn rctcwyvrn requested a review from milseman July 20, 2022 00:46
// which we then signalFailure if nil or currentPosition = next otherwise
// This would have the benefit of potentially allowing us to not duplicate
// code between the normal matching instructions and this loop here
var next: Input.Index?
Contributor Author

Future work: Do we want to rework our Processor.cycle() switch loop to do something like this for all of the matching instructions?
That is, a set of _doMatchThing functions that return Input.Index?; on nil we call signalFailure, otherwise we set currentPosition = next.
This could let us avoid duplicating code between the normal matching instructions and this switch here.
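
A small sketch of the shape being proposed, using a toy processor over `String`. `MiniProcessor`, `doMatchCharacter`, `doMatchAny` and `advance(to:)` are made-up names for illustration and do not reflect the real `Processor` API.

```swift
/// Toy stand-in for the matching engine's processor (illustration only).
struct MiniProcessor {
    let input: String
    var currentPosition: String.Index
    var failed = false

    init(_ input: String) {
        self.input = input
        self.currentPosition = input.startIndex
    }

    // Matching helpers: each returns the next position, or nil on mismatch.
    func doMatchCharacter(_ expected: Character) -> String.Index? {
        guard currentPosition < input.endIndex,
              input[currentPosition] == expected else { return nil }
        return input.index(after: currentPosition)
    }

    func doMatchAny() -> String.Index? {
        guard currentPosition < input.endIndex else { return nil }
        return input.index(after: currentPosition)
    }

    mutating func signalFailure() { failed = true }

    /// Shared tail for every matching instruction: advance on success, fail on nil.
    mutating func advance(to next: String.Index?) {
        if let next = next { currentPosition = next } else { signalFailure() }
    }
}
```

With this split, both the main instruction switch and a fused quantify loop could call the same helpers, which is the duplication the comment is worried about.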

/// the quantified cases
///
/// Essentially we trade off implementation complexity for runtime speed by adding more true cases to this
func shouldDoFastQuant(_ opts: MatchingOptions) -> Bool {
Contributor Author

  // Future work: Should we allow ConsumeFunctions into .quantify?
  // This would open up non-ascii custom character classes as well as the
  // possibility of wrapping weirder cases into consume functions
  // (allowing us to .quantify anything we want, but increasing our reliance on ConsumerInterface)

Member

Would we still be limited to knowing that it only consumes a single character at a time?

Contributor Author

Ah right, in that case we'd need a separate runQuantify for consumers that would emit save points the normal way
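
A rough sketch of why consumers would need save points emitted "the normal way". The `Consumer` typealias and `runQuantifyWithConsumer` are simplified, hypothetical stand-ins, not the engine's real consume-function signature. Because a consumer may advance by more than one character per trip, the restore positions cannot be recovered by stepping back with index(before:), so each one has to be recorded explicitly.

```swift
/// Simplified, hypothetical consumer: returns the position after a match, or nil.
typealias Consumer = (String, String.Index) -> String.Index?

func runQuantifyWithConsumer(
    _ input: String,
    from start: String.Index,
    _ consume: Consumer
) -> (end: String.Index, savePoints: [String.Index]) {
    var current = start
    var savePoints: [String.Index] = []
    // Require forward progress so an empty match cannot loop forever.
    while current < input.endIndex,
          let next = consume(input, current),
          next > current {
        savePoints.append(current)  // one explicit save point per trip
        current = next
    }
    return (current, savePoints)
}

// Example consumer that eats two characters at a time.
let abPair: Consumer = { input, position in
    input[position...].hasPrefix("ab") ? input.index(position, offsetBy: 2) : nil
}
```

On `"ababx"` the loop ends at offset 4 with save points at offsets 0 and 2. Stepping back one character from the end would land on offset 3, which is not a valid restore point.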

@rctcwyvrn
Contributor Author

@swift-ci test

Member

@milseman milseman left a comment

Overall LGTM, but can you make sure reluctant quantification is well tested in the new builtins?

Sources/_StringProcessing/ByteCodeGen.swift (outdated, resolved)
Sources/_StringProcessing/Engine/InstPayload.swift (outdated, resolved)
guard let idx = next else {
return true // matched zero times
}
if payload.quantKind != .possessive {
Member

Do these work for reluctant? Possessive also never backtracks, so I wonder if (future work) we should consider it entirely separately.

Contributor Author

Reluctant quantification isn't allowed into .quantify. Unlike the other two kinds, it never loops inside the instruction, so it didn't make sense to add it.

In the future we could have a specialized reluctant quantifier instruction that takes in both the quantification and the match instruction after it, which would handle cases like .*?;

Member

Can you add some asserts then?

Contributor Author

Oops, I had some before but they got lost in the other changes

Contributor Author

Future work: It should be possible to optimize the pattern of reluctant quantifier + anchor character in a similar style, using a payload of two int registers that store the respective payloads for the quantification and the anchor.

We can't do this peephole optimization now because we're stuck with the tree representation, so there isn't a good way of determining whether the reluctant quantifier is followed by an anchor.
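
As a rough illustration of that future-work idea, here is a sketch of a fused "reluctant quantifier followed by a terminating character" step for patterns like `.*?;`. The function `runReluctantUntil` and its parameters are hypothetical. The sketch only finds the first terminator and does not record the save points a full implementation would need in order to continue past it if the rest of the pattern fails.

```swift
/// Hypothetical fused step: reluctantly consume `quantified` characters until
/// `terminator` is seen, returning the position just past the terminator.
func runReluctantUntil(
    _ input: String,
    from start: String.Index,
    quantified: (Character) -> Bool,
    terminator: Character
) -> String.Index? {
    var current = start
    while current < input.endIndex {
        let character = input[current]
        // Reluctant: prefer stopping at the terminator over consuming more.
        if character == terminator { return input.index(after: current) }
        // Otherwise the quantified match must accept this character.
        guard quantified(character) else { return nil }
        current = input.index(after: current)
    }
    return nil  // ran out of input before seeing the terminator
}
```

For `.*?;` under default options, `quantified` would be something like `{ !$0.isNewline }` and `terminator` would be `";"`.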

- Make emitFastQuant failable and move the checks into it
- Add assertions for .reluctant
- Change some static lets to static vars
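
A sketch of the "failable emit with the checks inside" pattern described in the first two commit notes. Only the name `emitFastQuant` comes from this PR. The `QuantKind`, `Node` and `Instruction` types, the signatures, and the exact eligibility checks are assumptions made up for this illustration.

```swift
enum QuantKind { case eager, reluctant, possessive }

/// Hypothetical, heavily simplified regex tree node.
enum Node {
    case any
    case char(Character)
    case asciiClass(Set<Character>)
    case group([Node])
}

/// Hypothetical instruction stream entries.
enum Instruction: Equatable {
    case quantify(QuantKind)
    case genericLoop
}

/// Returns false (emitting nothing) when the fast path does not apply.
func emitFastQuant(_ child: Node, _ kind: QuantKind, into program: inout [Instruction]) -> Bool {
    guard kind != .reluctant else { return false }  // reluctant never uses .quantify
    switch child {
    case .any, .char, .asciiClass:
        program.append(.quantify(kind))
        return true
    case .group:
        return false  // not a single-character match
    }
}

func emitQuantification(_ child: Node, _ kind: QuantKind, into program: inout [Instruction]) {
    if emitFastQuant(child, kind, into: &program) { return }
    program.append(.genericLoop)  // fall back to the general lowering
}

/// Runtime side: guard the invariant with an assertion.
func executeQuantify(_ kind: QuantKind) {
    assert(kind != .reluctant, "reluctant quantification should not reach .quantify")
}
```

Keeping the eligibility checks inside the failable function means the caller has one fallback path, and the runtime assertion catches any mismatch between the compiler and the engine.
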
@rctcwyvrn
Contributor Author

@swift-ci test

@rctcwyvrn rctcwyvrn merged commit 1acca94 into swiftlang:main Aug 3, 2022