[Optimization] Specialized quantification instruction #577

rctcwyvrn · 2022-07-14T01:08:01Z

The idea is to add an optimized quantification instruction that fuses match + save + loop into a single instruction.

Fused instructions are usually good optimizations.
This allows us to compact the save point representation of a quantified, non-capturing, single-grapheme match into a range of indices to restore to.
By moving the hottest paths out of Processor.cycle(), this will hopefully help with the upcoming instruction layout work. Some instructions, like splitSaving or condBranchIfZeroElseDecrement, might become much less common if most quantifications end up being a single quantify instruction.

The plan: compile the most common quantification cases, like .*, \w*, [a-z0-9]*, and a*, into this fused instruction.
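To make the "range of indices to restore to" idea concrete, here is a minimal sketch of a fused quantify loop over a single-Character predicate. It is not the code from this PR: `QuantKind`, `RangeSavePoint`, `runQuantify`, and `backtrackPositions` are hypothetical names chosen for illustration, and the real instruction works on the engine's own payloads rather than closures.

```swift
// Illustrative sketch only: these names are hypothetical, not the engine's types.
enum QuantKind { case eager, possessive }

/// Compact save point: instead of one save point per iteration, store the
/// lowest and highest positions the matcher may later resume from.
struct RangeSavePoint {
  var range: ClosedRange<String.Index>
}

/// Fused match + save + loop for a single-Character predicate.
/// Returns nil if fewer than `minTrips` Characters matched.
func runQuantify(
  _ input: String,
  from start: String.Index,
  minTrips: Int,
  kind: QuantKind,
  matching predicate: (Character) -> Bool
) -> (end: String.Index, savePoint: RangeSavePoint?)? {
  var current = start
  var trips = 0
  var lowest: String.Index? = nil
  var highest: String.Index? = nil

  while current < input.endIndex, predicate(input[current]) {
    if trips >= minTrips {
      // Stopping here (before consuming) would also satisfy the quantifier,
      // so remember it as a position to backtrack to.
      if lowest == nil { lowest = current }
      highest = current
    }
    input.formIndex(after: &current)
    trips += 1
  }
  guard trips >= minTrips else { return nil }

  // Possessive quantification never backtracks, so it records nothing.
  guard kind != .possessive, let lo = lowest, let hi = highest else {
    return (end: current, savePoint: nil)
  }
  return (end: current, savePoint: RangeSavePoint(range: lo...hi))
}

/// Expands a range save point into the positions backtracking would try,
/// from the furthest position down to the nearest.
func backtrackPositions(_ savePoint: RangeSavePoint, in input: String) -> [String.Index] {
  var result: [String.Index] = []
  var position = savePoint.range.upperBound
  while true {
    result.append(position)
    if position == savePoint.range.lowerBound { break }
    position = input.index(before: position)
  }
  return result
}
```

For the input aaab and a*, this yields an end position after the third a and a single save point covering offsets 0 through 2, which backtracking walks from highest to lowest using index(before:).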

Results:

Comparing against benchmark result file speedy_builtins.json (columns: benchmark, this PR, baseline, difference, percent change)
=== Regressions ======================================================================
- NumbersAll                              10.5ms	9.14ms	1.4ms		15.3%
=== Improvements =====================================================================
- EmailLookaheadAll                       37.3ms	86.2ms	-48.8ms		-56.7%
- EmailLookaheadNoMatchesAll              39ms	60.9ms	-21.9ms		-36.0%
- EmailLookaheadList                      9.2ms	23.6ms	-14.4ms		-61.0%
- CompilerMessagesAll                     114ms	125ms	-11.5ms		-9.2%
- InvertedCCC                             22.5ms	28.7ms	-6.19ms		-21.6%
- EagarQuantWithTerminalWhole             2.4ms	7.58ms	-5.19ms		-68.4%
- IPv6Address                             4.06ms	7.74ms	-3.68ms		-47.6%
- LinesAll                                3.09ms	6.57ms	-3.47ms		-52.9%
- GraphemeBreakNoCapAll                   6.74ms	10.1ms	-3.4ms		-33.5%
- WordsAll                                22.4ms	24.3ms	-1.94ms		-8.0%
- DiceRollsInTextAll                      60.6ms	62.2ms	-1.55ms		-2.5%
- EmailRFCAll                             49.8ms	51ms	-1.26ms		-2.5%
- CaseInsensitiveCCC                      11.2ms	12.4ms	-1.14ms		-9.2%
- symDiffCCC                              40.1ms	41.1ms	-951µs		-2.3%
- AnchoredNotFoundWhole                   8.66ms	9.44ms	-789µs		-8.4%
- BasicRangeCCC                           10.6ms	11.3ms	-723µs		-6.4%
- EmojiRegexAll                           70.7ms	71.4ms	-677µs		-0.9%
- DiceNotation                            6.93ms	7.55ms	-622µs		-8.2%
- BasicCCC                                10.1ms	10.7ms	-531µs		-5.0%
- MACAddress                              3.11ms	3.54ms	-431µs		-12.2%
- SubtractionCCC                          15.5ms	15.9ms	-397µs		-2.5%
- CssAll                                  3.87ms	4.19ms	-324µs		-7.7%

Large improvements in quantifications that consume more characters (LinesAll, EmailLookahead, EagerQuantWithTerminalWhole), as expected.
Only small improvements from the compact save point representation (AnchoredNotFoundWhole), because the gains are largely cancelled out by ARC overhead in signalFailure().
Regression in quantifications that consume few characters or usually match nothing (NumbersAll), due to the new instruction's increased overhead.

Generally good improvements, but if we could speed up signalFailure(), some of these benchmarks would be much faster. There were many cases where the quantification part of the regex got faster but signalFailure() got much slower, due to a combination of ARC and having to call index(before:).

Note: based on top of #547

rctcwyvrn · 2022-07-20T01:05:20Z

Sources/_StringProcessing/Engine/MEQuantify.swift

+    // which we then signalFailure if nil or currentPosition = next otherwise
+    // This would have the benefit of potentially allowing us to not duplicate
+    // code between the normal matching instructions and this loop here
+    var next: Input.Index?


Future work: Do we want to rework our Processor.cycle() switch loop to do something like this for all of the matching instructions?
i.e., a bunch of _doMatchThing functions that return Input.Index?; if the result is nil we call signalFailure(), otherwise we set currentPosition = next.
This would have the benefit of potentially letting us avoid duplicating code between the normal matching instructions and this switch here.
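A rough sketch of what that refactor could look like, under heavy assumptions: `MiniProcessor`, `MatchInstruction`, and the `doMatch…` helpers below are made-up stand-ins, not the engine's Processor or its instruction set. The point is only the shape: each helper returns an optional index and never mutates state, and the dispatch loop decides between signalFailure() and advancing.

```swift
// Hypothetical sketch; names and shapes are illustrative, not the engine's API.
enum MatchInstruction {
  case matchCharacter(Character)
  case matchAnyNonNewline
  case matchAsciiDigit
}

struct MiniProcessor {
  let input: String
  var currentPosition: String.Index
  var failed = false

  init(input: String) {
    self.input = input
    self.currentPosition = input.startIndex
  }

  // Each helper only answers "where would this match end?" and leaves the
  // processor untouched, so a quantify loop could call the same helpers.
  func doMatchCharacter(_ expected: Character) -> String.Index? {
    guard currentPosition < input.endIndex,
          input[currentPosition] == expected else { return nil }
    return input.index(after: currentPosition)
  }

  func doMatchAnyNonNewline() -> String.Index? {
    guard currentPosition < input.endIndex,
          !input[currentPosition].isNewline else { return nil }
    return input.index(after: currentPosition)
  }

  func doMatchAsciiDigit() -> String.Index? {
    guard currentPosition < input.endIndex else { return nil }
    let c = input[currentPosition]
    guard c.isASCII, c.isWholeNumber else { return nil }
    return input.index(after: currentPosition)
  }

  // Stand-in for the real failure path (which would restore a save point).
  mutating func signalFailure() { failed = true }

  mutating func cycle(_ instruction: MatchInstruction) {
    let next: String.Index?
    switch instruction {
    case .matchCharacter(let c): next = doMatchCharacter(c)
    case .matchAnyNonNewline: next = doMatchAnyNonNewline()
    case .matchAsciiDigit: next = doMatchAsciiDigit()
    }
    // One shared tail for every matching instruction.
    guard let idx = next else {
      signalFailure()
      return
    }
    currentPosition = idx
  }
}
```

The trade-off is one extra optional check per instruction in exchange for a single place where failure and position updates are handled.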

rctcwyvrn · 2022-07-20T01:07:18Z

Sources/_StringProcessing/ByteCodeGen.swift

+  /// the quantified cases
+  ///
+  /// Essentially we trade off implementation complexity for runtime speed by adding more true cases to this
+  func shouldDoFastQuant(_ opts: MatchingOptions) -> Bool {


// Future work: Should we allow ConsumeFunctions into .quantify?
// This would open up non-ascii custom character classes as well as the
// possibility of wrapping weirder cases into consume functions
// (allowing us to .quantify anything we want, but increasing our reliance on ConsumerInterface)

Would we still be limited to knowing that it only consumes a single character at a time?

Ah right, in that case we'd need a separate runQuantify for consumers that would emit save points the normal way
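To illustrate why a consumer-based variant could not reuse the range-shaped save point, here is a small sketch. `Consumer`, `runQuantifyWithConsumer`, and `consumeAB` are hypothetical; the signature is simplified for the example and is not the engine's ConsumeFunction. Because a consumer may advance by more than one Character per iteration, stepping back with index(before:) could land on a position that was never a valid stopping point, so each position is recorded explicitly.

```swift
// Hypothetical sketch: a simplified consumer signature, assumed for illustration.
typealias Consumer = (String, String.Index) -> String.Index?

/// Greedy zero-or-more loop over an arbitrary consumer. Every position the
/// loop passes through is stored as its own save point, "the normal way".
func runQuantifyWithConsumer(
  _ input: String,
  from start: String.Index,
  consumer: Consumer
) -> (end: String.Index, savePoints: [String.Index]) {
  var current = start
  var savePoints: [String.Index] = []
  // `next > current` guards against a consumer that matches without advancing.
  while let next = consumer(input, current), next > current {
    savePoints.append(current)
    current = next
  }
  return (end: current, savePoints: savePoints)
}

/// Example consumer that eats the two-Character sequence "ab" per call.
func consumeAB(_ input: String, _ position: String.Index) -> String.Index? {
  input[position...].hasPrefix("ab") ? input.index(position, offsetBy: 2) : nil
}
```

On the input ababx, this stops at offset 4 with save points at offsets 0 and 2. Offsets 1 and 3 are not valid resume positions, which a contiguous range walked one Character at a time cannot express.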

rctcwyvrn · 2022-07-20T01:08:18Z

@swift-ci test

milseman

Overall LGTM, but can you make sure reluctant quantifications are well tested in the new builtins?

milseman · 2022-07-27T22:48:53Z

Would we still be limited to knowing that it only consumes a single character at a time?

milseman · 2022-07-27T23:14:59Z

Sources/_StringProcessing/Engine/MEQuantify.swift

+    guard let idx = next else {
+      return true // matched zero times
+    }
+    if payload.quantKind != .possessive {


Do these work for reluctant? Possessive also never backtracks, so I wonder if (future work) we should consider it entirely separately.

Reluctant quantification isn't allowed into .quantify: unlike the other two kinds, it never loops inside the instruction, so it didn't make sense to add it there.

In the future we could have a specialized reluctant quantifier instruction that takes in both the quantification and the match instruction after it, which would handle cases like .*?;

Can you add some asserts then?

Oops, I had some before but they got lost in the other changes

Future work: It should be possible to optimize the pattern of a reluctant quantifier followed by an anchor character in a similar style, using a payload of two int registers that store the respective payloads for the quantification and the anchor.

The reason we can't do this peephole optimization now is that we're stuck with the tree representation, so there isn't a good way of determining whether the reluctant quantifier is followed by an anchor.
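As a sketch of what such a fused reluctant instruction might do for a pattern like .*?; (purely hypothetical, since nothing like it exists in this PR): at each position it tries the anchor character first and only consumes one more quantified Character if the anchor does not match. `runReluctantThenAnchor` and its closure parameter are assumptions for illustration, and the sketch ignores backtracking into the quantifier if something later in the regex fails.

```swift
// Hypothetical sketch of a fused "reluctant quantifier + anchor character" step.
// Returns the position just past the first anchor, or nil if the quantified
// predicate rejects a Character (or the input ends) before an anchor is found.
func runReluctantThenAnchor(
  _ input: String,
  from start: String.Index,
  quantified: (Character) -> Bool,
  anchor: Character
) -> String.Index? {
  var current = start
  while current < input.endIndex {
    let c = input[current]
    // Reluctant: prefer to stop as early as possible.
    if c == anchor { return input.index(after: current) }
    // Otherwise the quantifier has to take this Character.
    guard quantified(c) else { return nil }
    input.formIndex(after: &current)
  }
  return nil
}
```

With a non-newline predicate and ; as the anchor, the input abc;def; matches up to and including the first semicolon.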

rctcwyvrn · 2022-08-03T20:27:41Z

@swift-ci test

rctcwyvrn added 30 commits July 5, 2022 14:21

Copy over new ascii bitset

3b6b676

Add matchBuiltin

33caa79

Remove debug prints

139daa5

Remove bitset fast path

9abf4af

Fully remove remnants of the bitset fast path

286f5d8

Merge branch 'main' into speedy-builtins

9e915cd

Completely replace AssertionFunction with regexAssert(by:)

e593ddb

Merge branch 'main' into speedy-builtins

25dc277

Cleanup

3e38ac6

Move match builtin and assert + Add AssertionPayload

e5d8b4a

Cleanup assertions

0466c25

Merge branch 'main' into speedy-builtins

87078ad

First version

1ef91f3

Fix tests

f401e84

Update opcode description for assertBy

b09f45f

Bugfixes

00ae70b

Merge branch 'speedy-builtins' into quicker-quant-qualifies-quality

d33b57c

Finish bugfixes

62bec3f

Fixed array copy issue with savepoints

8cf6b21

Add assertions + cleanup

8d61e7d

Clean up loop structure in runQuantify

c0bc139

Use range based save points

42e5a58

Undo the change where I made it recompute index after for some reason

b2dedac

More cleanup

7b4eaff

Merge branch 'main' into speedy-builtins

c581ea2

Merge branch 'speedy-builtins' into quicker-quant-qualifies-quality

627c982

Merge branch 'main' into speedy-builtins

2a82231

Update branch to match main

fb1576a

Use the newly cleaned up _CharacterClassModel

3b9485e

Add characterClass DSLTree node

64d1ed9

rctcwyvrn added 11 commits July 18, 2022 18:00

Bugfixes

2a6fe3c

- matchBuiltin always fails if at endIndex
- fix switch in isStrictAscii

Merge branch 'main' into quicker-quant-qualifies-quality

2c2406e

Merge branch 'speedy-builtins' into quicker-quant-qualifies-quality

9352821

Allow strict and inverted character classes

9ed9f57

Cleanup magic constants

bee167f

Add specialized quantify paths

cf01751

Fix dot quantify

e2f60d3

Remove unneeded save point

d7015ec

experimental signal failure restoring

b417434

Reduce ARCs in signalFailure()

512bef5

Just do things inline in signalFailure()

ff9c375

rctcwyvrn changed the title ~~[WIP] [Optimization] Specialized quantification instruction~~ [Optimization] Specialized quantification instruction Jul 20, 2022

rctcwyvrn marked this pull request as ready for review July 20, 2022 00:40

rctcwyvrn requested a review from milseman July 20, 2022 00:46

rctcwyvrn commented Jul 20, 2022

Cleanup some comments

b026402

milseman approved these changes Jul 27, 2022

Cleanup

d02c5cd

- Make emitFastQuant failable and move the checks into it
- Add assertions for .reluctant
- Change some static lets to static vars

milseman approved these changes Aug 1, 2022

rctcwyvrn added 2 commits August 3, 2022 13:24

Merge branch 'main' into quicker-quant-qualifies-quality

ce90ba9

Slight cleanup

f90f01c

rctcwyvrn merged commit 1acca94 into swiftlang:main Aug 3, 2022
