-
Notifications
You must be signed in to change notification settings - Fork 10.6k
[stdlib] Add benchmarks for Character's unicodeScalars view #10645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In local testing, these benchmark results change as follows due to f212fac: <details open> <summary>Regression (20)</summary> TEST | OLD | NEW | DELTA | SPEEDUP --- | --- | --- | --- | --- CharIteration_utf16_unicodeScalars | 33074 | 79510 | +140.4% | **0.42x** CharIteration_utf16_unicodeScalars_Backwards | 73826 | 115515 | +56.5% | **0.64x** </details> <details open> <summary>Improvement (43)</summary> TEST | OLD | NEW | DELTA | SPEEDUP --- | --- | --- | --- | --- CharIndexing_korean_unicodeScalars | 118425 | 10360 | -91.3% | **11.43x** CharIndexing_tweet_unicodeScalars_Backwards | 226366 | 19850 | -91.2% | **11.40x** CharIndexing_tweet_unicodeScalars | 240596 | 21178 | -91.2% | **11.36x** CharIndexing_japanese_unicodeScalars_Backwards | 136265 | 12032 | -91.2% | **11.33x** CharIndexing_chinese_unicodeScalars | 92226 | 8146 | -91.2% | **11.32x** CharIndexing_russian_unicodeScalars | 100908 | 8948 | -91.1% | **11.28x** CharIndexing_japanese_unicodeScalars | 145414 | 12895 | -91.1% | **11.28x** CharIndexing_ascii_unicodeScalars | 121438 | 10779 | -91.1% | **11.27x** CharIndexing_russian_unicodeScalars_Backwards | 94190 | 8367 | -91.1% | **11.26x** CharIndexing_korean_unicodeScalars_Backwards | 110175 | 9803 | -91.1% | **11.24x** CharIndexing_chinese_unicodeScalars_Backwards | 86479 | 7715 | -91.1% | **11.21x** CharIndexing_ascii_unicodeScalars_Backwards | 113255 | 10132 | -91.1% | **11.18x** CharIteration_ascii_unicodeScalars_Backwards | 114873 | 13701 | -88.1% | **8.38x** CharIteration_chinese_unicodeScalars_Backwards | 85778 | 10411 | -87.9% | **8.24x** CharIteration_russian_unicodeScalars_Backwards | 94504 | 11471 | -87.9% | **8.24x** CharIteration_japanese_unicodeScalars_Backwards | 136231 | 16569 | -87.8% | **8.22x** CharIteration_tweet_unicodeScalars_Backwards | 222907 | 27165 | -87.8% | **8.21x** CharIteration_korean_unicodeScalars_Backwards | 110132 | 13443 | -87.8% | **8.19x** CharIteration_korean_unicodeScalars | 79540 | 11859 | -85.1% | **6.71x** CharIteration_russian_unicodeScalars | 68209 | 10211 | -85.0% | **6.68x** CharIteration_japanese_unicodeScalars | 98016 | 14690 | -85.0% | **6.67x** CharIteration_tweet_unicodeScalars | 161177 | 24227 | -85.0% | **6.65x** CharIteration_chinese_unicodeScalars | 61702 | 9278 | -85.0% | **6.65x** CharIteration_ascii_unicodeScalars | 81049 | 12218 | -84.9% | **6.63x** </details>
@swift-ci Please smoke benchmark |
@swift-ci Please smoke test and merge |
Build comment file:Optimized (O)Regression (4)
Improvement (4)
No Changes (274)
Added (28)
Unoptimized (Onone)Regression (5)
Improvement (4)
No Changes (273)
Added (28)
Hardware Overview
|
Can we do something to the naming of these tests? They don't follow the convention established so far… and also are horribly long. |
I suggest:
E.g.: |
Or |
@dabrahams I'm looking over these benchmarks and wonder about the motivation for testing |
I'm sorry, I don't think I can offer any real insight. I think the idea was to make sure all the new components were performant and didn't regress. |
Also, I have tried to test locally a modified versions without first copying the |
How realistic is testing iteration over |
@milseman Do you have opinion about this? |
Reversed decoding can be more expensive, and is not optimizable by using a customized iterator.
Sorry, could you re-phrase this? BTW, I introduced a custom iterator for String.UnicodeScalarView, which makes pure iteration much faster than the indexing approach. |
I was questioning the practical applicability (or realism) of how this benchmark is currently built. I can understand the importance of reverse iteration on the character view as that's used for example in the Boyer–Moore string-search algorithm. I guess this is also applicable to the But the Shouldn't these benchmarks be changed to measure forward and reverse iteration and indexing on the |
I think the main problem with how this is built is poor workload selection. We're taking strings whose shapes were meant to be used as perf-regression tests for grapheme-breaking fast-paths, and using them for scalar decoding. Instead, we want these shaped by the properties that can affect scalar decoding. Realistic scalar shapes for UTF-8 encoded storage include: ASCII, <U+0080, <U+0800, and non-BMP scalars. Realistic string shapes include realistic mixtures of scalar shapes, often times with runs of ASCII in the middle. String-search concerns should apply to all views (though UTF-16 may not be important in practice). The UTF8View is useful for parsing, where you're looking for specific ASCII tags in otherwise ignored bytes. The UnicodeScalarView is useful for conformance checking, as no serious specification would be defined post-grapheme-segmentation, but rather on raw scalars. However, the USV may differ in that it might want to match modulo-canonical-equlivaence, that is TBD.
Yes, these workloads are bogus/highly-redundant for profiling views post-grapheme-segmentation.
I think it makes way more sense to measure them on String, yes. BTW, where is that overhead you're seeing coming from? We may have to guarantee-or-trap on a single-grapheme check (and I know there's a bug regarding asserts builds here). edit: grammar |
In local testing, these benchmark results change as follows due to f212fac:
Regression (20)
Improvement (43)