Catfish-Man
Contributor

@Catfish-Man Catfish-Man commented Aug 4, 2025

This removes a bunch of overhead from the UTF-16 paths in String and consolidates the complicated bits of the logic into one file.

Fixes rdar://157500258

@Catfish-Man Catfish-Man self-assigned this Aug 4, 2025
@Catfish-Man Catfish-Man requested a review from a team as a code owner August 4, 2025 20:14
@Catfish-Man
Contributor Author

@swift-ci please test

@Catfish-Man
Contributor Author

@swift-ci please Apple Silicon benchmark

@Catfish-Man
Contributor Author

@swift-ci please Apple Silicon benchmark

@Catfish-Man
Contributor Author

That's very weird. The benchmark run consistently dies at -Onone only; the optimized configurations pass.


@inline(__always)
final internal var utf16: String.UTF16View {
String.UTF16View(_StringGuts(self))
Contributor Author

Theoretically this should all inline out, but this skips going through the String initializer

_uncheckedBounds: (aRange.location, aRange.location+aRange.length))
let str = asString
unsafe str._copyUTF16CodeUnits(
unsafe utf16._nativeCopy(
Contributor Author

Part of the theme of "move all the smarts into StringUTF16View.swift instead of having them scattered in 3-4 different files"

"String index is out of bounds")
return unsafe UInt16((start + offset).pointee)
} else {
return utf16[nativeNonASCIIOffset: offset]
Contributor Author

Part of the theme of "move all the smarts into StringUTF16View.swift instead of having them scattered in 3-4 different files"

return _foreignSubscript(position: idx)
}

internal subscript(nativeNonASCIIOffset offset: Int) -> UTF16.CodeUnit {
Contributor Author

This is pretty much just pulling the relevant bits out of the code it used to go through
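
To illustrate the ASCII/non-ASCII split in the subscript above, here is a minimal sketch using only public API. `UTF8BackedString` and `utf16CodeUnit(at:)` are hypothetical names, the ASCII flag is computed eagerly rather than read from the string's internal representation, and the non-ASCII path is a plain linear walk, so this should not be read as the stdlib's implementation.

```swift
// Hypothetical model type for illustration; not the stdlib's internal storage.
struct UTF8BackedString {
    let string: String
    let isASCII: Bool   // assumption: computed up front for this sketch

    init(_ string: String) {
        self.string = string
        self.isASCII = string.utf8.allSatisfy { $0 < 0x80 }
    }

    /// Returns the UTF-16 code unit at `offset`, counted in UTF-16 code units.
    func utf16CodeUnit(at offset: Int) -> UInt16 {
        precondition(offset >= 0, "String index is out of bounds")
        if isASCII {
            // ASCII: one byte per code unit, so the UTF-16 offset is the byte offset.
            let utf8 = string.utf8
            precondition(offset < utf8.count, "String index is out of bounds")
            return UInt16(utf8[utf8.index(utf8.startIndex, offsetBy: offset)])
        }
        // Non-ASCII: walk the scalars, counting the code units each one needs.
        var remaining = offset
        for scalar in string.unicodeScalars {
            let width = UTF16.width(scalar)
            if remaining < width {
                if width == 1 { return UInt16(scalar.value) }
                return remaining == 0
                    ? UTF16.leadSurrogate(scalar)
                    : UTF16.trailSurrogate(scalar)
            }
            remaining -= width
        }
        preconditionFailure("String index is out of bounds")
    }
}
```

The ASCII branch does constant work per lookup, while the other branch has to account for scalars that occupy one or two UTF-16 code units.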

_precondition(alignedRange.lowerBound._encodedOffset < _guts.count &&
alignedRange.upperBound._encodedOffset < _guts.count,
"String index is out of bounds")
unsafe _nativeCopy(
Contributor Author

We should add a fast path here for blocks of ASCII (we already have one for when the entire string is known to be ASCII), but I was having trouble getting it to generate reasonable-looking code.
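
A minimal sketch of what a per-block ASCII fast path could look like, written against plain arrays rather than the raw pointers `_nativeCopy` works with. `transcodeToUTF16` is a hypothetical name, the input is assumed to be valid UTF-8, and nothing here is taken from the PR.

```swift
/// Hypothetical helper: converts valid UTF-8 bytes into UTF-16 code units,
/// widening runs of ASCII directly and decoding everything else.
func transcodeToUTF16(_ utf8: [UInt8]) -> [UInt16] {
    var output: [UInt16] = []
    output.reserveCapacity(utf8.count)
    var i = 0
    while i < utf8.count {
        // Fast path: every ASCII byte maps to exactly one UTF-16 code unit.
        let asciiStart = i
        while i < utf8.count, utf8[i] < 0x80 { i += 1 }
        for byte in utf8[asciiStart..<i] {
            output.append(UInt16(byte))
        }
        // Slow path: a run of bytes >= 0x80 holds only whole multi-byte
        // scalars in valid UTF-8, so it can be decoded on its own.
        let nonASCIIStart = i
        while i < utf8.count, utf8[i] >= 0x80 { i += 1 }
        if i > nonASCIIStart {
            let run = String(decoding: utf8[nonASCIIStart..<i], as: UTF8.self)
            output.append(contentsOf: run.utf16)
        }
    }
    return output
}
```

Going through `String(decoding:as:)` on the slow path keeps the sketch short but allocates; a real implementation would presumably decode in place.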

}

- @inlinable
+ @inlinable @inline(__always)
Contributor Author

It was actually generating a retain-release around this before?? ~18% overhead in my simple test

unsafe d.initialize(from: s, count: n)
return (unsafe Iterator(_position: s + n, _end: s + count), n)
}

Contributor Author

All the remaining changes are kinda unrelated, but I was getting frustrated with trivial pointer operations generating calls

@Catfish-Man
Contributor Author

Local -characterAtIndex: benchmark gives

ASCII: ~2s -> ~0.63s
Non-ASCII (different iteration count): ~2.5s -> ~1.95s

@Catfish-Man
Contributor Author

@swift-ci please test

@Catfish-Man
Contributor Author

@swift-ci please Apple Silicon benchmark

@Catfish-Man
Contributor Author

For a shorter (3 character) non-ASCII string we go from ~2s to ~1.25s, which indicates that most of the win is removing constant overhead, as expected.
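
For reference, the speedups implied by the local timings quoted in this thread can be worked out as below. The inputs are the approximate ("~") figures from the comments, so the results are rough, and `speedup(before:after:)` is just an illustrative helper.

```swift
/// Illustrative helper: how many times faster the new code is.
func speedup(before: Double, after: Double) -> Double {
    before / after
}

// Approximate local -characterAtIndex: timings, in seconds, from the comments.
let asciiSpeedup = speedup(before: 2.0, after: 0.63)          // ≈ 3.17x
let nonASCIISpeedup = speedup(before: 2.5, after: 1.95)       // ≈ 1.28x
let shortNonASCIISpeedup = speedup(before: 2.0, after: 1.25)  // 1.6x

print(asciiSpeedup, nonASCIISpeedup, shortNonASCIISpeedup)
```

The short non-ASCII string gains more in relative terms than the longer one, which is consistent with the saving being a fixed per-call cost.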

@Catfish-Man
Contributor Author

lol for ASCII we're now very slightly faster than NSCFString

@Catfish-Man
Contributor Author

@swift-ci please test

@Catfish-Man
Contributor Author

@swift-ci please Apple Silicon benchmark

@Catfish-Man
Contributor Author

@swift-ci please benchmark

@Catfish-Man
Contributor Author

I should look into what's up with StringWalk, but overall this is looking pretty decent

19:20:02  ------- Performance (arm64): -Osize -------
19:20:02  
19:20:02  REGRESSION                                                  OLD        NEW         DELTA    RATIO      
19:20:02  StringWalk                                                  984.5      1931.556    +96.2%   **0.51x**  
19:20:02  DataReplaceLarge                                            6496.97    8342.308    +28.4%   **0.78x (?)**
19:20:02  DataAppendDataLargeToMedium                                 6171.429   7908.333    +28.1%   **0.78x (?)**
19:20:02  DataAppendDataLargeToSmall                                  6421.875   8173.913    +27.3%   **0.79x (?)**
19:20:02  DataAppendDataSmallToLarge                                  5588.732   6734.483    +20.5%   **0.83x (?)**
19:20:02  CharIteration_ascii_unicodeScalars_Backwards                2838.75    3395.385    +19.6%   **0.84x**  
19:20:02  CharIteration_tweet_unicodeScalars_Backwards                5672.5     6760.0      +19.2%   **0.84x**  
19:20:02  CharIteration_chinese_unicodeScalars_Backwards              3616.0     4163.636    +15.1%   **0.87x**  
19:20:02  CharIndexing_punctuated_unicodeScalars                      565.042    648.846     +14.8%   **0.87x**  
19:20:02  CharIteration_punctuated_unicodeScalars_Backwards           868.764    985.128     +13.4%   **0.88x**  
19:20:02  CharIteration_utf16_unicodeScalars_Backwards                6304.615   7086.667    +12.4%   **0.89x**  
19:20:02  StrComplexWalk                                              1956.667   2172.0      +11.0%   **0.90x**  
19:20:02  CharIteration_japanese_unicodeScalars_Backwards             6042.667   6700.0      +10.9%   **0.90x**  
19:20:02  CharIteration_korean_unicodeScalars_Backwards               5428.235   6016.0      +10.8%   **0.90x**  
19:20:02  CharIteration_punctuatedJapanese_unicodeScalars_Backwards   900.0      993.803     +10.4%   **0.91x**  
19:20:02  CharIteration_russian_unicodeScalars_Backwards              4006.957   4422.0      +10.4%   **0.91x**  
19:20:02  StringHasPrefixAscii                                        1421.25    1543.333    +8.6%    **0.92x**  
19:20:02  AngryPhonebook.Cyrillic                                     306.75     333.0       +8.6%    **0.92x**  
19:20:02  StringHasSuffixAscii                                        1226.842   1329.412    +8.4%    **0.92x**  
19:20:02  
19:20:02  IMPROVEMENT                                                 OLD        NEW         DELTA    RATIO      
19:20:02  NSString.bridged.byteCount.ascii.utf8                       0.313      0.0         -99.7%   **314.00x (?)**
19:20:02  CharIteration_tweet_unicodeScalars                          5225.882   2661.25     -49.1%   **1.96x**  
19:20:02  CharIteration_ascii_unicodeScalars                          2643.636   1347.797    -49.0%   **1.96x**  
19:20:02  CharIteration_punctuated_unicodeScalars                     670.588    388.654     -42.0%   **1.73x**  
19:20:02  CharacterPropertiesPrecomputed                              459.302    314.074     -31.6%   **1.46x**  
19:20:02  KeyPathOptionals                                            92.28      65.353      -29.2%   **1.41x**  
19:20:02  KeyPathNestedClasses                                        38.839     28.488      -26.7%   **1.36x**  
19:20:02  ObjectiveCBridgeStubToNSStringRef                           88.52      66.156      -25.3%   **1.34x (?)**
19:20:02  CharIteration_korean_unicodeScalars                         3409.231   2562.5      -24.8%   **1.33x**  
19:20:02  CharIteration_chinese_unicodeScalars                        2291.892   1809.565    -21.0%   **1.27x**  
19:20:02  SubstringEquatable                                          199.364    157.429     -21.0%   **1.27x**  
19:20:02  Breadcrumbs.CopyAllUTF16CodeUnits.longMixed                 142.118    113.476     -20.2%   **1.25x**  
19:20:02  Breadcrumbs.CopyAllUTF16CodeUnits.Mixed                     143.125    114.4       -20.1%   **1.25x**  
19:20:02  Breadcrumbs.CopyUTF16CodeUnits.longMixed                    146.063    117.474     -19.6%   **1.24x**  
19:20:02  KeyPathClassStructs                                         75.857     62.043      -18.2%   **1.22x**  
19:20:02  CharIteration_russian_unicodeScalars                        2816.774   2317.838    -17.7%   **1.22x**  
19:20:02  SubstringEqualString                                        112.05     95.348      -14.9%   **1.18x**  
19:20:02  SuperChars2                                                 131.471    112.632     -14.3%   **1.17x**  
19:20:02  CharIteration_japanese_unicodeScalars                       3695.0     3179.259    -14.0%   **1.16x**  
19:20:02  SortStringsUnicode                                          1330.625   1157.778    -13.0%   **1.15x**  
19:20:02  Set.isDisjoint.Seq.Box.Empty                                45.34      39.946      -11.9%   **1.14x**  
19:20:02  StringComparison_latin1                                     228.6      202.087     -11.6%   **1.13x**  
19:20:02  ObjectiveCBridgeStringHash                                  41.625     36.846      -11.5%   **1.13x**  
19:20:02  StringDistance.utf16.mixed                                  15.268     13.571      -11.1%   **1.13x**  
19:20:02  StringComparison_emoji                                      168.37     151.322     -10.1%   **1.11x**  
19:20:02  StringComparison_nonBMPSlowestPrenormal                     312.933    281.375     -10.1%   **1.11x**  
19:20:02  StringHasPrefixUnicode                                      22600.0    20346.154   -10.0%   **1.11x**  
19:20:02  StringComparison_slowerPrenormal                            353.548    318.507     -9.9%    **1.11x**  
19:20:02  Calculator                                                  128.056    115.5       -9.8%    **1.11x**  
19:20:02  ObjectiveCBridgeStubToNSString                              714.333    657.353     -8.0%    **1.09x (?)**
19:20:02  StringComparison_fastPrenormal                              283.614    261.86      -7.7%    **1.08x**  
19:20:02  SubstringTrimmingASCIIWhitespace                            73.536     67.9        -7.7%    **1.08x**  
19:20:02  FlattenListLoop                                             1678.0     1551.0      -7.6%    **1.08x (?)**
19:20:02  FindString.Loop1.Substring                                  114.19     105.652     -7.5%    **1.08x**  
19:20:02  StringInterpolationSmall                                    521.19     482.791     -7.4%    **1.08x**  
19:20:02  
19:20:02  ------- Code size: -Osize -------
19:20:02  
19:20:02  REGRESSION              OLD     NEW     DELTA   RATIO  
19:20:02  StringInterpolation.o   5332    5596    +5.0%   **0.95x**
19:20:02  StringSplitting.o       22753   23777   +4.5%   **0.96x**
19:20:02  Hash.o                  12448   12708   +2.1%   **0.98x**
19:20:02  UTF16Decode.o           13842   14086   +1.8%   **0.98x**
19:20:02  StrComplexWalk.o        2198    2226    +1.3%   **0.99x**
19:20:02  CharacterProperties.o   15232   15392   +1.1%   **0.99x**
19:20:02  
19:20:02  IMPROVEMENT             OLD     NEW     DELTA   RATIO  
19:20:02  CSVParsing.o            34204   33728   -1.4%   **1.01x**
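
As a reading aid for the tables above: the DELTA and RATIO columns are consistent with DELTA = (NEW - OLD) / OLD and RATIO = OLD / NEW. The sketch below checks that against two rows. `BenchmarkRow` is a hypothetical type, and the formulas are inferred from the numbers rather than taken from the benchmark tooling.

```swift
// Hypothetical row type; the formulas are inferred from the table values.
struct BenchmarkRow {
    let name: String
    let old: Double
    let new: Double

    /// Relative change in percent; positive means the new build is slower.
    var deltaPercent: Double { (new - old) / old * 100 }

    /// OLD / NEW; values below 1.0 are regressions.
    var ratio: Double { old / new }
}

let stringWalk = BenchmarkRow(name: "StringWalk", old: 984.5, new: 1931.556)
let tweet = BenchmarkRow(name: "CharIteration_tweet_unicodeScalars",
                         old: 5225.882, new: 2661.25)

for row in [stringWalk, tweet] {
    print(row.name, row.deltaPercent, row.ratio)
}
```

With these inputs, StringWalk comes out at about +96.2% and 0.51x, and the tweet iteration benchmark at about -49.1% and 1.96x, matching the rows in the log.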

@Catfish-Man
Contributor Author

The Linux failure is funny. The error the test expects at runtime is now diagnosed at compile time, so the test fails to build.

[2025-09-12T02:28:31.177Z] /home/build-user/swift/validation-test/stdlib/UnsafeBufferPointer.swift.gyb:291:16: error: assumed non-negative value '-1' is negative
[2025-09-12T02:28:31.177Z]  594 |   defer { emptyAllocated.deallocate() }
[2025-09-12T02:28:31.177Z]  595 | 
[2025-09-12T02:28:31.177Z]  596 |   let buffer = UnsafeRawBufferPointer(start: UnsafeRawPointer(emptyAllocated), count: -1)
[2025-09-12T02:28:31.177Z]      |                `- error: assumed non-negative value '-1' is negative
[2025-09-12T02:28:31.177Z]  597 |   _ = buffer
[2025-09-12T02:28:31.177Z]  598 | }

@Catfish-Man
Contributor Author

@swift-ci please test

@Catfish-Man
Contributor Author

@swift-ci please test

@Catfish-Man
Contributor Author

@swift-ci please Apple Silicon benchmark

let emptyAllocated = UnsafeMutablePointer<Float>.allocate(capacity: 0)
defer { emptyAllocated.deallocate() }

// Make sure we can't emit the error at compile time
Contributor Author

I wonder… does converting a runtime trap into an error like this count as a source break?

Contributor

We have converted runtime traps to compilation errors before, without considering them source breaks. What it amounts to is speeding up the discovery of an incorrect program to compilation time.
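
One common way to keep such a check at runtime in a test is to pass the value through a function the optimizer is asked not to inline, so the constant is less likely to be visible at the call site. The sketch below assumes that approach. `opaque` and `makeBufferCount` are hypothetical names, this is not necessarily what the test in this PR does, and `@inline(never)` alone may not be a hard guarantee within a single module.

```swift
// Hypothetical helper: returns its argument unchanged, but is not inlined,
// which makes it harder for the compiler to treat the result as a constant.
@inline(never)
func opaque<T>(_ value: T) -> T {
    return value
}

/// Builds a raw buffer over an empty allocation and reports its count.
/// Passing `opaque(-1)` instead would be expected to fail the count check
/// at runtime (in builds where it is enabled), not at compile time.
func makeBufferCount(_ requestedCount: Int) -> Int {
    let emptyAllocated = UnsafeMutablePointer<Float>.allocate(capacity: 0)
    defer { emptyAllocated.deallocate() }

    let buffer = UnsafeRawBufferPointer(
        start: UnsafeRawPointer(emptyAllocated),
        count: requestedCount)
    return buffer.count
}

// A valid count is used here so the example runs to completion.
print(makeBufferCount(opaque(0)))
```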

@Catfish-Man
Contributor Author

StringWalk appears to be extremely sensitive to minor inlining differences. I've filed a follow-up bug for the optimizer folks about a missed optimization that occurs both before and after this change; I'm hoping that fixing it will more than recover the difference.

@Catfish-Man Catfish-Man merged commit 7b78a1d into swiftlang:main Sep 23, 2025
6 checks passed