Skip to content

Conversation

@Catfish-Man
Copy link
Contributor

@Catfish-Man Catfish-Man commented Jan 25, 2025

Fixes rdar://144114867

@Catfish-Man Catfish-Man self-assigned this Jan 25, 2025
@Catfish-Man
Copy link
Contributor Author

@swift-ci please smoke test

@Catfish-Man
Copy link
Contributor Author

@swift-ci please Apple Silicon benchmark

) -> (String, repairsMade: Bool)?
where Input.Element == Encoding.CodeUnit {
// TODO(String Performance): Attempt to form smol strings
if input.count < _SmallString.capacity {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"14 or fewer" is operating under the assumption that "all ASCII except for one" is a relatively common case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think early-exiting for small should be done in the single transcoding loop. We do not know how many UTF-8 code units input.count is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. This check is a heuristic to avoid even attempting a small string, not the actual check to confirm that we can. We know that 15 UTF16 code units is very unlikely to succeed because if they were all ASCII then we'd probably have an ASCII buffer to start with, and if they're not all ASCII then we'll have >15 UTF8, so 14 is the largest number that could possible work, unless I'm missing something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is this algorithm proceeds as follows:

  1. We check a might-be-small heuristic (which could be either an overestimate or underestimate)
  2. If heuristic passes, we run a transcode loop to try to form and return the small string if we can, otherwise:
  3. We run a second transcode loop into an Array<UInt8>
  4. We copy that array's contents into a String, either making a _SmallString or __StringStorage, depending on its length

I'm wondering if we can simplify the code and logic, as well as improve the general case, by changing the existing transcode loop to skip the Array if it's going to be returning a small string anyways.

Would it be possible to just decode the input's scalars and append them to a var result = "" (or its guts)?. That would avoid there ever being an array and keep it in small form for every input that could be small.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh that's an interesting approach. I wonder if resizing (i.e. multiple mallocs) is actually faster than two transcode passes (one throwing away the data but writing down the size, then one actually copying in) in common cases

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea. UnicodeScalarView has an append(_: UnicodeScalar), so that seems viable. The implementation is naïve, though; we would need to improve it substantially. (Or we can add something better.)

Copy link
Contributor

@iThinker iThinker Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious about this and tried to dig it a bit.
What is the agreement on ideal flow?
I tried to look at the suggestions above, but there are some hurdles.

  1. The basic issue is that we don't know if result can be smol until transcoding is done.
  2. Starting with "" and resizing it on the go seems inefficient. It will be re-creating _SmallString on each append.
  3. IIUC one of the big desires is to remove the intermediate array. I was thinking maybe __StringStorage could be a starting point. It is updated as part of transcoding. In the end it is converted to _SmallString if it fits the criteria.

EDIT: Looks like __StringStorage itself is not meant for growing. And dealing with buffer pointers directly they really want to know the size upfront. The only alternative to doing 2 transcoding passes I can think of is allocating 2x memory (worst case scenario) and then shrinking it. It probably does not help with this PR optimistic scenario, but it gets rid of intermediate Array<UInt8>.

Copy link
Contributor

@glessard glessard Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-creating a _SmallString for a few appends is fine, it's two (or three) words held in registers, with a few bytes changed every time. For the ultimate version of this, we need to improve reserveCapacity() and add an internal way to append a known-good UTF8-encoded code point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relevant entry points for appending to Strings are mostly in StringRangeReplaceableCollection.swift

@_specialize(
where Input == Array<UInt8>, Encoding == Unicode.ASCII)
@_specialize(
where Input == UnsafeBufferPointer<UInt16>, Encoding == UTF16)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about these either

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a separate point, we might want a separate fully concrete code path for UTF-16 anyways, rather than going through generic transcoding.

@Catfish-Man
Copy link
Contributor Author


------- Performance (arm64): -Osize -------

REGRESSION                                 OLD        NEW        DELTA     RATIO     
Calculator                                 115.5      128.056    +10.9%    **0.90x** 
Chars2                                     2731.25    2982.051   +9.2%     **0.92x (?)**
KeyPathWritePerformance                    20.375     22.204     +9.0%     **0.92x** 
String.initRepeating.1AsciiChar.Count100   210.818    228.2      +8.2%     **0.92x** 
StringFromLongWholeSubstring               1.854      1.998      +7.8%     **0.93x** 
DictionaryBridgeToObjC_Bridge              5.782      6.228      +7.7%     **0.93x (?)**

IMPROVEMENT                                OLD        NEW        DELTA     RATIO     
DataToStringEmpty                          361.607    0.0        -100.0%   **361608.00x**
UTF16Decode.initDecoding                   67.737     2.584      -96.2%    **26.20x**
StringWithCString2                         0.001      0.0        -50.0%    **2.00x (?)**
RawBufferCopyBytes                         11.171     9.564      -14.4%    **1.17x (?)**
CSVParsing.Scalar                          160.143    142.125    -11.3%    **1.13x** 
ArrayAppendGenericStructs                  1077.273   957.692    -11.1%    **1.12x (?)**
LineSink.scalars.alpha                     34.479     31.653     -8.2%     **1.09x** 
CxxStringConversion.cxx.to.swift           60.0       56.0       -6.7%     **1.07x (?)**

------- Code size: -Osize -------

REGRESSION          OLD     NEW     DELTA   RATIO  
StringSplitting.o   23441   24089   +2.8%   **0.97x**
UTF8Decode.o        13600   13824   +1.6%   **0.98x**

@Catfish-Man Catfish-Man marked this pull request as ready for review January 30, 2025 20:05
@Catfish-Man Catfish-Man requested a review from a team as a code owner January 30, 2025 20:05
@Catfish-Man
Copy link
Contributor Author

Closing this in favor of #83407, which adds a vectorized transcoder for this as well as making small strings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants