Vectorize UTF16->UTF8 transcoding #83073

Catfish-Man · 2025-07-15T20:36:16Z

Fixes rdar://141789595

Catfish-Man · 2025-07-15T20:40:00Z

stdlib/public/core/UTF16.swift

+    } else {
+      isASCII = false
+      var tmp: (
+        UInt32, UInt32, UInt32, UInt32, UInt32, UInt32, UInt32, UInt32


Making a temporary buffer here is sort of awful, and I want to improve it at some point, but it's not really hurting anything and it simplifies the rest of the code a lot.

Catfish-Man · 2025-07-15T22:01:18Z

@swift-ci please test

Catfish-Man · 2025-07-15T22:01:23Z

@swift-ci please benchmark

Catfish-Man · 2025-07-15T22:01:29Z

@swift-ci please Apple Silicon benchmark

Catfish-Man · 2025-07-15T22:57:36Z

------- Performance (arm64): -Osize -------
REGRESSION                               OLD        NEW        DELTA       RATIO    
NSString.bridged.byteCount.ascii.utf8    0.0        0.339      +33900.0%   **0.00x (?)**
Calculator                               115.5      128.056    +10.9%      **0.90x**
Chars2                                   2731.25    2982.432   +9.2%       **0.92x**

IMPROVEMENT                              OLD        NEW        DELTA       RATIO    
UTF16Decode.initDecoding                 69.2       4.077      -94.1%      **16.97x**
UTF16Decode.initFromCustom.cont          251.375    21.8       -91.3%      **11.53x**
ArrayAppendGenericStructs                1134.545   870.0      -23.3%      **1.30x (?)**
Array.removeAll.keepingCapacity.Object   2.522      2.238      -11.3%      **1.13x (?)**
InsertCharacterEndIndex                  58.865     54.923     -6.7%       **1.07x**

I'll take that. (The NSString.bridged.byteCount.ascii.utf8 result is noise: after earlier speedups, that benchmark runs too fast to measure reliably.)

Catfish-Man · 2025-07-15T23:26:43Z

Some of those failures do look real, so this'll stay as a draft for now

Catfish-Man · 2025-07-16T00:09:55Z

Somehow the x86 results look better despite not using the hand-vectorized path? I guess I should try using the fallback path on arm64 and see if it does OK there 😂
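For readers following along, here is a rough sketch of the kind of scalar loop an optimizer can often autovectorize. It is not the code from this patch: `isAllASCII` is a hypothetical name, and whether the loop actually vectorizes depends on the compiler version, optimization level, and target.

```swift
// Hypothetical helper, not from the patch: a branch-free scalar ASCII check.
// OR-ing every code unit together and testing once at the end avoids an
// early exit, which tends to make the loop easier to autovectorize.
func isAllASCII(_ units: UnsafeBufferPointer<UInt16>) -> Bool {
  var accumulated: UInt16 = 0
  for unit in units {
    accumulated |= unit
  }
  // Any bit at or above 0x80 means at least one code unit was non-ASCII.
  return accumulated & 0xFF80 == 0
}

let asciiUnits = Array("hello".utf16)
let mixedUnits = Array("h\u{E9}llo".utf16)
print(asciiUnits.withUnsafeBufferPointer(isAllASCII)) // true
print(mixedUnits.withUnsafeBufferPointer(isAllASCII)) // false
```

The trade-off in this shape is that it always scans the whole buffer, even when the first code unit is already non-ASCII.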

IMPROVEMENT                                   OLD         NEW         DELTA    RATIO    
UTF16Decode.initDecoding                      176.167     6.55        -96.3%   **26.89x**
UTF16Decode.initFromCustom.cont               475.5       37.615      -92.1%   **12.64x**
Breadcrumbs.CopyAllUTF16CodeUnits.longMixed   223.364     160.133     -28.3%   **1.39x**
Breadcrumbs.CopyAllUTF16CodeUnits.Mixed       226.2       162.733     -28.1%   **1.39x**
Breadcrumbs.CopyUTF16CodeUnits.longMixed      229.6       165.643     -27.9%   **1.39x**

Catfish-Man · 2025-07-16T22:12:11Z

@swift-ci please test

Catfish-Man · 2025-07-16T22:12:20Z

@swift-ci please Apple Silicon benchmark

Catfish-Man · 2025-07-16T22:12:26Z

@swift-ci please benchmark

Catfish-Man · 2025-07-16T23:04:41Z

Turns out not accidentally processing twice as much data improves the speedup!

------- Performance (arm64): -Osize -------

REGRESSION                        OLD       NEW       DELTA    RATIO    
MapReduceClass2                   59.048    64.658    +9.5%    **0.91x**
MapReduceClassShort2              91.654    100.0     +9.1%    **0.92x (?)**

IMPROVEMENT                       OLD       NEW       DELTA    RATIO    
UTF16Decode.initDecoding          72.619    2.239     -96.9%   **32.42x**
UTF16Decode.initFromCustom.cont   252.375   22.0      -91.3%   **11.47x**
BufferFillFromSlice               11.326    10.068    -11.1%   **1.12x (?)**
ArrayAppendToGeneric              179.5     165.496   -7.8%    **1.08x (?)**
String.replaceSubrange.String     6.076     5.615     -7.6%    **1.08x**
InsertCharacterEndIndex           60.472    56.405    -6.7%    **1.07x**

Catfish-Man · 2025-07-19T06:50:41Z

@swift-ci please Apple Silicon benchmark

Catfish-Man · 2025-07-19T06:51:01Z

@swift-ci please benchmark

Catfish-Man · 2025-07-19T19:24:44Z

@swift-ci please Apple Silicon benchmark

Catfish-Man · 2025-07-19T20:16:30Z

IMPROVEMENT                                  OLD         NEW         DELTA       RATIO    
UTF16Decode.initDecoding                     69.524      2.135       -96.9%      **32.55x**
UTF16Decode.initFromCustom.cont              252.0       21.648      -91.4%      **11.64x**
Calculator                                   128.056     115.55      -9.8%       **1.11x**
StringHasPrefixUnicode                       24014.085   21890.411   -8.8%       **1.10x**

Just as good as before, so I think that means I get to delete all the architecture-specific bits of the patch :)

Catfish-Man · 2025-07-20T09:31:11Z

@swift-ci please benchmark

Catfish-Man · 2025-07-20T09:31:16Z

@swift-ci please Apple Silicon benchmark

Catfish-Man · 2025-07-21T19:14:39Z

@swift-ci please test

Catfish-Man · 2025-07-21T20:26:38Z

Alas, still not quite there

Catfish-Man · 2025-07-21T20:59:58Z

@swift-ci please test

Catfish-Man · 2025-07-21T21:00:04Z

@swift-ci please Apple Silicon benchmark

Catfish-Man · 2025-07-22T00:37:43Z

Woo, Linux tests pass! Unfortunately, I did some more detailed benchmarking today, and the current non-ASCII handling is several times slower than Foundation's, so I've got a bit more to do there.

Catfish-Man · 2025-07-22T09:06:29Z

Current local benchmark results comparing to Foundation's C implementation. Don't put too much weight on the very small counts; the CF bridging check + cross-module call probably accounts for the difference there.

In short: comparing against the C implementation we trade a small regression on non-ASCII for a large progression on runs of ASCII and a small progression on alternating ASCII/non-ASCII.
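To make the ASCII versus non-ASCII distinction concrete, here is a minimal scalar sketch of a UTF-16 to UTF-8 transcoder with a separate fast path for ASCII. It is not the implementation in this PR: `transcodeToUTF8` is a hypothetical name, the fast path handles one code unit at a time rather than a vector block, and repairing unpaired surrogates to U+FFFD is an assumption chosen to match how `String(decoding:as:)` behaves.

```swift
// Sketch only, not the PR's implementation.
func transcodeToUTF8(_ units: [UInt16]) -> [UInt8] {
  var output: [UInt8] = []
  output.reserveCapacity(units.count)
  var index = 0
  while index < units.count {
    let unit = units[index]
    index += 1
    // Fast path: an ASCII code unit maps to exactly one byte. A vectorized
    // version would test and narrow a whole block of these at once.
    if unit < 0x80 {
      output.append(UInt8(truncatingIfNeeded: unit))
      continue
    }
    // Slow path: assemble one Unicode scalar value.
    var scalar = UInt32(unit)
    if (0xD800...0xDBFF).contains(unit),
       index < units.count, (0xDC00...0xDFFF).contains(units[index]) {
      // Surrogate pair: combine both halves into an astral scalar.
      let high = scalar - 0xD800
      let low = UInt32(units[index]) - 0xDC00
      scalar = 0x10000 + (high << 10) + low
      index += 1
    } else if (0xD800...0xDFFF).contains(unit) {
      scalar = 0xFFFD // unpaired surrogate, repaired (assumption)
    }
    // Emit the leading bytes of a 2-, 3-, or 4-byte sequence.
    if scalar < 0x800 {
      output.append(UInt8(0xC0 | (scalar >> 6)))
    } else if scalar < 0x10000 {
      output.append(UInt8(0xE0 | (scalar >> 12)))
      output.append(UInt8(0x80 | ((scalar >> 6) & 0x3F)))
    } else {
      output.append(UInt8(0xF0 | (scalar >> 18)))
      output.append(UInt8(0x80 | ((scalar >> 12) & 0x3F)))
      output.append(UInt8(0x80 | ((scalar >> 6) & 0x3F)))
    }
    // Every multi-byte sequence ends with the low six bits.
    output.append(UInt8(0x80 | (scalar & 0x3F)))
  }
  return output
}

let sample = "A\u{E9}\u{20AC}\u{1F600}"
print(transcodeToUTF8(Array(sample.utf16)) == Array(sample.utf8)) // true
```

In a layout like this, long runs of ASCII stay entirely in the fast path, while each non-ASCII scalar pays for the extra branching, which is one plausible reason the two kinds of input can move in opposite directions in a benchmark.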

Catfish-Man · 2025-07-22T09:06:59Z

@swift-ci please test

Catfish-Man · 2025-07-22T09:07:06Z

@swift-ci please Apple Silicon benchmark

Catfish-Man · 2025-07-22T09:07:11Z

@swift-ci please benchmark

Catfish-Man · 2025-07-22T09:37:37Z

stdlib/public/core/UTF16.swift

 typealias Word = UInt
 #endif
-let mask = Word(truncatingIfNeeded: 0xFF80FF80_FF80FF80 as UInt64)
+@_transparent var mask:Word {


This is a computed var because I was hitting a really weird issue locally where declaring it as a statically initialized let constant sometimes initialized it to zero. I'll see if I can pin down what's going on and file a compiler bug, but this appears to work.
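As a rough illustration of what a mask like this is for: each UInt16 lane of the word keeps only the bits at or above 0x80, so a zero result means every lane is ASCII. The sketch below reuses the names `Word` and `mask` from the diff, but the body of `mask` and the helper `allLanesASCII` are assumptions, since the diff is truncated.

```swift
// Sketch under assumptions: only the names `Word` and `mask` come from the diff.
typealias Word = UInt

// Each UInt16 lane keeps only its non-ASCII bits (0xFF80). On a 32-bit
// platform, truncatingIfNeeded keeps the low two lanes: 0xFF80FF80.
var mask: Word { Word(truncatingIfNeeded: 0xFF80FF80_FF80FF80 as UInt64) }

// Hypothetical helper: true if every UInt16 lane packed into `word` is ASCII.
func allLanesASCII(_ word: Word) -> Bool {
  word & mask == 0
}

// Four ASCII code units ("abcd") packed into one 64-bit value, lowest lane first.
let packed: UInt64 = 0x0064_0063_0062_0061
print(allLanesASCII(Word(truncatingIfNeeded: packed)))          // true
print(allLanesASCII(Word(truncatingIfNeeded: packed | 0x00E9))) // false
```

The second call sets bit 0x80 in the lowest lane, so the masked word is nonzero and the check fails.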

Catfish-Man · 2025-07-22T17:39:34Z

The macOS builder failed with error: cannot find type 'SIMD16' in scope… that's interesting. Guess I'll need to add the vector types enabled guard after all, how annoying.
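For context on where a type like SIMD16 would show up, here is a small sketch of a vector ASCII check that narrows a block of UInt16 to UInt8. The name `asciiBlock` is hypothetical, and the width of 16 is picked only because the error message mentions SIMD16; the actual patch may be organized differently.

```swift
// Sketch under assumptions, not the PR's code.
func asciiBlock(_ units: SIMD16<UInt16>) -> SIMD16<UInt8>? {
  // A lane is ASCII when none of its bits at or above 0x80 are set.
  let highBits = units & SIMD16<UInt16>(repeating: 0xFF80)
  guard highBits == SIMD16<UInt16>(repeating: 0) else { return nil }
  // Every lane fits in 7 bits, so truncating to UInt8 loses nothing.
  return SIMD16<UInt8>(truncatingIfNeeded: units)
}

var block = SIMD16<UInt16>(repeating: 0x41) // sixteen "A"s
print(asciiBlock(block) != nil)             // true
block[3] = 0x00E9                           // "é"
print(asciiBlock(block) == nil)             // true
```

In a build where the SIMD types are unavailable, code like this has to sit behind a conditional-compilation guard with a scalar fallback in the other branch.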

Catfish-Man · 2025-07-28T21:45:41Z

@swift-ci please smoke test

Catfish-Man · 2025-07-28T22:55:18Z

stdlib/public/core/UTF16.swift

+#else
+@_transparent var blockSize:Int { 1 }
+@_transparent
+func allASCIIBlock(at pointer: UnsafePointer<UInt16>) -> CollectionOfOne<UInt8>? {


Once we can use InlineArray in the stdlib, this should be easier to convert into a SWAR-style thing that does 8 elements at a time… if we need to. I don't think anyone is really depending on UTF16 transcoding perf in an environment that can't use SIMD, but hey, it's a big world, who knows.
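To show how a one-element block can slot into the same loop as a vector block, here is a sketch of the scalar fallback. The signatures of `blockSize` and `allASCIIBlock(at:)` follow the diff above, but their bodies are guesses because the diff is truncated, and `asciiPrefix(of:)` is a hypothetical caller added only for illustration.

```swift
// Signatures from the diff; bodies are assumptions.
var blockSize: Int { 1 }

func allASCIIBlock(at pointer: UnsafePointer<UInt16>) -> CollectionOfOne<UInt8>? {
  let unit = pointer.pointee
  // A single code unit is ASCII when it is below 0x80.
  guard unit < 0x80 else { return nil }
  return CollectionOfOne(UInt8(truncatingIfNeeded: unit))
}

// Hypothetical caller: copy the leading ASCII prefix one block at a time.
func asciiPrefix(of units: [UInt16]) -> [UInt8] {
  var output: [UInt8] = []
  units.withUnsafeBufferPointer { (buffer: UnsafeBufferPointer<UInt16>) -> Void in
    guard let base = buffer.baseAddress else { return } // empty input
    var index = 0
    while index + blockSize <= buffer.count,
          let block = allASCIIBlock(at: base + index) {
      output.append(contentsOf: block)
      index += blockSize
    }
  }
  return output
}

print(asciiPrefix(of: Array("abc\u{2192}def".utf16))) // [97, 98, 99]
```

Because the caller only relies on `blockSize` and on the block being a collection of UInt8, a wider block type could replace `CollectionOfOne` without changing the loop.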

Catfish-Man · 2025-07-29T03:36:15Z

Woo, macOS builder passed. I’m on my phone so haven’t checked the logs for the others yet. Fingers crossed for unrelated issues.

Catfish-Man · 2025-07-29T17:30:23Z

@swift-ci please smoke test

Catfish-Man · 2025-07-29T22:54:37Z

#83407 is an experiment with an additional optimization on top of this. Currently it seems to make -Onone perf completely unusable, so I'll probably save it for later.

[EDIT] Actually that's pre-existing slowness apparently! So maybe I'll go with it after all.

Catfish-Man · 2025-07-29T23:30:00Z

Benchmark results from the other PR:

------- Performance (arm64): -Osize -------
 REGRESSION                        OLD        NEW        DELTA    RATIO  
 StringHasPrefixAscii              1420.625   1596.429   +12.4%   **0.89x**
 Calculator                        115.5      128.056    +10.9%   **0.90x**
 Chars2                            2731.25    2981.579   +9.2%    **0.92x**
 
 IMPROVEMENT                       OLD        NEW        DELTA    RATIO  
 UTF16Decode.initDecoding          71.19      1.819      -97.4%   **39.12x**
 UTF16Decode.initFromCustom.cont   251.625    18.222     -92.8%   **13.81x**

That corresponds to a big speedup for length calculations for low-BMP characters, a small speedup for ASCII, and a small slowdown for astral/high-BMP characters.
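The three classes mentioned here map directly onto UTF-8 byte counts, which a scalar sketch makes explicit. This is not the code from either PR: `utf8Length(ofUTF16:)` is a hypothetical name, and counting unpaired surrogates as 3 bytes (the size of U+FFFD) is an assumption.

```swift
// Sketch, not the PR's code: scalar UTF-8 byte count for UTF-16 input.
func utf8Length(ofUTF16 units: [UInt16]) -> Int {
  var count = 0
  var index = 0
  while index < units.count {
    let unit = units[index]
    index += 1
    switch unit {
    case 0x0000...0x007F:
      count += 1 // ASCII
    case 0x0080...0x07FF:
      count += 2 // low BMP
    case 0xD800...0xDBFF:
      // High surrogate: a following low surrogate completes an astral scalar.
      if index < units.count, (0xDC00...0xDFFF).contains(units[index]) {
        index += 1
        count += 4
      } else {
        count += 3 // unpaired, counted as U+FFFD (assumption)
      }
    default:
      count += 3 // rest of the BMP, including unpaired low surrogates
    }
  }
  return count
}

let text = "a\u{E9}\u{20AC}\u{1F600}" // 1 + 2 + 3 + 4 bytes
print(utf8Length(ofUTF16: Array(text.utf16))) // 10
```

Only the surrogate case needs to look at a neighboring code unit; the ASCII and low-BMP cases depend on a single code unit each.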

Catfish-Man · 2025-07-29T23:57:27Z

Comparison between String(decoding:as:UTF16.self) and a simulated version of that using the new implementation
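For anyone unfamiliar with the entry point being compared, this is what a call to `String(decoding:as:)` with UTF-16 input looks like. The sample strings are made up for illustration and have nothing to do with the benchmark inputs.

```swift
// Decode UTF-16 code units into a native (UTF-8 backed) String.
let units: [UInt16] = Array("na\u{EF}ve \u{1F600}".utf16)
let decoded = String(decoding: units, as: UTF16.self)
print(decoded == "na\u{EF}ve \u{1F600}") // true
print(decoded.utf8.count)                // 11

// Ill-formed input is repaired rather than rejected: an unpaired
// surrogate becomes U+FFFD, the replacement character.
let repaired = String(decoding: [0xD800] as [UInt16], as: UTF16.self)
print(repaired == "\u{FFFD}")            // true
```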

Catfish-Man · 2025-07-30T18:08:14Z

Tests just passed in #83407, so closing this in favor of it

Catfish-Man · 2025-07-30T18:59:22Z

Not going to delete this quite yet, though, because despite being measurably faster, the codegen for the other one is REALLY odd.

WIP vectorization for UTF16->UTF8

6ae47a3

Catfish-Man self-assigned this Jul 15, 2025

Lots of fixes

246939c

Fun fact: UInt16 is not the same size as UInt8

f85efe5

See if the scalar version autovectorizes on arm64 too

4b84ced

Build fix for experiment

25ac970

Catfish-Man added 3 commits July 19, 2025 13:18

Remove arm64-specific code

931ae62

Adjust for 32 bit

8e9f5e0

Stop doing size math, stop duplicating work in some cases, and delete an unnecessary usableFromInline

f0cee25

Catfish-Man added 2 commits July 20, 2025 03:23

Adopt the new implementation in another place, add unsafe annotations, and special case empty buffers

4b9be8f

Actually detect non-ascii in the fallback path

f326f61

Remove pointless failed attempt at being clever

9263ce6

Do it all by hand, since empirically it's a lot faster for runs of non-ascii

3177957

Catfish-Man changed the title ~~WIP vectorization for UTF16->UTF8~~ Vectorize UTF16->UTF8 transcoding Jul 22, 2025

Catfish-Man marked this pull request as ready for review July 22, 2025 09:23

Catfish-Man requested a review from a team as a code owner July 22, 2025 09:23

Catfish-Man added 2 commits July 28, 2025 13:38

Merge branch 'main' into asciivec

9d6d225

Add a (slow) scalar fallback path, and add more unsafe annotations

9dc0c96

Fix precondition

bb2437d

Catfish-Man mentioned this pull request Jul 29, 2025

Vectorize UTF16->UTF8 transcoding #83407

Open

Catfish-Man requested a review from stephentyrone July 29, 2025 23:59

Catfish-Man closed this Jul 30, 2025
