-
Notifications
You must be signed in to change notification settings - Fork 10.6k
Vectorize UTF16->UTF8 transcoding #83073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
stdlib/public/core/UTF16.swift
Outdated
| } else { | ||
| isASCII = false | ||
| var tmp: ( | ||
| UInt32, UInt32, UInt32, UInt32, UInt32, UInt32, UInt32, UInt32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making a temporary buffer here is sort of awful and I want to improve it at some point, but it's also not really hurting anything and simplifies the rest of the code a lot
|
@swift-ci please test |
|
@swift-ci please benchmark |
|
@swift-ci please Apple Silicon benchmark |
I'll take that. (The |
|
Some of those failures do look real, so this'll stay as a draft for now |
|
Somehow the x86 results look better despite not using the hand vectorized path? I guess I should try using the fallback path on arm64 and see if it does ok there 😂 |
|
@swift-ci please test |
|
@swift-ci please Apple Silicon benchmark |
|
@swift-ci please benchmark |
|
Turns out not accidentally processing twice as much data improves the speedup! |
|
@swift-ci please Apple Silicon benchmark |
|
@swift-ci please benchmark |
|
@swift-ci please Apple Silicon benchmark |
Just as good as before, so I think that means I get to delete all the architecture-specific bits of the patch :) |
… an unnecessary usableFromInline
|
@swift-ci please benchmark |
|
@swift-ci please Apple Silicon benchmark |
…, and special case empty buffers
|
@swift-ci please test |
|
Alas, still not quite there |
|
@swift-ci please test |
|
@swift-ci please Apple Silicon benchmark |
|
woo, Linux tests pass! Unfortunately I did some more detailed benchmarking today, and the current non-ASCII handling is several times slower than Foundation's, so I've got a bit more to do there. |
|
@swift-ci please test |
|
@swift-ci please Apple Silicon benchmark |
|
@swift-ci please benchmark |
| typealias Word = UInt | ||
| #endif | ||
| let mask = Word(truncatingIfNeeded: 0xFF80FF80_FF80FF80 as UInt64) | ||
| @_transparent var mask:Word { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a computed var because I was hitting a really weird thing locally where sometimes doing it as a statically initialized let constant was initializing it to zero. I'll see if I can pin down what's going on and file a compiler bug, but this appears to work.
|
The macOS builder failed with |
|
@swift-ci please smoke test |
| #else | ||
| @_transparent var blockSize:Int { 1 } | ||
| @_transparent | ||
| func allASCIIBlock(at pointer: UnsafePointer<UInt16>) -> CollectionOfOne<UInt8>? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we can use InlineArray in the stdlib, this should be easier to convert into a SWAR-style thing that does 8 elements at a time.
…if we need to. I don't think anyone is really depending on UTF16 transcoding perf in an environment that can't use SIMD, but hey, it's a big world, who knows.
|
Woo, macOS builder passed. I’m on my phone so haven’t checked the logs for the others yet. Fingers crossed for unrelated issues. |
|
@swift-ci please smoke test |
|
#83407 is an experiment with an additional optimization on top of this. Currently it seems to make -Onone perf completely unusable, so I'll probably save it for later. [EDIT] Actually that's pre-existing slowness apparently! So maybe I'll go with it after all. |
|
Benchmark results from the other PR: That corresponds to a big speedup for length calculations for low-BMP characters, a small speedup for ASCII, and a small slowdown for astral/high-BMP characters. |
|
Tests just passed in #83407, so closing this in favor of it |
|
Not going to delete this quite yet though because despite being measurably faster the codegen for the other one is REALLY odd |





Fixes rdar://141789595