[SR-7602] UTF8 should be (one of) the fastest String encoding(s) #50144

@weissi

Description

Previous ID: SR-7602
Radar: None
Original Reporter: @weissi
Type: Bug
Status: Resolved
Resolution: Done

Additional Detail from JIRA:

Votes: 26
Component/s: Standard Library
Labels: Bug, AffectsABI
Assignee: @milseman
Priority: Medium

Sub-Tasks:

  • SR-7725 [String] New validity model

Issue Description:

I believe that there is really only one (and a half) encoding that matters today: UTF-8 (and its subset ASCII).
Therefore it's important that Swift's fastest String encoding is UTF-8.

From what I can tell, the fastest String encodings today are UTF-16 and ASCII. Everything else has worse performance.

This also seems to be ABI-relevant, so AFAIK it needs to be fixed very soon.

Requirements:

  1. being able to copy UTF-8 encoded bytes from a String into a pre-allocated raw buffer must be allocation-free and as fast as memcpy can copy them

  2. creating a String from UTF-8 encoded bytes should just validate the encoding and store the bytes as they are

  3. a slightly softer but still very strong requirement: currently (even with ASCII) only the stdlib seems to be able to get a pointer to the contiguous ASCII representation (if the String is stored in that form at all). That works fine if you just want to copy the bytes (UnsafeMutableBufferPointer(start: destinationStart, count: destinationLength).initialize(from: string.utf8), which will use memcpy if the String is in its ASCII representation), but it doesn't allow you to implement your own algorithms that are only performant on contiguously stored bytes such as a [UInt8]
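
To make the three requirements concrete, here is a minimal sketch of what user code meeting them could look like. It assumes the String.withUTF8 and withContiguousStorageIfAvailable APIs that exist in later Swift releases; it is not a statement about what was available when this issue was filed. The helper names copyUTF8, roundTrip, countSpaceBytes and countSpaces are hypothetical and only serve the illustration.

```swift
// Requirement 1 (sketch): copy a String's UTF-8 bytes into a pre-allocated raw buffer.
// `copyUTF8` is a hypothetical helper, not a stdlib API.
func copyUTF8(of string: String, into destination: UnsafeMutableRawBufferPointer) -> Int {
    var string = string  // withUTF8 is mutating: it may first make the storage contiguous
    return string.withUTF8 { (utf8: UnsafeBufferPointer<UInt8>) -> Int in
        precondition(utf8.count <= destination.count, "destination buffer too small")
        destination.copyMemory(from: UnsafeRawBufferPointer(utf8))
        return utf8.count
    }
}

// Requirement 2 (sketch): build a String from UTF-8 bytes.
// Note: String(decoding:as:) repairs invalid sequences with U+FFFD instead of rejecting them.
func roundTrip(_ text: String) -> String {
    let buffer = UnsafeMutableRawBufferPointer.allocate(byteCount: text.utf8.count, alignment: 1)
    defer { buffer.deallocate() }
    let written = copyUTF8(of: text, into: buffer)
    let bytes = UnsafeRawBufferPointer(rebasing: buffer[..<written])
    return String(decoding: bytes, as: UTF8.self)
}

// Requirement 3 (sketch): a custom algorithm over the UTF-8 code units.
// `countSpaceBytes` is a hypothetical stand-in for any byte-level algorithm.
func countSpaceBytes<S: Sequence>(_ bytes: S) -> Int where S.Element == UInt8 {
    var count = 0
    for byte in bytes where byte == 0x20 { count += 1 }
    return count
}

func countSpaces(in string: String) -> Int {
    // Fast path: runs directly on contiguous storage when the String exposes it.
    let fast = string.utf8.withContiguousStorageIfAvailable { countSpaceBytes($0) }
    // Fallback: iterate the UTF8View when no contiguous storage is available.
    return fast ?? countSpaceBytes(string.utf8)
}
```

For example, roundTrip("héllo wörld") copies the bytes out and rebuilds an equal String, and countSpaces(in: "a b c") walks the code units without first copying them into an Array. Whether the copy in copyUTF8 is allocation-free depends on the String already being stored as contiguous UTF-8, which is exactly what this issue asks for.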
