[SR-7602] UTF8 should be (one of) the fastest String encoding(s)


   |                  |                 |
   |------------------|-----------------|
   |Previous ID       | SR-7602      |
   |Radar             | None         |
   |Original Reporter | @weissi      |
   |Type              | Bug    |
   |Status            | Resolved        |
   |Resolution        | Done    |
        
   <details>
  <summary>Additional Detail from JIRA</summary>

   |                  |                 |
   |------------------|-----------------|
   |Votes             | 26         |
   |Component/s       | Standard Library    |
   |Labels            | Bug, AffectsABI        |
   |Assignee          | @milseman      |
   |Priority          | Medium      |

   

   md5: f681e7f0741f98e436f811971add77c3

  </details>


**Sub-Tasks**:  
* [SR-7725](https://bugs.swift.org/browse/SR-7725) [String] New validity model  




**Issue Description:**


I believe there is really only one (and a half) encoding that matters today: UTF-8 (and its subset, ASCII).  
Therefore it's important that UTF-8 is Swift's fastest String encoding.

[From what I can tell](https://github.com/apple/swift/blob/7e68e8f4a3cb1173e909dc22a3490c05e43fa592/stdlib/public/core/StringObject.swift), the fastest String encodings today are UTF-16 and ASCII. Everything else has worse performance.

This also seems to be ABI-relevant, so AFAIK it needs to be fixed very soon.

Requirements:

1.  Copying the UTF-8 encoded bytes of a `String` into a pre-allocated raw buffer must be allocation-free and as fast as `memcpy` can copy them.

2.  Creating a `String` from UTF-8 encoded bytes should just validate the encoding and store the bytes as they are.

3.  A slightly softer but still very strong requirement: currently (even with ASCII) only the stdlib seems to be able to get a pointer to the contiguous ASCII representation (if it exists in that form at all). That works fine if you just want to copy the bytes (`UnsafeMutableBufferPointer(start: destinationStart, count: destinationLength).initialize(from: string.utf8)`, which will use `memcpy` if the string is in ASCII representation), but it doesn't allow you to implement your own algorithms that are only performant on a contiguously stored `[UInt8]`.
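
To make the three requirements concrete, here is a minimal sketch of what user code for each could look like. It is not part of the original report and assumes a Swift 5.1 or later toolchain, where `String.withUTF8` and `withContiguousStorageIfAvailable` on `String.UTF8View` exist. The helper names `copyUTF8(of:into:)`, `makeString(validatingUTF8:)` and `countSpaces(in:)` are hypothetical and only serve as illustration. Whether the copy really compiles down to a plain `memcpy` without allocating depends on the string's internal representation, which is exactly what this issue is asking about.

```swift
// Requirement 1 (sketch): copy a String's UTF-8 bytes into a pre-allocated raw buffer.
// Returns the number of bytes written, or nil if the destination is too small.
func copyUTF8(of string: String, into destination: UnsafeMutableRawBufferPointer) -> Int? {
    var string = string  // withUTF8 is mutating: it may first make the storage contiguous
    return string.withUTF8 { utf8 -> Int? in
        guard utf8.count <= destination.count else { return nil }
        destination.copyMemory(from: UnsafeRawBufferPointer(utf8))
        return utf8.count
    }
}

// Requirement 2 (sketch): create a String from bytes, rejecting invalid UTF-8.
// String(decoding:as:) repairs invalid sequences with U+FFFD instead of failing,
// so validity is checked here with a round-trip comparison (an extra pass over the bytes).
func makeString(validatingUTF8 bytes: [UInt8]) -> String? {
    let decoded = String(decoding: bytes, as: UTF8.self)
    return decoded.utf8.elementsEqual(bytes) ? decoded : nil
}

// Requirement 3 (sketch): run a custom algorithm directly over contiguous UTF-8 bytes,
// falling back to generic iteration when no contiguous storage is available.
func countSpaces(in string: String) -> Int {
    let fast = string.utf8.withContiguousStorageIfAvailable { bytes in
        bytes.reduce(0) { $0 + ($1 == 0x20 ? 1 : 0) }
    }
    return fast ?? string.utf8.reduce(0) { $0 + ($1 == 0x20 ? 1 : 0) }
}
```

The round-trip check in `makeString(validatingUTF8:)` is a deliberately simple stand-in for "just validate the encoding"; a single-pass validating initializer would be preferable where the toolchain provides one.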


   
