Skip to content
This repository has been archived by the owner on Nov 23, 2021. It is now read-only.

Perf: Use string.utf8 field for conversion to data #20

Closed
wants to merge 3 commits into from

Conversation

seabaylea
Copy link
Member

The implementation of String to Data conversion in HTTPStreamingParser.swift uses:

data = string.data(using: .utf8)

which does encoding based on the passed in encoding type. If we're only using utf8 encoding, String already contains a utf8 field that contains the UTF8 encoded version, so its more performant to use that directly (up to 2x faster on a concurrent microbench). This PR converts the usage to:

data = Data(string.utf8)

When we add support for requested encoding, we should keep this as a fast path and use String.data(using:) for the slow path.

@seabaylea seabaylea requested a review from carlbrown July 11, 2017 11:52
@helje5
Copy link
Contributor

helje5 commented Jul 11, 2017

(I was confused first, you say HTTPStreamingParser.swift (which shouldn't do String to Data conversion) when you mean HTTPResponseWriter)

@helje5
Copy link
Contributor

helje5 commented Jul 11, 2017

I have the feeling that string.data(using:) - even if it is slower now - should potentially be faster in the long run. string.data(using:) has knowledge about the internal storage of the String (e.g. if it is a UTF-8 Data backed string, it could just return that data for 0-copy behaviour. Or at least use memcpy). In contrast Data(Sequence<UInt8>) always needs to work on the source as a generic object.

Summary: If Data(string.utf8) is faster than string.data(using: .utf8), I think it is worth filing a performance-bug against Foundation :-> It should be the equal or faster.

P.S.: This can turn into another longish discussion, but technically HTTP/1.1 header encoding is ISO-8859-1, not UTF-8. Another reason why I would recommend to keep known headers as structured values, not strings (and transfer unknown ones as octet values instead of Strings). (Having said that, using UTF-8 is common ignorance^D^D^D^D^D practice though, so we may get away w/ that.)

@seabaylea
Copy link
Member Author

I agree that string.data(using:) has scope to be more performant because its not relying on a Data public constructor API so can construct the returned Data however it likes.

Using Data(string.utf8) pretty much maps down to an allocate and memmove, so the only scope for it to be faster would be if there was zero copy / COW. The existing COW implementation for String cares about whether the String is uniquely referenced, not the backing storage, so to be able to map the storage into two different types would mean a refactoring separating out the storage more obviously. The net is, I don't think string.data(using:) is going to outperform Data(string.utf8), and certainly not in the near future.

You're right about raising a bug against Foundation - Phillipe Hausler is aware, and I'lll raise it with Ben Cohen as well. The string.data(using:) approach gets an encoder from CoreFoundation (under a lock) and actually performance encoding on the storage, so it may be we can fast path that for UTF8, or it may be that the encoding it an intentional part of the "API contract".

My normal approach would be that, as this is internal implementation that's not exposed, I'd take the performance improvement on the table, and then improve again in the future if there's scope to do so.

@helje5
Copy link
Contributor

helje5 commented Jul 12, 2017

My normal approach would be that, as this is internal implementation that's not exposed, I'd take the performance improvement on the table

Yes, sure. I guess my thinking was that it makes more sense to just add this to Foundation

func data(using encoding:...) {
  if encoding == .utf8 { return Data(self.utf8) }
  ... old code ...
}

... instead of doing a local workaround, as this benefits all String users.

P.S.: In the end yours is a micro optimisation which should be kinda dwarfed by the overhead in keeping the headers as regular strings ... but I like such stuff nevertheless ;-)

@phausler
Copy link

Just for the record my benchmarks actually show different behavior than what you were seeing. However there is actually an even more interesting optimization we can do that gets a 10x or more win that I posted as a sample in the swift bug.

This is a very reduced case (without the ability of allowing lossy conversion) https://swift.sandbox.bluemix.net/#/repl/596656240524853a4527b2cb

It's needed after PR swift-server#55 and I want to see if editing this will trigger a new CI build.
@carlbrown
Copy link
Contributor

OK, so my quick attempt to fix the merge conflict failed.

I'll pull it into Xcode and rework it.

@carlbrown carlbrown self-assigned this Sep 27, 2017
@seabaylea
Copy link
Member Author

@carlbrown I'm not sure its worth the effort - work is being done to improve this in Foundation itself, so at the moment this is kind of a placeholder/reminder to track that. It should probably be an Issue...

@carlbrown
Copy link
Contributor

Converted to issue #97

@carlbrown carlbrown closed this Nov 10, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants