proposal: bytes: Introduce a FindFirstMultiByteChar API #34375

Open
alex opened this issue Sep 18, 2019 · 1 comment

Contributor

@alex alex commented Sep 18, 2019

A relatively common pattern in high-performance code that handles UTF-8 strings is an optimized path for input that consists entirely of single-byte runes (that is, ASCII).

To accomplish that, you generally end up with a helper that looks like FindFirstMultiByteChar([]byte) int, for example: https://github.com/ianlopshire/go-fixedwidth/blob/master/decode.go#L166-L175
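For illustration, here is a minimal scalar sketch of such a helper together with a caller that uses it as a fast path. It assumes the function returns the index of the first byte at or above utf8.RuneSelf and -1 when the input is all ASCII. That return convention and the runeCount caller are assumptions made for this example, not taken from the linked code or from any existing standard library API.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// FindFirstMultiByteChar returns the index of the first non-ASCII byte
// (>= utf8.RuneSelf) in b, or -1 if every byte is ASCII.
// The -1 convention is an assumption made for this sketch.
func FindFirstMultiByteChar(b []byte) int {
	for i, c := range b {
		if c >= utf8.RuneSelf {
			return i
		}
	}
	return -1
}

// runeCount is a hypothetical caller: the ASCII prefix is counted
// byte-for-byte, and only the remainder goes through full UTF-8 decoding.
func runeCount(b []byte) int {
	i := FindFirstMultiByteChar(b)
	if i < 0 {
		return len(b) // all ASCII: one rune per byte
	}
	return i + utf8.RuneCount(b[i:])
}

func main() {
	fmt.Println(FindFirstMultiByteChar([]byte("hello")))
	fmt.Println(FindFirstMultiByteChar([]byte("héllo")))
	fmt.Println(runeCount([]byte("héllo")))
}
```

Note that a byte at or above utf8.RuneSelf may also be part of an invalid sequence, so a caller still has to validate or decode everything from the returned index onward.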

Having such an API in the Go standard library would be helpful on the basis of utility alone. However, this function also lends itself to a high-performance vectorized implementation. In local tests, simply casting the input to a []uintptr via unsafe produced roughly linear speedups (presumably even greater speedups are available to brave souls willing to write AVX2 instructions).
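The word-at-a-time idea can be sketched without unsafe. The version below is not the []uintptr cast described above: it swaps in encoding/binary loads of 8 bytes at a time, tests them against a mask of high bits, and falls back to a byte loop for the tail. The name firstNonASCII and the -1 convention are assumptions for this example, and no performance figures are claimed for it.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// highBits has the top bit of each of the 8 bytes set. An ASCII byte
// never has its top bit set, so w&highBits == 0 means 8 ASCII bytes.
const highBits = 0x8080808080808080

// firstNonASCII (hypothetical name) returns the index of the first byte
// >= 0x80 in b, or -1 if b is all ASCII.
func firstNonASCII(b []byte) int {
	i := 0
	// Process 8 bytes per iteration.
	for ; i+8 <= len(b); i += 8 {
		w := binary.LittleEndian.Uint64(b[i:])
		if m := w & highBits; m != 0 {
			// In a little-endian load, byte k occupies bits 8k..8k+7,
			// so the lowest set bit identifies the first offending byte.
			return i + bits.TrailingZeros64(m)/8
		}
	}
	// Handle the remaining 0..7 bytes one at a time.
	for ; i < len(b); i++ {
		if b[i] >= 0x80 {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(firstNonASCII([]byte("plain ascii text, longer than eight bytes")))
	fmt.Println(firstNonASCII([]byte("0123456789é")))
	fmt.Println(firstNonASCII([]byte("abcé efgh")))
}
```

Loading through encoding/binary avoids the alignment and aliasing questions that an unsafe slice cast raises, at the cost of relying on the compiler to turn the load into a single instruction.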

Because the operation is relatively common, and because the stdlib could offer a more optimized implementation than users are likely to write on their own, I think it'd be beneficial to include this in the standard library. If there's interest, I'm happy to provide a patch.

@titanous titanous added the Proposal label Sep 18, 2019
@titanous titanous changed the title bytes: Introduce a FindFirstMultiByteChar API proposal: bytes: Introduce a FindFirstMultiByteChar API Sep 18, 2019
@gopherbot gopherbot added this to the Proposal milestone Sep 18, 2019
@rsc rsc added this to Incoming in Proposals Dec 4, 2019
Contributor

@rsc rsc commented Dec 11, 2019

I could see this being in utf8.LeadingASCIICount or something like that, maybe under a better name, but only if it were commonly needed and straightforward to use correctly. I am not sure whether either of those is true. Do you have data about either of those, or even anecdotes about when it would be used?

In general "we know how to implement this function very quickly" is not enough for inclusion in the standard library.

@rsc rsc moved this from Incoming to Active in Proposals Dec 11, 2019