proposal: bytes: Introduce a FindFirstMultiByteChar API #34375
A relatively common pattern in high-performance code that deals with UTF-8 strings is an optimized path for input that consists entirely of single-byte (ASCII) runes.
Generally to accomplish that, you end up with an API that looks like
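A minimal sketch of such a function, for illustration only. The name `findFirstMultiByteChar` follows the issue title, but the signature and the `-1` return convention are assumptions, not a settled API. The `countRunes` helper is likewise hypothetical and only shows how a caller might use the result to take an ASCII fast path.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// findFirstMultiByteChar returns the index of the first byte in b that is
// not ASCII (that is, >= utf8.RuneSelf), or -1 if every byte is ASCII.
// Hypothetical name and return convention, assumed for this sketch.
func findFirstMultiByteChar(b []byte) int {
	for i, c := range b {
		if c >= utf8.RuneSelf {
			return i
		}
	}
	return -1
}

// countRunes is a hypothetical caller: it skips rune decoding for the
// leading ASCII prefix and falls back to utf8.RuneCount for the rest.
func countRunes(b []byte) int {
	i := findFirstMultiByteChar(b)
	if i < 0 {
		// Fast path: all ASCII, so one byte per rune.
		return len(b)
	}
	return i + utf8.RuneCount(b[i:])
}

func main() {
	fmt.Println(findFirstMultiByteChar([]byte("hello"))) // -1
	fmt.Println(findFirstMultiByteChar([]byte("héllo"))) // 1
	fmt.Println(countRunes([]byte("héllo")))             // 5
}
```

This scalar loop is only a baseline; it shows the contract a faster implementation would need to preserve, not how one would be vectorized.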
Having such an API in the Go standard library would be helpful on the basis of utility alone. However, this function also lends itself to a high performance vectorized implementation -- in local tests simply unsafely casting the input to a
As a result of the relative commonality, and the possibility for the stdlib to offer a more optimized implementation than users are likely to write on their own, I think it'd be beneficial to include this in the standard library. If there's interest, I'm happy to provide a patch.
I could see this being in utf8.LeadingASCIICount or something like that, maybe under a better name, but only if it were commonly needed and straightforward to use correctly. I am not sure whether either of those is true. Do you have data about either of those, or even anecdotes about when it would be used?
In general "we know how to implement this function very quickly" is not enough for inclusion in the standard library.