Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strconv: optimize Parse for []byte arguments #42429

Open
dsnet opened this issue Nov 6, 2020 · 9 comments
Open

strconv: optimize Parse for []byte arguments #42429

dsnet opened this issue Nov 6, 2020 · 9 comments

Comments

@dsnet
Copy link
Member

@dsnet dsnet commented Nov 6, 2020

In 2016, #2632 was closed with the explanation that:

I still think we might be able to do something to make these conversions free and avoid 2x API bloat everywhere. I'd rather hold out for that. People who are super-sensitive to these allocations today can easily copy+modify the current code and ship that in their programs.

In summary, it's argued that 1) the compiler should address this, and 2) people can vendor and modify the strconv package.

It's been 4 years since the issue was closed (and 9 years since the issue was first reported), and there there hasn't been a compiler optimization that permits calling the Parse functions with a []byte without it creating garbage. That is:

var b []byte = ...
v, err := strconv.ParseXXX(string(b), ...)

always (as of Go1.15) allocates and copies the input []byte.

Forking strconv is an unfavorable workaround because it means that the vendor fails to benefit from future optimizations to strconv (e.g., cl/187957 or cl/260858) and fixes to the implementation (e.g., #29491) and may go against company guidelines that forbid forks. Personally, I haven't seen anyone fork strconv. Rather, in most cases I see users cast a []byte to a string using unsafe. This is what we do in the protobuf module. However, some use-cases are constrained to avoid using unsafe (e.g., encoding/json) and so continue to suffer from performance losses.

Given the passage of time and the lack of a natural solution to this, I think we should revisit this issue. It seems that we should either:

  1. Modify the compiler to detect cases where a string variable does not escape and that it's use involves no synchronization points, so that it functionally performs an allocation-free cast of []byte to string in the example snippet above. If such a compiler optimization existed, we would probably also need to modify strconv to avoid escaping the string input through the error value by copying the input string. Note that this will slow down cases where parsing fails.
  2. Accept that a compiler optimization is not happening, in which case we add ParseBoolBytes, ParseComplexBytes, ParseFloatBytes, ParseIntBytes, and ParseUintBytes, which are identical to their counterparts without the Bytes suffix, but operate on a []byte as the input instead of a string. (We can add Bytes equivalents for the QuoteXXX functions as well, but those seem to be rarely with a []byte.)

Obviously option 1 is preferred, but if its not going to happen, but perhaps it's time to do option 2. It's been almost a decade.

\cc @mvdan @rogpeppe @mdlayher

@cespare
Copy link
Contributor

@cespare cespare commented Nov 6, 2020

Rather, in most cases I see users cast a []byte to a string using unsafe. This is what we do in the protobuf module.

At my company we have shared functions that do this too. They see a reasonable amount of use in high-performance code.

It's not hard to write the unsafe code, but it's not completely trivial either -- it's easy to overlook the problem of the string being retained unsafely inside the NumError.

@bcmills
Copy link
Member

@bcmills bcmills commented Nov 6, 2020

Don't forget option (3): wait for a decision on generics, and then provide an updated strconv library with functions that are polymorphic over indexable byte-containers.

@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented Nov 6, 2020

If there's a new strconv library that's generic over []byte and string then any new call we add to strconv now can presumably be trivially implemented in terms of that

@mvdan
Copy link
Member

@mvdan mvdan commented Nov 13, 2020

If something like #2205 (comment) was implemented, perhaps we could get the compiler to do the work for us after all. That said, I'm not in favor of simply waiting another few years for that to happen.

@cristaloleg
Copy link

@cristaloleg cristaloleg commented Nov 13, 2020

Similar to option (3): byte view that can represent []byte or string #5376 (however it it's covered by generics, probably)

@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented Nov 20, 2020

If there was a compiler optimisation for this, it would be good if it could be generalised to support string to []byte conversions too. For example, at the moment, there's no way to calculate the hash of a string without allocations, even with devirtualisation, because the call to Write([]byte(s)) will always incur an allocation.

@go101
Copy link

@go101 go101 commented Nov 23, 2020

Maybe, it is not a too bad idea to let slices comparable. Comparing slices is just like comparing strings.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 23, 2020

@go101 That is #23725 or #38377. This is not the right place to discuss those ideas.

@go101
Copy link

@go101 go101 commented Nov 23, 2020

Sorry, I really misunderstood @rogpeppe's last comment. In my case, I often want structs with byte slice fields are hashable and can be used as map keys but I have to convert the slices to strings in other structs to get hashable values.

[update]: I have ever made a proposal to solve the problem mentioned in rogpeppe's last comment: https://github.com/go101/go101/wiki/A-proposal-to-avoid-duplicating-underlying-bytes-when-using-strings-as-read-only-%5B%5Dbyte-arguments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants