strconv: optimize Parse for []byte arguments #42429

@dsnet

In 2016, #2632 was closed with the explanation that:

I still think we might be able to do something to make these conversions free and avoid 2x API bloat everywhere. I'd rather hold out for that. People who are super-sensitive to these allocations today can easily copy+modify the current code and ship that in their programs.

In summary, it's argued that 1) the compiler should address this, and 2) people can vendor and modify the strconv package.

It's been 4 years since the issue was closed (and 9 years since it was first reported), and there still hasn't been a compiler optimization that allows calling the Parse functions with a []byte without creating garbage. That is:

var b []byte = ...
v, err := strconv.ParseXXX(string(b), ...)

always (as of Go1.15) allocates and copies the input []byte.
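You can check this on a given toolchain by counting allocations with testing.AllocsPerRun. This is a minimal sketch: the helper name allocsPerParse and the sample input are illustrative assumptions. The printed count depends on the Go version. The issue reports an allocation as of Go 1.15, and later compiler or strconv changes may produce a different result.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// allocsPerParse (hypothetical helper) reports the average number of heap
// allocations made by one strconv.ParseInt(string(b), 10, 64) call.
func allocsPerParse(b []byte) float64 {
	return testing.AllocsPerRun(1000, func() {
		if _, err := strconv.ParseInt(string(b), 10, 64); err != nil {
			panic(err)
		}
	})
}

func main() {
	b := []byte("1234567890")
	// The count varies with the Go toolchain version.
	fmt.Printf("allocations per call: %v\n", allocsPerParse(b))
}
```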

Forking strconv is an unfavorable workaround. A fork does not benefit from future optimizations to strconv (e.g., cl/187957 or cl/260858) or from fixes to the implementation (e.g., #29491), and it may go against company guidelines that forbid forks. Personally, I haven't seen anyone fork strconv. Instead, in most cases I see users cast a []byte to a string using unsafe. This is what we do in the protobuf module. However, some packages must avoid unsafe (e.g., encoding/json), so they continue to pay the performance cost.

Given the passage of time and the lack of a natural solution to this, I think we should revisit this issue. It seems that we should either:

  1. Modify the compiler to detect cases where a string variable does not escape and its use involves no synchronization points, so that the string(b) conversion in the example snippet above becomes an allocation-free cast of []byte to string. If such a compiler optimization existed, we would probably also need to modify strconv so that the string input does not escape through the error value, by copying the input string into the error. Note that this will slow down cases where parsing fails.
  2. Accept that a compiler optimization is not happening, in which case we add ParseBoolBytes, ParseComplexBytes, ParseFloatBytes, ParseIntBytes, and ParseUintBytes, which are identical to their counterparts without the Bytes suffix but take a []byte as input instead of a string. (We could add Bytes equivalents for the QuoteXXX functions as well, but those seem to be rarely used with a []byte.)

Option 1 is obviously preferred, but if it's not going to happen, perhaps it's time to do option 2. It's been almost a decade.

\cc @mvdan @rogpeppe @mdlayher
