cmd/compile: consider to add move-like optimization for some string<->byte operations #50846
1. Building the string
In Go, it's hard to efficiently build a string. Contrary to what the `strings.Builder` documentation says, it's not always more efficient than direct `[]byte` usage with the extra allocation and copying of a `string([]byte)` conversion at the end. We can't blindly replace all places with `strings.Builder` due to its overhead for some use cases (needing to write many small chunks while building the string).
A quick glance tells us that we can't really make `strings.Builder` as fast as we might want it to be: even if it didn't do a copy check and we called append directly, some algorithms use copy + write-by-index to be really fast. For example, some code in the `strings.Replacer` specializations uses this instead of `strings.Builder`. It's 1 extra allocation, but CPU-wise it's ~20% faster on average.
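To illustrate the pattern, here is a hypothetical sketch (the `joinWithSep` name and logic are made up for illustration, not taken from the stdlib) of the copy + write-by-index style: preallocate the exact `[]byte` once, fill it with `copy` and direct index writes, and convert to a string at the end.

```go
package main

import "fmt"

// joinWithSep builds the result in a preallocated []byte using copy
// and direct index writes, then converts once at the end. The final
// string(buf) is the extra allocation + copy this issue proposes to
// turn into a move.
func joinWithSep(parts []string, sep byte) string {
	if len(parts) == 0 {
		return ""
	}
	n := len(parts) - 1 // room for separators
	for _, p := range parts {
		n += len(p)
	}
	buf := make([]byte, n)
	i := copy(buf, parts[0])
	for _, p := range parts[1:] {
		buf[i] = sep // direct index write
		i++
		i += copy(buf[i:], p)
	}
	return string(buf) // extra copy the proposed move would eliminate
}

func main() {
	fmt.Println(joinWithSep([]string{"a", "bc", "d"}, ','))
}
```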
2. Reusing []byte algorithms
When you write code that returns `[]byte`, you may want to provide a second function that returns a `string`. In most cases, you'll probably start with something like this:
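A hypothetical sketch (the `Encode`/`EncodeString` names are made up for illustration):

```go
package main

import "fmt"

// Encode is the []byte-based implementation.
func Encode(data []int) []byte {
	buf := make([]byte, 0, len(data))
	for _, v := range data {
		buf = append(buf, byte('0'+v))
	}
	return buf
}

// EncodeString wraps Encode; the string conversion pays for an extra
// allocation and a full copy of the freshly built slice.
func EncodeString(data []int) string {
	return string(Encode(data))
}

func main() {
	fmt.Println(EncodeString([]int{1, 2, 3}))
}
```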
This adds extra data copying + an allocation. Not to mention that if `Encode` returns a fresh allocation (and it only "escapes" via the return), there is no need to copy it at all: we can "move" it (transfer the ownership) to the return value.
We can do this since the `Encode()` return value has no other references to it; it's a temporary that we can move easily.
This would make the code above very efficient. Not only would it avoid the extra allocation, it would also benefit from the fact that `Encode()` is implemented over `[]byte` instead of in some cross-type manner, which makes the code 20-30% slower for both cases.
This cross-type manner usually involves passing in a `[]byte` to append to (for the `[]byte` case) plus a bool flag for returning a `string` instead. Another approach is passing in a `*bytes.Buffer` (or similar), so the function writes the data to the appropriate storage provided by the caller. As noted above, it's not as efficient as we could be with the other approach, given that the compiler would add some optimizations for that case.
3. Append-like APIs are bad for strings
Right now one of the best tricks to avoid redundant allocations is to use append-like APIs, where the caller passes a slice that can be reused later. This works well for bytes, but not so well for strings.
With the optimization requested in this issue, we could do this:
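A sketch, using a hypothetical `AppendEncode` function (the names are made up for illustration):

```go
package main

import "fmt"

// AppendEncode is an append-style API: the caller can pass a slice to
// reuse, or nil to get a fresh allocation.
func AppendEncode(buf []byte, data []int) []byte {
	for _, v := range data {
		buf = append(buf, byte('0'+v))
	}
	return buf
}

// EncodeString passes nil, guaranteeing a fresh slice that nothing
// else references. With the proposed optimization, the string(...)
// conversion could move that slice's bytes instead of copying them.
func EncodeString(data []int) string {
	return string(AppendEncode(nil, data))
}

func main() {
	fmt.Println(EncodeString([]int{7, 8, 9}))
}
```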
When converting a `[]byte` to a `string`, if it's proven that after this conversion no one can reference the original slice, initialize the string using a `slicebytetostringtmp`-like operation (we could add a wrapper called `slicebytetostringmove` to make it easier to separate the semantics).
Some examples of when this could be done:
b. That slice is then returned as `string(f())`. We need to know that `f()` returns newly allocated memory though, because we shouldn't move reused or global memory. This would require an extra func bit.
(1) Helps with the first problem. We'll be able to build a string inside a `[]byte` and return it as a `string` without overhead.
(2) Helps with the second problem. We'll be able to do `string(f())` to reuse the `[]byte`-oriented routines.
These 2 cases solve most issues outlined above, but they may also require some extra care to distinguish how a function param "leaks": leaking to a return value is OK for an append API, but leaking to global memory is not. For simplicity, I would avoid handling problem (3), but maybe someone has a good idea of how to handle it without too much extra effort.
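A hypothetical illustration of that leak distinction (the `appendNul`/`saveAndReturn` names are made up for illustration):

```go
package main

import "fmt"

// appendNul's parameter leaks only to its return value, so a fresh
// argument could still be moved into a string by the caller.
func appendNul(b []byte) []byte {
	return append(b, 0)
}

var sink []byte

// saveAndReturn leaks its parameter to global memory, so the caller
// must NOT move the result into a string: sink could later mutate the
// bytes backing the supposedly immutable string.
func saveAndReturn(b []byte) []byte {
	sink = b
	return b
}

func main() {
	s1 := string(appendNul([]byte("ok")))     // move would be safe
	s2 := string(saveAndReturn([]byte("no"))) // move would be unsafe
	fmt.Println(len(s1), len(s2))
}
```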
I believe that the opposite direction is not as useful, but it could be possible to remove some string->[]byte copying with this idea as well, although I would ask for some use cases where it could be useful.
Since it doesn't look like a trivial change, it seems fair to discuss it before going beyond the minimal prototype.
While it might look like a minor issue, it would help to fix several places in the stdlib (making them either ~15% faster or involving 1 less data copy; sometimes it just makes the code more readable because you don't need to pass 2 destinations into a common function).
It should also help to fix some issues in the application I'm trying to optimize.
The `string` conversion is quite high in the profile there, but `strings.Builder` and other approaches will not solve the issue completely -- they can't compete with a direct `[]byte`.
Alternative solutions
Faster (or new) strings.Builder
If there were a way to make `strings.Builder` faster, or to add a new type that can operate like a slice of bytes with direct access (not sure it's possible to make it "safe"), this compiler optimization would not be needed as much.
I don't like this approach, but figured I could leave it here as a thought.
We could add an intrinsic function like `unsafe.BytesToString(b)` (don't mind the naming) for which the compiler would check whether it's valid to do this move. If it can prove it, it inserts an allocation-free conversion; otherwise, it gives a compilation error.
The only benefit is probably the fact that it could be easier for the compiler to do this analysis only for the places it's asked to do it.
While it's possible to do something like this:
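For reference, the usual manual version of this trick reinterprets the slice header as a string header (a sketch; only valid as long as `b` is never mutated or shared afterwards):

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets the slice header as a string header,
// avoiding the copy. The string header is a prefix of the slice
// header, so the cast "works", but it bypasses the type system's
// immutability guarantee for strings.
func bytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

func main() {
	b := []byte("hello")
	fmt.Println(bytesToString(b))
}
```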
It makes the code quite fragile, and it demands `unsafe` usage. It can also break if `b` starts to escape in some unpredictable way, or if we start using a `sync.Pool` for the byte slices, so that `b` could be shared.
Extra: strings.Replacer rewrite with strings.Builder
Results (tl;dr - fewer allocations, slower execution time):
The string -> []byte cases
I don't have enough info and stats for this case, but I can provide a few examples nonetheless.
First, we would need a `stringtoslicebytetmp` function to make it possible.
Some obvious cases where it can be applied (this list is incomplete):
`[]byte(x + y)`
If we use this knowledge about the freshly allocated data, some expressions like this can avoid excessive string->bytes allocation:
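A sketch of that case: the `x + y` concatenation allocates a fresh string that nothing else references, so its bytes could be handed straight to the `[]byte` conversion instead of being allocated and copied a second time.

```go
package main

import "fmt"

func main() {
	x, y := "foo", "bar"
	// Today this compiles to two allocations: one for x + y, and one
	// for the []byte copy of it. With the proposed stringtoslicebytetmp
	// move, the second allocation could be dropped.
	b := []byte(x + y)
	fmt.Println(string(b))
}
```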