strconv: add equivalents of Parsexxx() with []byte arguments #2632

remyoudompheng · 2011-12-29T07:11:59Z

Hello,

Just like we have FormatFloat(...) string and AppendFloat(...) []byte, it would be nice
to avoid a costly string conversion when parsing numbers from a byte slice.

Simple code like:

package main

import (
  "bytes"
  "fmt"
  "runtime"
)

func main() {
  s := "123.45"
  r := bytes.NewBufferString(s)
  var m runtime.MemStats
  runtime.ReadMemStats(&m) // Go 1 replacement for runtime.UpdateMemStats
  println(m.Mallocs)
  var v float64
  fmt.Fscanf(r, "%f", &v)
  runtime.ReadMemStats(&m)
  println(m.Mallocs)
}

reports 7 allocations in the Fscanf call (the difference between the two printed counts).

adg · 2012-01-04T05:41:03Z

Comment 1:

Sounds like a nice idea.
Naming suggestions? "ParseFloatBytes"?

Labels changed: added priority-later, removed priority-triage.

Status changed to HelpWanted.

rsc · 2012-01-09T16:39:38Z

Comment 2:

I would like to think more about this.
It doubles everything, and we may be able to avoid
it with careful compilation.

bradfitz · 2012-03-05T19:10:49Z

Comment 3:

Regarding "careful compilation", see my comment in the duplicate issue #3197.

bradfitz · 2012-03-05T19:11:27Z

Comment 4:

Issue #3197 has been merged into this issue.

bradfitz · 2012-03-05T19:11:50Z

Comment 5:

From issue #3197:
I attempted to convert strconv's Parse internals to use []byte instead, and then make
the existing string versions make an unsafe []byte of the string's memory using
reflect.SliceHeader / reflect.StringHeader, but the reflect package already depends on
strconv, so there's an import loop.

remyoudompheng · 2012-03-05T19:13:53Z

Comment 6:

In the meantime, when I had to parse many numbers and save some allocations, I would
convert a big []byte to a big string and slice from it, instead of slicing from the
[]byte and string-ifying each chunk.
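
A minimal sketch of this workaround, assuming whitespace-separated decimal numbers in the input; the helper name parseAll is hypothetical. The whole buffer is converted to a string once, and each field is then a substring that shares that one copy:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseAll is a hypothetical helper: it converts data to a string once,
// then parses each whitespace-separated field as a float64.
func parseAll(data []byte) ([]float64, error) {
	s := string(data) // the only []byte-to-string copy

	// strings.Fields returns substrings of s; it allocates the slice
	// of string headers but does not copy the field contents.
	fields := strings.Fields(s)

	vals := make([]float64, 0, len(fields))
	for _, f := range fields {
		v, err := strconv.ParseFloat(f, 64)
		if err != nil {
			return nil, err
		}
		vals = append(vals, v)
	}
	return vals, nil
}

func main() {
	vals, err := parseAll([]byte("1.5 2 3.25"))
	fmt.Println(vals, err)
}
```

The trade-off is that the big string stays reachable for as long as any substring of it is retained, for example inside a returned *strconv.NumError.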

bradfitz · 2012-03-05T19:18:47Z

Comment 7:

I wonder if reflect.SliceHeader / StringHeader should be mirrored in package runtime.

bradfitz · 2012-03-06T19:15:43Z

Comment 8:

Implemented here, for discussion at some point:
http://golang.org/cl/5759057/
The byte view approach may be considered either gross or lovely. Not sure. The grossness
is isolated, well tested, and can't escape to callers at least.

rsc · 2012-03-06T20:23:43Z

Comment 9:

I am much less worried about the implementation than the expansion of the API.

bradfitz · 2012-03-06T22:10:05Z

Comment 10:

True, but this API was already extended with Append* for []byte efficiency, and this just
mirrors that.
And we already have nearly-duplicate strings and bytes packages.  This []byte/string
duplication is unfortunate, but this is far from the only place.
Still, Later.

rsc · 2012-03-06T22:18:22Z

Comment 11:

Like I said in comment 2, we may be able to do better with Parse
than with Format/Append.  The distinction is that Parse
accepts strings as input, while Format returns strings as
output.  It is easier to analyze the former than the latter.

bradfitz · 2012-03-06T22:32:58Z

Comment 12:

Yeah, there's a whole bunch of string<->[]byte optimizations that would be nice.
Issue #2204 comes to mind.  And in another project, I want to be able to do a lookup in a
map[string]T with a []byte key. It'd be nice if:
      var m map[string]T = ...
      var key []byte = ...
      v, ok := m[string(key)]
...  didn't generate garbage too.
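
A small sketch for measuring this case rather than assuming it; the names lookup, m and key are illustrative only. It uses testing.AllocsPerRun to report the average allocations per lookup, and the number it prints depends on which compiler version builds it:

```go
package main

import (
	"fmt"
	"testing"
)

// lookup indexes a map[string]int with a []byte key by converting
// the key at the point of use.
func lookup(m map[string]int, key []byte) (int, bool) {
	v, ok := m[string(key)]
	return v, ok
}

func main() {
	m := map[string]int{"alpha": 1, "beta": 2}
	key := []byte("beta")

	v, ok := lookup(m, key)
	fmt.Println(v, ok)

	// Average heap allocations per lookup; no particular value is
	// assumed here because it depends on the compiler in use.
	allocs := testing.AllocsPerRun(100, func() { lookup(m, key) })
	fmt.Println("allocs per lookup:", allocs)
}
```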

remyoudompheng · 2012-03-06T22:42:32Z

Comment 13:

I think we have an issue about "escape analysis of strings" already. If a string([]byte)
does not escape, it won't be stored so further modifications to the []byte are safe even
if the string was created zero-copishly. Ah it's issue #2205, but it's about
[]byte(string).
The string([]byte) case does not require read-only flagging, I think, only ordinary
escape analysis.

bradfitz · 2013-04-25T18:33:49Z

Comment 14:

Labels changed: added performance.

bradfitz · 2013-04-26T03:42:00Z

Comment 15:

Labels changed: added garbage.

bradfitz · 2013-04-26T16:59:25Z

Comment 16:

This issue was just causing performance problems for a user inside Google. They had a
large sstable with float64 []byte values. Each strconv.ParseFloat(string(value), 64) was
generating garbage (which ended up being a lot of garbage).
As a medium-term hack, I showed they could do:
    func unsafeString(b []byte) string {
       return *(*string)(unsafe.Pointer(&b))
    }
    freq, err := strconv.ParseFloat(unsafeString(val), 64)
But I felt bad even pointing that out.
I'd really like issue #2205 to be the answer to this issue, so we don't have to add new
API.
That is, I'd like this to be garbage-free:
     var b []byte = ....
     freq, err := strconv.ParseFloat(string(b), 64)
The compiler should do the equivalent unsafeString thing, if it can statically determine
that the string never escapes strconv.ParseFloat.
Yes, the strconv.ParseFloat could then observe concurrent mutations of b via its string,
like this:
     var b []byte = ....
     go mutateBytes(b)
     freq, err := strconv.ParseFloat(string(b), 64)
But that was already a data race.
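
For reference, a self-contained version of the unsafeString hack above, as a sketch only. It also shows why the trick is hazardous: the string aliases the slice's memory, so a later write to the slice is visible through the supposedly immutable string.

```go
package main

import (
	"fmt"
	"strconv"
	"unsafe"
)

// unsafeString reinterprets b's slice header as a string header without
// copying. The result is only valid while b is neither modified nor reused.
func unsafeString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

func main() {
	val := []byte("123.45")
	s := unsafeString(val)

	freq, err := strconv.ParseFloat(s, 64)
	fmt.Println(freq, err)

	// s shares val's backing array, so this write changes what s reads as.
	val[0] = '9'
	fmt.Println(s)
}
```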

DanielMorsing · 2013-04-27T10:54:26Z

Comment 17:

If we implement no-op byte to string conversions, there are a couple of library changes
that would have to be made to use it effectively. Right now, ParseFloat leaks its string
parameter because it's captured in the error handling. With the change, it would be
advantageous to copy the string instead.
Coincidentally, I don't think we have a good way to copy strings. We've never had the
need to before.

minux · 2013-04-27T13:30:12Z

Comment 18:

Doesn't append([]byte{}, str...) do the trick for copying the string?
(Into a []byte, of course; we can't really copy a string.)
If the optimization is implemented, just converting the copied []byte
to a string will achieve the real copy-string effect (well, maybe
it copies the string multiple times, but it doesn't matter much IMO).
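
A sketch of how such a copy could be used so that a parser does not retain its input on the error path. Everything here (copyString, parseError, parseBit) is hypothetical and is not strconv code; it only illustrates the pattern of copying the argument before storing it in an error.

```go
package main

import "fmt"

// copyString returns a string with its own backing storage by going
// through a freshly allocated []byte.
func copyString(s string) string {
	return string(append([]byte(nil), s...))
}

// parseError is a hypothetical error type that records the bad input.
type parseError struct{ input string }

func (e *parseError) Error() string { return "cannot parse " + e.input }

// parseBit is a hypothetical toy parser that accepts "0" or "1".
// On failure it stores a copy of s rather than s itself, so the error
// does not keep the caller's memory reachable.
func parseBit(s string) (int, error) {
	switch s {
	case "0":
		return 0, nil
	case "1":
		return 1, nil
	}
	return 0, &parseError{input: copyString(s)}
}

func main() {
	fmt.Println(parseBit("1"))
	fmt.Println(parseBit("x"))
}
```

The copy costs an allocation only when parsing fails; the success path never touches it.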

davecheney · 2013-04-27T13:39:52Z

Comment 19:

http://play.golang.org/p/ghP85_uea9 appears to copy the backing contents of a string.

bradfitz · 2013-04-28T02:44:45Z

Comment 20:

Dave, you're looking at the address of the StringHeader, not its backing contents.  For
instance: http://play.golang.org/p/-AI1pPNIwW shows the addresses are different, even
though their backing contents are the same.
You really need unsafe to determine whether two strings alias the same memory.
But </tangent>, yes: the strconv package would have to take care to not retain its
input strings, even in error messages.  And we'd need to write tests to guarantee that
strconv never regresses and starts allocating. But that's easy and we do that in a
number of other packages.
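
A sketch of what such an allocation guard could look like, built on testing.AllocsPerRun. The helper checkAllocs and the limit of 0 are assumptions for illustration; in a real _test.go file the same check would call t.Errorf instead of returning an error.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// checkAllocs is a hypothetical helper: it runs f repeatedly and returns
// an error if the average number of heap allocations per run exceeds limit.
func checkAllocs(limit float64, f func()) error {
	if n := testing.AllocsPerRun(100, f); n > limit {
		return fmt.Errorf("got %v allocs per run, want at most %v", n, limit)
	}
	return nil
}

func main() {
	b := []byte("123.45")
	err := checkAllocs(0, func() {
		// The conversion under discussion: whether it allocates depends
		// on the compiler and on strconv not retaining its argument.
		if _, err := strconv.ParseFloat(string(b), 64); err != nil {
			panic(err)
		}
	})
	fmt.Println("regression check:", err)
}
```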

rsc · 2013-07-30T16:58:06Z

Comment 21:

Status changed to Thinking.

rsc · 2013-11-27T18:50:30Z

Comment 22:

Labels changed: added go1.3maybe.

rsc · 2013-12-04T01:31:04Z

Comment 23:

Labels changed: added release-none, removed go1.3maybe.

rsc · 2013-12-04T01:51:54Z

Comment 24:

Labels changed: added repo-main.

griesemer · 2015-10-01T21:00:48Z

@bradfitz The point I am making is that the API would have to be []byte oriented in the first place for the (relatively simple) compiler optimization to work. That is, maybe we need to bite the bullet and add these functions. The string versions would be trivial wrappers (which in turn could benefit from the compiler optimization once present).

rogpeppe · 2015-10-01T22:24:23Z

@griesemer re: the other way around is more tricky:
ISTM that for the string case you should be able to discount other goroutines as long as there are no synchronisation points, as then there would be a race condition (the read to convert string to []byte vs the write to the slice), and with a race condition we don't guarantee correctness so it may still be a viable technique.

You'd also have to make sure that the function in question didn't make any external calls to code that might modify the slice.

griesemer · 2015-10-01T22:33:41Z

@rogpeppe It's much harder to know statically that there are (globally) no goroutines with synchronization than to know whether a single function modifies an incoming byte slice. External calls by that single function could tell (via export data) whether they modify or store incoming byte arguments elsewhere.

ianlancetaylor · 2015-10-01T23:39:57Z

As I was arguing somewhere else, we don't have to know globally that there are no goroutines with synchronization. We only have to know locally that the function we are calling has no synchronization points. We can't know that if there are any calls through an interface method, but otherwise we can.

griesemer · 2015-10-01T23:47:47Z

@ianlancetaylor good point!

rogpeppe · 2015-10-02T13:07:55Z

@ianlancetaylor exactly.

gopherbot · 2016-06-05T22:00:22Z

CL https://golang.org/cl/23787 mentions this issue.

adg · 2016-06-06T00:53:54Z

Can't happen until Go 1.8.

odeke-em · 2016-08-21T08:34:41Z

@dsnet raised a great point in https://go-review.googlesource.com/#/c/23787

Unless the compiler guys say this ain't happening, I'm personally still willing to wait for the compiler to address this.

I kindly wanted to ask @josharian @randall77 @griesemer and other compiler folks what y'all think about this; do y'all think we should wait for the compiler optimizations or should we be biting the bullet and implement these functions as suggested by @griesemer here #2632 (comment), then later the string implementation could use these optimizations?

@bradfitz great point in #2632 (comment). I was checking the number of allocations with sample usage when writing the CL; we should indeed include tests to make sure the strconv package never regresses.

josharian · 2016-08-21T16:46:40Z

Also cc @dr2chase @dvyukov

randall77 · 2016-08-21T18:55:18Z

I don't think anyone is working on this at the moment. It's doable though. We'd have to add some data to the export info about whether any synchronization is done inside functions.

I'd recommend we wait and do this in the compiler. But that's not a promise on timeline...

bradfitz · 2016-08-21T19:02:24Z

/cc @griesemer for export data.

griesemer · 2016-08-22T05:45:38Z

@randall77 It's trivial to add additional information about functions in the export data, even in a backward-compatible way. We just need to have that information.

odeke-em · 2016-08-28T21:04:12Z

From the current direction, it seems to me that this issue is a subset or perhaps even duplicate of #2205.

rsc · 2016-10-10T19:37:17Z

I would like to avoid doing this. I still think we might be able to do something to make these conversions free and avoid 2x API bloat everywhere. I'd rather hold out for that. People who are super-sensitive to these allocations today can easily copy+modify the current code and ship that in their programs.

bradfitz · 2016-10-12T05:20:04Z

e.g. https://godoc.org/go4.org/strutil#ParseUintBytes

hirochachacha · 2016-11-01T07:32:35Z

FWIW, here are working (but tricky) patches that keep the ParseXXX arguments from escaping.
string([]byte(s)) is used just for cloning strings. I don't know a better way to do this, though.
(I realized that extending escape analysis isn't an easy path for this.)

diff --git a/src/strconv/atoi.go b/src/strconv/atoi.go
index 66df149..d6f645a 100644
--- a/src/strconv/atoi.go
+++ b/src/strconv/atoi.go
@@ -24,11 +24,11 @@ func (e *NumError) Error() string {
 }

 func syntaxError(fn, str string) *NumError {
-   return &NumError{fn, str, ErrSyntax}
+   return &NumError{fn, string([]byte(str)), ErrSyntax}
 }

 func rangeError(fn, str string) *NumError {
-   return &NumError{fn, str, ErrRange}
+   return &NumError{fn, string([]byte(str)), ErrRange}
 }

 const intSize = 32 << (^uint(0) >> 63)
@@ -134,7 +134,7 @@ func ParseUint(s string, base int, bitSize int) (uint64, error) {
    return n, nil

 Error:
-   return n, &NumError{"ParseUint", s, err}
+   return n, &NumError{"ParseUint", string([]byte(s)), err}
 }

 // ParseInt interprets a string s in the given base (2 to 36) and
@@ -180,7 +180,7 @@ func ParseInt(s string, base int, bitSize int) (i int64, err error) {
    un, err = ParseUint(s, base, bitSize)
    if err != nil && err.(*NumError).Err != ErrRange {
        err.(*NumError).Func = fnParseInt
-       err.(*NumError).Num = s0
+       err.(*NumError).Num = string([]byte(s0))
        return 0, err
    }
    cutoff := uint64(1 << uint(bitSize-1))

bradfitz · 2016-11-01T12:55:36Z

@hirochachacha, what are you trying to accomplish there?

hirochachacha · 2016-11-01T13:08:48Z

@bradfitz I'm not sure this deserves a CL.

given:

package main

import "strconv"

func main() {
    s := []byte("1234")
    strconv.ParseFloat(string(s[:2]), 64)
    strconv.Atoi(string(s[2:4]))
}

$ go build -gcflags -m a.go

before:

./a.go:7: string(s[:2]) escapes to heap
./a.go:8: string(s[2:4]) escapes to heap
./a.go:6: main ([]byte)("1234") does not escape

after:

./a.go:6: main ([]byte)("1234") does not escape
./a.go:7: main string(s[:2]) does not escape
./a.go:8: main string(s[2:4]) does not escape

dsnet · 2016-11-01T18:21:35Z

I believe @hirochachacha is making the same point I made in issuecomment-135883394. Suppose there were some compiler optimization that could prove that a function never mutates an input []byte and that there are no synchronization events; then it could use a string in its place. This optimization still wouldn't be helpful to the strconv package, since the input escapes through the error value (thus making it really difficult for the compiler to prove anything about the input).

Since this compiler optimization doesn't exist yet, I vote that we don't address this right now. Always copying the input in the error case may lead to unexpectedly bad performance in situations where a string is parsed in a tight loop and expected to fail.

hirochachacha · 2016-11-02T00:30:56Z

@dsnet I agree with you on all points. Sorry for the duplication, I didn't notice that.

remyoudompheng added Thinking priority-later Performance GarbageCollector labels Dec 4, 2013

bradfitz mentioned this issue Dec 4, 2015

RFE: stream-oriented versions of strings.Unquote, strconv.ParseInt, etc #13482

Closed

buger mentioned this issue Mar 25, 2016

Add GetInt helper (and make GetNumber faster)? buger/jsonparser#16

Closed

adg modified the milestones: Go1.8, Unplanned Jun 6, 2016

akhiltak mentioned this issue Sep 5, 2016

Feature/supporting types graphql dgraph-io/dgraph#185

Closed

quentinmit added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Oct 10, 2016

rsc closed this as completed Oct 10, 2016

golang locked and limited conversation to collaborators Nov 2, 2017

gopherbot added the FrozenDueToAge label Nov 2, 2017
