-
Notifications
You must be signed in to change notification settings - Fork 17.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
strconv: add equivalents of Parsexxx() with []byte arguments #2632
Comments
Regarding "careful compilation", see my comment in the duplicate issue #3197. |
Issue #3197 has been merged into this issue. |
From issue #3197: I attempted to convert strconv's Parse internals to use []byte instead, and then make the existing string versions make an unsafe []byte of the string's memory using reflect.SliceHeader / reflect.StringHeader, but the reflect package already depends on strconv, so there's an import loop. |
Implemented here, for discussion at some point: http://golang.org/cl/5759057/ The byte view approach may be considered either gross or lovely. Not sure. The grossness is isolated, well tested, and can't escape to callers at least. |
Yeah, there's a whole bunch of string<->[]byte optimizations that would be nice. issue #2204 comes to mind. And in another project, I want to be able to do a lookup in a map[string]T with a []byte key. It'd be nice if: var m map[string]T = ... var key []byte = ... v, ok := m[string(key)] ... didn't generate garabage too. |
I think we have an issue about "escape analysis of strings" already. If a string([]byte) does not escape, it won't be stored so further modifications to the []byte are safe even if the string was created zero-copishly. Ah it's issue #2205, but it's about []byte(string). The string([]byte) case does not require read-only flagging, I think, only ordinary escape analysis. |
This issue was just causing performance problems for a user inside Google. They had a large sstable with float64 []byte values. Each strconv.ParseFloat(string(value), 64) was generating garbage (which ended up being a lot of garbage) As a medium-term hack, I showed they could do: func unsafeString(b []byte) string { return *(*string)(unsafe.Pointer(&b)) } freq, err := strconv.ParseFloat(unsafeString(val), 64) But I felt bad even pointing that out. I'd really like issue #2205 to be the answer to this issue, so we don't have to add new API. That is, I'd like this to be garbage-free: var b []byte = .... freq, err := strconv.ParseFloat(string(b), 64) The compiler should do the equivalent unsafeString thing, if it can statically determine that the string never escapes strconv.ParseFloat. Yes, the strconv.ParseFloat could then observe concurrent mutations of b via its string, like this: var b []byte = .... go mutateBytes(b) freq, err := strconv.ParseFloat(string(b), 64) But that was already a data race. |
If we implement no-op byte to string conversions, there are a couple of library changes that would have to be made to use it effectively. Right now, ParseFloat leaks its string parameter because it's captured in the error handling. With the change, it would be advantageous to copy the string instead. Coincidentally, I don't think we have a good way to copy strings. We've never had the need to before. |
doesn't append([]byte{}, str...) do the trick for copying the string? (into a []byte, of course, we can't really copy a string.) if the optimization is implemented, just convert the copied []byte to string will achieve the real copy-string effect (well, maybe it copies the string multiple times, but it doesn't matter much IMO). |
http://play.golang.org/p/ghP85_uea9 appears to copy the backing contents of a string. |
Dave, you're looking at the address of the StringHeader, not its backing contents. For instance: http://play.golang.org/p/-AI1pPNIwW shows the addresses are different, even though their backing contents are the same. You really need unsafe to determine whether two strings alias the same memory. But </tangent>, yes: the strconv package would have to take care to not retain its input strings, even in error messages. And we'd need to write tests to guarantee that strconv never regresses and starts allocating. But that's easy and we do that in a number of other packages. |
@bradfitz The point I am making is that the API would have to be []byte oriented in the first place for the (relatively simple) compiler optimization to work. That is, maybe we need to bite the bullet and add these functions. The string versions would be trivial wrappers (which in turn could benefit from the compiler optimization once present). |
@griesemer re: the other way around is more tricky: You'd also have to make sure that the function in question didn't make any external calls to code that might modify the slice. |
@rogpeppe It's much harder to know statically that there's (globally) no goroutines with synchronization than to know whether a single function doesn't modify an incoming bytes. External calls by that single function could tell (via export data) whether they modify or store incoming byte arguments elsewhere. |
As I was arguing somewhere else, we don't have to know globally that there are no goroutines with synchronization. We only have to know locally that the function we are calling has no synchronization points. We can't know that if there are any calls through an interface method, but otherwise we can. |
@ianlancetaylor good point! |
@ianlancetaylor exactly. |
CL https://golang.org/cl/23787 mentions this issue. |
Can't happen until Go 1.8. |
@dsnet raised a great point in https://go-review.googlesource.com/#/c/23787
I kindly wanted to ask @josharian @randall77 @griesemer and other compiler folks what y'all think about this; do y'all think we should wait for the compiler optimizations or should we be biting the bullet and implement these functions as suggested by @griesemer here #2632 (comment), then later the string implementation could use these optimizations? @bradfitz great point in #2632 (comment), I was checking with sample usage for the number of allocations when writing the CL, we should in deed include tests to make sure the strconv package never regresses. |
I don't think anyone is working on this at the moment. It's doable though. We'd have to add some data to the export info about whether any synchronization is done inside functions. I'd recommend we wait and do this in the compiler. But that's not a promise on timeline... |
/cc @griesemer for export data. |
@randall77 It's trivial to add additional information about functions in the export data, even in a backward-compatible way. We just need to have that information. |
From the current direction, it seems to me that this issue is a subset or perhaps even duplicate of #2205. |
I would like to avoid doing this. I still think we might be able to do something to make these conversions free and avoid 2x API bloat everywhere. I'd rather hold out for that. People who are super-sensitive to these allocations today can easily copy+modify the current code and ship that in their programs. |
FWIW, here is working (but tricky) patches to make ParseXXX no escaping.
|
@hirochachacha, what are you trying to accomplish there? |
@bradfitz I'm not sure this is deserved to send a CL. given: package main
import "strconv"
func main() {
s := []byte("1234")
strconv.ParseFloat(string(s[:2]), 64)
strconv.Atoi(string(s[2:4]))
} $ go build -gcflags -m a.go before:
after:
|
I believe @hirochachacha is making the same point I made in issuecomment-135883394. Suppose there was some compiler optimization that could prove that a function never mutates an input Since this compiler optimization doesn't exist yet, I vote that we don't address this right now. Always copying the input in the error case may lead to unexpectedly bad performance in situations where a string is parsed in a tight loop and expected to fail. |
@dsnet I agree with you at all points. Sorry for the duplication, I didn't notice that. |
The text was updated successfully, but these errors were encountered: