cmd/gc: optimized map[string] lookup from []byte key #3512

Closed
bradfitz opened this Issue Apr 11, 2012 · 22 comments

Comments

Projects
None yet
6 participants
@bradfitz
Member

bradfitz commented Apr 11, 2012

key := []byte("some key")  // from network, etc

var m map[string]T
....
v, ok := m[string(key)]  // It would be nice if this were optimized away and didn't
actually make a copy & garbage
....
@rsc

This comment has been minimized.

Show comment Hide comment
@rsc

rsc Apr 11, 2012

Contributor

Comment 1:

In general this might introduce races.  If you pass in a []byte to the key lookup,
there is no guarantee that something is not editing the bytes during the lookup.
But it could probably be done so that it would just fail the lookup instead of 
causing corruption.

Labels changed: added priority-later, removed priority-triage.

Owner changed to @rsc.

Status changed to Thinking.

Contributor

rsc commented Apr 11, 2012

Comment 1:

In general this might introduce races.  If you pass in a []byte to the key lookup,
there is no guarantee that something is not editing the bytes during the lookup.
But it could probably be done so that it would just fail the lookup instead of 
causing corruption.

Labels changed: added priority-later, removed priority-triage.

Owner changed to @rsc.

Status changed to Thinking.

@bradfitz

This comment has been minimized.

Show comment Hide comment
@bradfitz

bradfitz Apr 11, 2012

Member

Comment 2:

Failing the lookup in that case would be fine & expected.
In fact, I think the naive implementation would get that for free: if the memory is
scanned once for the hashcode and once for the equality, it seems like it just might not
match if the []byte changed.  I don't see how it could do something worse.  The []byte
couldn't be freed by GC and reclaimed to the OS in the meantime because the faked
non-escaping stringheader would still pin it.
Member

bradfitz commented Apr 11, 2012

Comment 2:

Failing the lookup in that case would be fine & expected.
In fact, I think the naive implementation would get that for free: if the memory is
scanned once for the hashcode and once for the equality, it seems like it just might not
match if the []byte changed.  I don't see how it could do something worse.  The []byte
couldn't be freed by GC and reclaimed to the OS in the meantime because the faked
non-escaping stringheader would still pin it.
@rsc

This comment has been minimized.

Show comment Hide comment
@rsc

rsc Sep 12, 2012

Contributor

Comment 3:

Labels changed: added go1.1maybe.

Contributor

rsc commented Sep 12, 2012

Comment 3:

Labels changed: added go1.1maybe.

@robpike

This comment has been minimized.

Show comment Hide comment
@robpike

robpike Mar 7, 2013

Contributor

Comment 4:

Labels changed: removed go1.1maybe.

Contributor

robpike commented Mar 7, 2013

Comment 4:

Labels changed: removed go1.1maybe.

@bradfitz

This comment has been minimized.

Show comment Hide comment
@bradfitz

bradfitz Mar 31, 2013

Member

Comment 5:

Answering Dmitry's question from https://golang.org/issue/5160?c=4
here:
Consider:
m := make(map[string]T)
var b []byte = slice of some buffer
go someFunc(b)
t, ok := m[string(b)] 
I would like:
1) string(b) to not make a copy. It should instead make a non-escaping fake String
header to pass to the map lookup function, using the same byte data & length as the
[]byte b.
2) the map lookup to be guaranteed to not crash, even if b is concurrently mutated.
3) t to be valid, iff ok == true.
I don't think that would violate any  of the current Go semantics, since any ordering
between mutations from "go someFunc" and string(b) are already not specified by the
memory model.
Correct?
Member

bradfitz commented Mar 31, 2013

Comment 5:

Answering Dmitry's question from https://golang.org/issue/5160?c=4
here:
Consider:
m := make(map[string]T)
var b []byte = slice of some buffer
go someFunc(b)
t, ok := m[string(b)] 
I would like:
1) string(b) to not make a copy. It should instead make a non-escaping fake String
header to pass to the map lookup function, using the same byte data & length as the
[]byte b.
2) the map lookup to be guaranteed to not crash, even if b is concurrently mutated.
3) t to be valid, iff ok == true.
I don't think that would violate any  of the current Go semantics, since any ordering
between mutations from "go someFunc" and string(b) are already not specified by the
memory model.
Correct?
@dvyukov

This comment has been minimized.

Show comment Hide comment
@dvyukov

dvyukov Mar 31, 2013

Member

Comment 6:

How is it related to interning? There is no sense to intern temps.
Are talking only about map indexing m[string(b)], or string(b) temps in general?
Member

dvyukov commented Mar 31, 2013

Comment 6:

How is it related to interning? There is no sense to intern temps.
Are talking only about map indexing m[string(b)], or string(b) temps in general?
@bradfitz

This comment has been minimized.

Show comment Hide comment
@bradfitz

bradfitz Mar 31, 2013

Member

Comment 7:

This bug has nothing to do with interning except that if I were to implement interning
myself in pure Go, without the help of the runtime (which is issue #5160), then I'd want
to lookup interned strings from []byte from the network, without making copies.
That is, I'd have:
  var internTable map[string]string  // identity map. from key to same value.
But I'd want to access it like:
   var b []byte = slice of bytes from the network in a bytes.Buffer
   _ = internTable[string(b)]
... reusing memory from another similar request (e.g. caching HTTP Request MIME headers)
I'm only talking about indexing m[string(b)] in this bug, for Go->runtime maps.
For Go->Go, that is issue #2205 (cmd/gc: read-only escape analysis and avoiding string ->
[]byte copies)
Member

bradfitz commented Mar 31, 2013

Comment 7:

This bug has nothing to do with interning except that if I were to implement interning
myself in pure Go, without the help of the runtime (which is issue #5160), then I'd want
to lookup interned strings from []byte from the network, without making copies.
That is, I'd have:
  var internTable map[string]string  // identity map. from key to same value.
But I'd want to access it like:
   var b []byte = slice of bytes from the network in a bytes.Buffer
   _ = internTable[string(b)]
... reusing memory from another similar request (e.g. caching HTTP Request MIME headers)
I'm only talking about indexing m[string(b)] in this bug, for Go->runtime maps.
For Go->Go, that is issue #2205 (cmd/gc: read-only escape analysis and avoiding string ->
[]byte copies)
@dvyukov

This comment has been minimized.

Show comment Hide comment
@dvyukov

dvyukov Mar 31, 2013

Member

Comment 8:

Ah, I see, so you want on intern []byte and get string as the result.
Yes, I think creating a temp string from []byte in this context should be safe even in
complex statements like:
m[f(...)] = g(h(...), k(intern(b)))
Member

dvyukov commented Mar 31, 2013

Comment 8:

Ah, I see, so you want on intern []byte and get string as the result.
Yes, I think creating a temp string from []byte in this context should be safe even in
complex statements like:
m[f(...)] = g(h(...), k(intern(b)))
@bradfitz

This comment has been minimized.

Show comment Hide comment
@bradfitz

bradfitz Mar 31, 2013

Member

Comment 9:

Please, ignore everything about interning.  That was just one example of a user of this.
I think you're making this sound more complicated than it is.  And I think one of us is
using the wrong vocabulary.  (I don't want to "intern []byte" because I can't intern a
mutable thing)
I just want m[string(byteSlice)] to avoid a copy and to instead create a fake String
header temporary.  It never escapes the hashmap functions, so it's safe.
And no language semantics change which weren't already racy and undefined.
Example:
m := map[string]int{
   "X": 1,
   "Y": 2,
}
mutate := func(b []byte) { b[0] = 'Y' }
key := []byte{'X'}
go mutate(key)
value, ok := m[string(key)]
What is value & ok?
  1, true?
  2, true?
  0, false?
It's already undefined.
Member

bradfitz commented Mar 31, 2013

Comment 9:

Please, ignore everything about interning.  That was just one example of a user of this.
I think you're making this sound more complicated than it is.  And I think one of us is
using the wrong vocabulary.  (I don't want to "intern []byte" because I can't intern a
mutable thing)
I just want m[string(byteSlice)] to avoid a copy and to instead create a fake String
header temporary.  It never escapes the hashmap functions, so it's safe.
And no language semantics change which weren't already racy and undefined.
Example:
m := map[string]int{
   "X": 1,
   "Y": 2,
}
mutate := func(b []byte) { b[0] = 'Y' }
key := []byte{'X'}
go mutate(key)
value, ok := m[string(key)]
What is value & ok?
  1, true?
  2, true?
  0, false?
It's already undefined.
@bradfitz

This comment has been minimized.

Show comment Hide comment
@bradfitz

bradfitz Apr 1, 2013

Member

Comment 10:

Patch at https://golang.org/cl/8090046

Owner changed to ---.

Member

bradfitz commented Apr 1, 2013

Comment 10:

Patch at https://golang.org/cl/8090046

Owner changed to ---.

@bradfitz

This comment has been minimized.

Show comment Hide comment
@bradfitz

bradfitz Apr 26, 2013

Member

Comment 11:

Labels changed: added performance, garbage.

Member

bradfitz commented Apr 26, 2013

Comment 11:

Labels changed: added performance, garbage.

@rsc

This comment has been minimized.

Show comment Hide comment
@rsc

rsc Nov 27, 2013

Contributor

Comment 12:

Labels changed: added go1.3maybe.

Contributor

rsc commented Nov 27, 2013

Comment 12:

Labels changed: added go1.3maybe.

@rsc

This comment has been minimized.

Show comment Hide comment
@rsc

rsc Dec 4, 2013

Contributor

Comment 13:

Labels changed: added release-none, removed go1.3maybe.

Contributor

rsc commented Dec 4, 2013

Comment 13:

Labels changed: added release-none, removed go1.3maybe.

@rsc

This comment has been minimized.

Show comment Hide comment
@rsc

rsc Dec 4, 2013

Contributor

Comment 14:

Labels changed: added repo-main.

Contributor

rsc commented Dec 4, 2013

Comment 14:

Labels changed: added repo-main.

@bradfitz

This comment has been minimized.

Show comment Hide comment
@bradfitz

bradfitz Dec 5, 2013

Member

Comment 15:

I still want this, and it would improve Camlistore start-up time when we slurp a big
on-disk index into memory.
Is this definitely off the table for Go 1.3?
Member

bradfitz commented Dec 5, 2013

Comment 15:

I still want this, and it would improve Camlistore start-up time when we slurp a big
on-disk index into memory.
Is this definitely off the table for Go 1.3?
@ianlancetaylor

This comment has been minimized.

Show comment Hide comment
@ianlancetaylor

ianlancetaylor Dec 5, 2013

Contributor

Comment 16:

This can be seen as a compiler optimization, a specialized form of escape analysis. 
Given a []byte b, a map lookup of string(b) can use the bytes directly without making a
copy.  In general, we can do this optimization for any function call where the compiler
knows the complete code of the function (the function does not make any calls to unknown
functions) and where the string does not escape the function (this is not something we
normally check for) and where the function does not execute any synchronization
operations (also something we do not normally check for, but given the lack of function
calls this means no channel operations).  Map lookup is a special case of a function
call where these attributes are known.
We could also view it as a language extension, but I don't think that is a good idea. 
It would be similar to the way copy/append accept string as well as []byte, but we could
only permit it for a map lookup, not for a map insertion.
I agree that this optimization doesn't introduce any races that were not already
present, but we still need to consider what should happen for a racy program.  Right now
a racy program will yield an unpredictable string, but the code to convert from []byte
to string is quite simple and will not crash or introduce memory corruption.  We would
need to preserve that property.  Looking at the strhash and strequal functions, I think
we would be OK.
So I don't see any strong objections to this optimization, though of course somebody
would have to actually do the work.

Labels changed: removed priority-later.

Status changed to Accepted.

Contributor

ianlancetaylor commented Dec 5, 2013

Comment 16:

This can be seen as a compiler optimization, a specialized form of escape analysis. 
Given a []byte b, a map lookup of string(b) can use the bytes directly without making a
copy.  In general, we can do this optimization for any function call where the compiler
knows the complete code of the function (the function does not make any calls to unknown
functions) and where the string does not escape the function (this is not something we
normally check for) and where the function does not execute any synchronization
operations (also something we do not normally check for, but given the lack of function
calls this means no channel operations).  Map lookup is a special case of a function
call where these attributes are known.
We could also view it as a language extension, but I don't think that is a good idea. 
It would be similar to the way copy/append accept string as well as []byte, but we could
only permit it for a map lookup, not for a map insertion.
I agree that this optimization doesn't introduce any races that were not already
present, but we still need to consider what should happen for a racy program.  Right now
a racy program will yield an unpredictable string, but the code to convert from []byte
to string is quite simple and will not crash or introduce memory corruption.  We would
need to preserve that property.  Looking at the strhash and strequal functions, I think
we would be OK.
So I don't see any strong objections to this optimization, though of course somebody
would have to actually do the work.

Labels changed: removed priority-later.

Status changed to Accepted.

@bradfitz

This comment has been minimized.

Show comment Hide comment
@bradfitz

bradfitz Dec 5, 2013

Member

Comment 17:

Thanks for the reply.
> So I don't see any strong objections to this optimization, 
Russ has previously argued that performance optimizations are de-facto language changes
that people come to rely on and must then be implemented widely.  I argue App Engine's
lack of unsafe + gccgo vs gc already introduce three big worlds of differing performance
characteristics.
> though of course somebody would have to actually do the work.
I did it earlier. See comment #10 above. I barely close to no feedback on it, though.
What I'd like is a decision that we can actually do this.
I very much don't want a language change.  In my CL above, I just looked for
m[string(b)] and made a fake stringheader from b.
Member

bradfitz commented Dec 5, 2013

Comment 17:

Thanks for the reply.
> So I don't see any strong objections to this optimization, 
Russ has previously argued that performance optimizations are de-facto language changes
that people come to rely on and must then be implemented widely.  I argue App Engine's
lack of unsafe + gccgo vs gc already introduce three big worlds of differing performance
characteristics.
> though of course somebody would have to actually do the work.
I did it earlier. See comment #10 above. I barely close to no feedback on it, though.
What I'd like is a decision that we can actually do this.
I very much don't want a language change.  In my CL above, I just looked for
m[string(b)] and made a fake stringheader from b.
@bradfitz

This comment has been minimized.

Show comment Hide comment
@bradfitz

bradfitz Dec 18, 2013

Member

Comment 18:

Ping. Can we do this?
Can I get a decision about whether this is acceptable for Go 1.3?
Member

bradfitz commented Dec 18, 2013

Comment 18:

Ping. Can we do this?
Can I get a decision about whether this is acceptable for Go 1.3?
@bradfitz

This comment has been minimized.

Show comment Hide comment
@bradfitz

bradfitz Jan 10, 2014

Member

Comment 19:

Ping.
Is this thing on?
Member

bradfitz commented Jan 10, 2014

Comment 19:

Ping.
Is this thing on?
@rsc

This comment has been minimized.

Show comment Hide comment
@rsc

rsc Jan 22, 2014

Contributor

Comment 20:

In September, I wrote back to a mail thread with you (Brad) and Rob (on 9/23/2013;
subject "m[string(byteSlice))] CL"):
---
I have thought a bit more about this. I believe that we could do it more generally, but
that we should not do it just for this special case.
Specifically, if we special case x = m[string(b)], then 
s := string(b)
x = m[s]
y = m2[s]
will actually be slower than
x = m[string(b)]
y = m2[string(b)]
and that bothers me quite a bit. However, it should be possible to do something a little
like escape analysis to see that between the definition of s and its uses, there are no
synchronization points and no assignments that might legitimately allow s to contain
different bytes from b, at which point the optimization would apply. If we did something
like that, it would make things faster while still keeping performance a bit more
predictable.
I don't know when this would happen. Maybe for Go 1.3.
---
You asked why it bothered me and I wrote:
---
because pulling repeated computed subexpressions into a variable should be faster than
not.
i don't know of any case today where that's not true.
---
I don't think that's "barely close to no feedback". My position is still what it was in
September. 
I apologize for not seeing this issue until today. I did not see this issue due to
over-aggressive mail filtering, which I have fixed[*]. I went looking for the issue
because Ian asked me about it off-issue. (So no, this thing was not on.)
I will look into how hard it is to do the analysis I wrote about in September, but I
can't make any promises.
[*] Because I was auto-CC'ed on every bug, I filtered them all away; I've removed my
auto-CC and am now on golang-bugs, which I filter away, but if I'm explicitly CC'ed on a
bug it will land in my inbox.
Contributor

rsc commented Jan 22, 2014

Comment 20:

In September, I wrote back to a mail thread with you (Brad) and Rob (on 9/23/2013;
subject "m[string(byteSlice))] CL"):
---
I have thought a bit more about this. I believe that we could do it more generally, but
that we should not do it just for this special case.
Specifically, if we special case x = m[string(b)], then 
s := string(b)
x = m[s]
y = m2[s]
will actually be slower than
x = m[string(b)]
y = m2[string(b)]
and that bothers me quite a bit. However, it should be possible to do something a little
like escape analysis to see that between the definition of s and its uses, there are no
synchronization points and no assignments that might legitimately allow s to contain
different bytes from b, at which point the optimization would apply. If we did something
like that, it would make things faster while still keeping performance a bit more
predictable.
I don't know when this would happen. Maybe for Go 1.3.
---
You asked why it bothered me and I wrote:
---
because pulling repeated computed subexpressions into a variable should be faster than
not.
i don't know of any case today where that's not true.
---
I don't think that's "barely close to no feedback". My position is still what it was in
September. 
I apologize for not seeing this issue until today. I did not see this issue due to
over-aggressive mail filtering, which I have fixed[*]. I went looking for the issue
because Ian asked me about it off-issue. (So no, this thing was not on.)
I will look into how hard it is to do the analysis I wrote about in September, but I
can't make any promises.
[*] Because I was auto-CC'ed on every bug, I filtered them all away; I've removed my
auto-CC and am now on golang-bugs, which I filter away, but if I'm explicitly CC'ed on a
bug it will land in my inbox.
@rsc

This comment has been minimized.

Show comment Hide comment
@rsc

rsc Apr 3, 2014

Contributor

Comment 21:

This issue was closed by revision f5f5a8b.

Status changed to Fixed.

Contributor

rsc commented Apr 3, 2014

Comment 21:

This issue was closed by revision f5f5a8b.

Status changed to Fixed.

@bradfitz

This comment has been minimized.

Show comment Hide comment
@bradfitz

bradfitz Apr 7, 2014

Member

Comment 22:

This issue was updated by revision 8072f46.

LGTM=rsc
R=rsc, dave
CC=golang-codereviews, iant
https://golang.org/cl/84230043
Member

bradfitz commented Apr 7, 2014

Comment 22:

This issue was updated by revision 8072f46.

LGTM=rsc
R=rsc, dave
CC=golang-codereviews, iant
https://golang.org/cl/84230043

@gopherbot gopherbot locked and limited conversation to collaborators Jun 24, 2016

This issue was closed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.