
runtime: Append consecutive slices, same underlying array #17530

Closed

npat-efault opened this issue Oct 20, 2016 · 17 comments

@npat-efault (Contributor) commented Oct 20, 2016

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.7.3 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/npat"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build296609692=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"

What did you do?

It seems that the built-in append does not check for a situation like the following (appending consecutive slices with the same underlying array), and needlessly copies data when it could avoid it.

  func appendBytes(a, b []byte) []byte {
          if len(b) == 0 {
                  return a
          }
          // Shouldn't append() check for something like this?
          if cap(a) >= len(a)+len(b) && &a[:len(a)+1][len(a)] == &b[0] {
                  return a[:len(a)+len(b)]
          }
          return append(a, b...)
  }

If I'm following the breadcrumbs correctly, they lead to the fact that runtime.memmove does not short-circuit when the source and destination addresses are the same (I may be wrong, though).

Running the simple benchmarks at:

https://gist.github.com/npat-efault/f055654633f43d0ef771d38657fd499e

for any value of N, demonstrates this.
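
(The gist isn't reproduced here; a minimal sketch in the same spirit, not the gist's actual code, would be something like the two benchmarks below, where the names and sizes are only illustrative.)

    package appendbench

    import "testing"

    // benchAppend times append(a, b...) where b immediately follows a in the
    // same backing array, so append never reallocates and only memmoves.
    func benchAppend(bm *testing.B, n int) {
        buf := make([]byte, 2*n)
        a, b := buf[:n], buf[n:]
        bm.ResetTimer()
        for i := 0; i < bm.N; i++ {
            _ = append(a, b...) // cap(a) == 2*n, so this is a pure memmove of b onto itself
        }
    }

    func BenchmarkAppend1K(bm *testing.B)  { benchAppend(bm, 1<<10) }
    func BenchmarkAppend64K(bm *testing.B) { benchAppend(bm, 1<<16) }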

What did you expect to see?

The built-in append detecting the trivial case and behaving (performance-wise) like appendBytes above, with running time independent of the size of slice b.

What did you see instead?

Running time increases with slice b size.

@cznic (Contributor) commented Oct 20, 2016

I think this is a rarely occurring optimization opportunity. That said, the cost of detecting it, paid in all the other, more common cases, is possibly high.

Any real world examples which would be helped by this?

@npat-efault (Contributor, Author) commented Oct 20, 2016

I believe that the cost of runtime.memmove detecting that the source and destination addresses are the same is minimal (one or two instructions): barely noticeable, and only for extremely small slices.

Then again, you are right, it is rarely occurring.

@cznic (Contributor) commented Oct 20, 2016

I doubt cap(a) >= len(a)+len(b) && &a[:len(a)+1][len(a)] == &b[0] can compile to one or two instructions.

@randall77 (Contributor) commented Oct 20, 2016

I agree that this seems super rare and not worth the code to check for it.
If you know you're in this situation (or might be), you can check yourself and reslice to avoid the copy.
Closing.

@randall77 closed this Oct 20, 2016

@npat-efault (Contributor, Author) commented Oct 20, 2016

@cznic: I don't think you have to do exactly this. This is purely for demonstration purposes (the best I could do with "user code"). All you'd have to do, in fact, is add a CMPQ DI, SI; JEQ move_0 (or something similar, I'm not familiar with the Go assembler) to the runtime.memmove implementation.

If I'm not mistaken.
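
Expressed at the Go level, just to illustrate the idea (the function below is a made-up sketch, not actual runtime code; the real check would sit at the top of the assembly memmove implementations):

    // move copies src into dst, skipping the work entirely when the two
    // already start at the same address (the case append hits above).
    func move(dst, src []byte) {
        if len(src) == 0 || (len(dst) > 0 && &dst[0] == &src[0]) {
            return
        }
        copy(dst, src)
    }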

@randall77 (Contributor) commented Oct 20, 2016

Right, memmove could do a pretty simple check. But we're not going to add that absent some real code which cares.

@npat-efault (Contributor, Author) commented Oct 20, 2016

I was thinking of code along the lines of this:

    parsed := src[0:0]
    for len(src) > 0 {
        n := keep(src)
        parsed = append(parsed, src[:n]...)
        src = src[n:]
        n = drop(src)
        src = src[n:]
    }

Arguably it may not be the best way to code something like this, but it is not totally unnatural. In this case, if drop() returns 0, you should not have to pay for the extra copy...

Anyway, as you said, it can be avoided.

I just thought I'd point out the issue (and the fact that it can easily be fixed). Whether something should be done about it is a judgement call, and is up to you...

@npat-efault (Contributor, Author) commented Oct 20, 2016

Fixed a couple of typos in the example code of my previous comment.

@nigeltao (Contributor) commented Oct 20, 2016

You could write it like this:

    // r and w are read and write cursors for src.
    r, w := 0, 0
    for r < len(src) {
        n := keep(src[r:])
        if r != w {
            copy(src[w:], src[r:r+n])
        }
        r, w = r+n, w+n
        r += drop(src[r:])
    }
    parsed := src[:w]

@npat-efault (Contributor, Author) commented Oct 20, 2016

@nigeltao Yes, of course I could. And there are probably other ways as well. The thing is, the code I posted was the first thing that came to my mind (instinctively). And immediately I thought: "I guess append() will be clever enough to avoid the copy and just do a re-slice (extend)". I checked, and it turns out it isn't. Which was a bit surprising, especially considering that the cost of detecting the (uncommon) special case turns out to be negligible. I think (and this is a judgement call) that doing "the right thing" in this case justifies the (almost negligible) cost.

@npat-efault (Contributor, Author) commented Oct 20, 2016

@nigeltao Also, in my original case, parsed was passed as an argument, and it could, or could not, "coincide" with src. Yes, you could also handle this with some extra code, but that is beside the point... (it's the "asymmetry" that matters)

@mikioh changed the title from "Append consecutive slices, same underlying array" to "runtime: Append consecutive slices, same underlying array" Oct 22, 2016

@sprstnd commented Oct 23, 2016

so redundant compiler/runtime checks are fine, but this extra check is a concern? poppycock!

@minux (Member) commented Oct 25, 2016

Whether the cost is negligible or not is not the only deciding factor for adding a special case. If the case only happens one in a million times, then no matter how cheap the optimization is, we shouldn't add it.

And the optimization cost is non-negligible anyway, as it will cost at least one more entry in the branch predictor tables.

That being said, I instrumented the memmove routine and found that runtime.typedmemmove actually does trigger runtime.memmove with src == dest. Perhaps it's worthwhile to investigate whether we can eliminate such unnecessary memmove calls.

@npat-efault (Contributor, Author) commented Oct 25, 2016

Ah, I see! The same memmove is used in many cases other than slice-copy, slice-append, and the like. A lot of them may be very small-size moves, where the cost of adding the check could be considerable (or maybe not). We could consider adding the check only for slice appends (which is more complicated than adding a simple compare to memmove). In that case, though, it would mean adding a couple of extra instructions at every append call site... not an obvious gain.

@npat-efault (Contributor, Author) commented Oct 25, 2016

What if we add the test to memmove for all but the smallest sizes? Just a thought.

@randall77 (Contributor) commented Oct 25, 2016

Adding the test above a size threshold would certainly bound the overhead
of the check.
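
(As a rough Go-level illustration of that threshold idea, with a made-up cutoff and function name; an actual change would live in the assembly memmove:)

    // sameAddrCheckThreshold is an illustrative cutoff: below it the extra
    // compare is skipped, so the smallest moves pay nothing for the check.
    const sameAddrCheckThreshold = 64

    func memmoveSketch(dst, src []byte) {
        n := len(src)
        if n >= sameAddrCheckThreshold && len(dst) >= n && &dst[0] == &src[0] {
            return // large move whose source and destination coincide: nothing to do
        }
        copy(dst, src)
    }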


@golang locked and limited conversation to collaborators Oct 25, 2017
