New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: unexpected performance difference accessing slices with different caps #27857

Open
iand opened this Issue Sep 25, 2018 · 4 comments

Comments

Projects
None yet
4 participants
@iand
Contributor

iand commented Sep 25, 2018

Using tip (go version devel +b794ca64d2 Mon Sep 3 07:14:25 2018 +0000 linux/amd64)

While implementing some bounds check optimizations in image/draw CL 136935 I came across an unexpected performance difference between accessing elements of a slice s created like this

s := spix[i : i+4 : len(spix)]

and

s := spix[i : i+4 : i+4]

Both forms eliminate bound checks for accessing elements 0-3 of s but the second form has a significant performance improvement over the first (see CL for benchmarks).

Building with go build -gcflags="-d=ssa/check_bce/debug=1" shows no difference in bounds checks between the two forms so presumably it has something to do with the resulting size of the slice.

Attached is a simple program and disassembly derived from the image/draw code that demonstrates the effect. You can see that the assembly generated for the second form is quite a bit shorter.

bounds.go.txt
bounds.asm.txt

@randall77

This comment has been minimized.

Contributor

randall77 commented Sep 25, 2018

Go has a special case when slicing and generating the empty slice.
This has the potential to generate a pointer to the next object in memory, like this:

x := make([]int, 4)
x = x[4:]

The resulting x has capacity 0, and if we just computed its base pointer as x.ptr += 4*sizeof(int) then x.ptr would point to the next object in memory after the [4]int that was allocated. This is bad because it could mistakenly retain a random object in the heap.

So when slicing, we need to somehow avoid contructing such a pointer. Currently we do this by avoiding the add when the capacity is 0. So we do x.ptr += newcap == 0 ? 0 : 4*sizeof(int). And we do it without a conditional, using shift and mask tricks.

When you do spix[i:i+4:i+4] the new capacity is 4, and the compiler knows that. So the conditional in the expression above constant folds, and we just get x.ptr += i*sizeof(int). When you do spix[i:i+4:len(spix)], it's not obvious whether the resulting slice has capacity 0 or not, so the arithmetic is still all there.

The optimization that would fix this is that if the resulting slice is known to have nonzero length (we know that here because i+4-i > 0), we don't need to guard against next object pointers.

@bcmills bcmills added the Performance label Sep 25, 2018

@bcmills bcmills added this to the Unplanned milestone Sep 25, 2018

@gopherbot

This comment has been minimized.

gopherbot commented Sep 25, 2018

Change https://golang.org/cl/136935 mentions this issue: image/draw: optimize bounds checks in loops

@randall77

This comment has been minimized.

Contributor

randall77 commented Sep 25, 2018

s/the new capacity is 0/the new capacity is 4/
s/x.ptr += 0/x.ptr += i*sizeof(int)/

@randall77 randall77 added the NeedsFix label Sep 25, 2018

gopherbot pushed a commit that referenced this issue Sep 25, 2018

image/draw: optimize bounds checks in loops
Use subslices with known length and cap to give bounds checking hints
to the compiler. Improves over the earlier pointer based optimizations
in https://go-review.googlesource.com/c/go/+/14093 for GlyphOver but
not for FillOver so the latter is left unchanged.

See #27857 for discussion of small caps used in subslices.

name               old time/op  new time/op  delta
FillOver-8          607µs ± 1%   609µs ± 1%     ~     (p=0.447 n=9+10)
FillSrc-8          23.0µs ± 1%  22.9µs ± 2%     ~     (p=0.412 n=9+10)
CopyOver-8          647µs ± 0%   560µs ± 0%  -13.43%  (p=0.000 n=9+10)
CopySrc-8          19.3µs ± 1%  19.1µs ± 2%   -0.66%  (p=0.029 n=10+10)
NRGBAOver-8         697µs ± 1%   651µs ± 1%   -6.64%  (p=0.000 n=10+10)
NRGBASrc-8          405µs ± 1%   347µs ± 0%  -14.23%  (p=0.000 n=10+10)
YCbCr-8             432µs ± 2%   431µs ± 1%     ~     (p=0.764 n=10+9)
Gray-8              164µs ± 1%   139µs ± 1%  -15.44%  (p=0.000 n=10+10)
CMYK-8              498µs ± 0%   461µs ± 0%   -7.49%  (p=0.000 n=10+9)
GlyphOver-8         220µs ± 0%   199µs ± 0%   -9.52%  (p=0.000 n=9+10)
RGBA-8             3.81ms ± 5%  3.79ms ± 5%     ~     (p=0.549 n=9+10)
Paletted-8         1.73ms ± 0%  1.73ms ± 1%     ~     (p=0.278 n=10+9)
GenericOver-8      11.0ms ± 2%  11.0ms ± 1%     ~     (p=0.842 n=9+10)
GenericMaskOver-8  5.29ms ± 1%  5.30ms ± 0%     ~     (p=0.182 n=9+10)
GenericSrc-8       4.24ms ± 1%  4.24ms ± 0%     ~     (p=0.436 n=9+9)
GenericMaskSrc-8   7.89ms ± 1%  7.90ms ± 2%     ~     (p=0.631 n=10+10)

Change-Id: I6fe1b21bb5e255826cbfdd2e73efd5858cd5557c
Reviewed-on: https://go-review.googlesource.com/136935
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot

This comment has been minimized.

gopherbot commented Sep 25, 2018

Change https://golang.org/cl/137495 mentions this issue: image: optimize bounds checking for At and Set methods

gopherbot pushed a commit that referenced this issue Oct 1, 2018

image: optimize bounds checking for At and Set methods
Use a subslice of the pixel data to give the compiler hints
for bounds checking. Only do this for image formats that
require 4 or more slice reads/writes.

See #27857 for discussion of small cap sizes.

name                   old time/op    new time/op    delta
At/rgba-8              18.8ns ± 2%    18.5ns ± 1%   -1.49%  (p=0.026 n=10+10)
At/rgba64-8            22.2ns ± 2%    21.1ns ± 3%   -4.51%  (p=0.000 n=10+10)
At/nrgba-8             18.8ns ± 2%    18.7ns ± 2%     ~     (p=0.467 n=10+10)
At/nrgba64-8           21.9ns ± 2%    21.0ns ± 2%   -4.15%  (p=0.000 n=10+9)
At/alpha-8             14.3ns ± 1%    14.3ns ± 2%     ~     (p=0.543 n=10+10)
At/alpha16-8           6.43ns ± 1%    6.47ns ± 1%     ~     (p=0.053 n=10+10)
At/gray-8              14.4ns ± 2%    14.6ns ± 5%     ~     (p=0.194 n=10+10)
At/gray16-8            6.52ns ± 1%    6.55ns ± 2%     ~     (p=0.610 n=10+10)
At/paletted-8          4.17ns ± 1%    4.21ns ± 2%     ~     (p=0.095 n=9+10)
Set/rgba-8             39.2ns ± 2%    40.1ns ± 4%   +2.45%  (p=0.007 n=10+10)
Set/rgba64-8           46.2ns ± 3%    43.3ns ± 3%   -6.11%  (p=0.000 n=10+10)
Set/nrgba-8            39.2ns ± 1%    39.7ns ± 5%     ~     (p=0.407 n=10+10)
Set/nrgba64-8          45.6ns ± 3%    42.9ns ± 3%   -5.83%  (p=0.000 n=9+10)
Set/alpha-8            35.0ns ± 3%    34.1ns ± 2%   -2.43%  (p=0.017 n=10+10)
Set/alpha16-8          36.3ns ± 4%    35.8ns ± 5%     ~     (p=0.254 n=10+10)
Set/gray-8             19.8ns ± 1%    19.7ns ± 0%   -0.69%  (p=0.002 n=8+6)
Set/gray16-8           36.0ns ± 1%    36.4ns ± 2%   +1.08%  (p=0.037 n=10+10)
Set/paletted-8         39.1ns ± 0%    39.6ns ± 1%   +1.30%  (p=0.000 n=10+10)
RGBAAt-8               3.72ns ± 1%    3.58ns ± 1%   -3.76%  (p=0.000 n=9+10)
RGBASetRGBA-8          4.35ns ± 1%    3.70ns ± 1%  -14.92%  (p=0.000 n=10+10)
RGBA64At-8             5.08ns ± 1%    3.69ns ± 1%  -27.40%  (p=0.000 n=9+9)
RGBA64SetRGBA64-8      6.65ns ± 2%    3.63ns ± 0%  -45.35%  (p=0.000 n=10+9)
NRGBAAt-8              3.72ns ± 1%    3.59ns ± 1%   -3.55%  (p=0.000 n=10+10)
NRGBASetNRGBA-8        4.05ns ± 0%    3.71ns ± 1%   -8.57%  (p=0.000 n=9+10)
NRGBA64At-8            4.99ns ± 1%    3.69ns ± 0%  -26.07%  (p=0.000 n=10+9)
NRGBA64SetNRGBA64-8    6.66ns ± 1%    3.64ns ± 1%  -45.40%  (p=0.000 n=10+10)
AlphaAt-8              1.44ns ± 1%    1.44ns ± 0%     ~     (p=0.176 n=10+7)
AlphaSetAlpha-8        1.60ns ± 2%    1.56ns ± 0%   -2.62%  (p=0.000 n=10+6)
Alpha16At-8            2.87ns ± 1%    2.92ns ± 2%   +1.67%  (p=0.001 n=10+10)
AlphaSetAlpha16-8      3.26ns ± 1%    3.35ns ± 1%   +2.68%  (p=0.012 n=8+3)

Fixes #14884

Change-Id: Ia0383530596a550e1b1c7aafce5220e5e0935a53
Reviewed-on: https://go-review.googlesource.com/137495
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment