Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: simplify multiplication and division of Abs() #50126

Open
Jacalz opened this issue Dec 13, 2021 · 2 comments
Open

cmd/compile: simplify multiplication and division of Abs() #50126

Jacalz opened this issue Dec 13, 2021 · 2 comments

Comments

@Jacalz
Copy link
Contributor

@Jacalz Jacalz commented Dec 13, 2021

What version of Go are you using (go version)?

$ go version
go version go1.16.8 linux/amd64

Does this issue reproduce with the latest release?

Yes. See https://go.godbolt.org/z/1Gq4bPPnd for assembly output.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jacob/.cache/go-build"
GOENV="/home/jacob/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/jacob/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/jacob/go"
GOPRIVATE=""
GOPROXY="direct"
GOROOT="/usr/lib/golang"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.16.8"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3228184624=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I noticed that the compile doesn't optimize calls to math.Abs() or cmplx.Abs() to strip out unnecessary calls to those functions when multiplying them or dividing them. According to the absolute value rules, |A * B| = |A| * |B| and |A / B| = |A| / |B|, meaning that the compiler should be able to work the right two calls to either math.Abs() or cmplx.Abs() into a single call by moving the multiplication or division inside the call instead.

This means that the first block of assembly can be simplified to the one after.

        MOVQ    X0, AX
        BTRQ    $63, AX
        MOVQ    AX, X0
        MOVQ    X1, AX
        BTRQ    $63, AX
        MOVQ    AX, X1
        MULSD   X1, X0
        MULSD   X1, X0
        MOVQ    X0, AX
        BTRQ    $63, AX
        MOVQ    AX, X0

The above assembly was taken from the multiplication examples below:

func AbsMult(x, y float64) float64 {
    return math.Abs(x) * math.Abs(y)
}

func AbsMultSimple(x, y float64) float64 {
    return math.Abs(x * y)
}

func AbsDiv(x, y float64) float64 {
    return math.Abs(x) / math.Abs(y)
}

func AbsDivSimple(x, y float64) float64 {
    return math.Abs(x / y)
}

This should also apply to complex numbers. It is important to note that it is only for multiplication and division, not addition and subtraction.

What did you expect to see?

I expected the compiler to strip out the duplicated calls to math.Abs(). Thus getting the following assembly for the AbsMult function below.

        MULSD   X1, X0
        MOVQ    X0, AX
        BTRQ    $63, AX
        MOVQ    AX, X0

What did you see instead?

The compiler did not strip it out:

        MOVQ    X0, AX
        BTRQ    $63, AX
        MOVQ    AX, X0
        MOVQ    X1, AX
        BTRQ    $63, AX
        MOVQ    AX, X1
        MULSD   X1, X0

For more info, see: dominikh/go-tools#1135

@ALTree
Copy link
Member

@ALTree ALTree commented Dec 13, 2021

The optimization may be correct, but is it worthwhile to hardcode it in the compiler? You can just write math.Abs(x * y), after all.

I can understand special-casing arithmetic simplifications for built-in operators (like + or *), but I'm not sure about math functions.

Even gcc doesn't do this on -O2:

#include <math.h>
double f1(double i, double j) { return fabs(i)*fabs(j); }
double f2(double i, double j) { return fabs(i*j); }
f1:
        movq    xmm2, QWORD PTR .LC0[rip]
        andpd   xmm0, xmm2
        andpd   xmm1, xmm2
        mulsd   xmm0, xmm1
        ret
f2:
        mulsd   xmm0, xmm1
        andpd   xmm0, XMMWORD PTR .LC0[rip]
        ret

Clang, on the other hand, simplifies it, but still...

@Jacalz
Copy link
Contributor Author

@Jacalz Jacalz commented Dec 13, 2021

Indeed. That is a good question that also was brought up in the issue that I opened for staticcheck. I figured that it was better to open an issue here and let you all decide on that :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants