cmd/internal/obj/x86: add fma #8037

Closed
gopherbot opened this issue May 20, 2014 · 11 comments
Labels
FeatureRequest FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. Performance Thinking

@gopherbot

gopherbot commented May 20, 2014

by odysseus9672:

What does 'go version' print?
go version devel +5b9ac653acf6 Mon May 19 22:57:59 2014 -0400 darwin/amd64

What steps reproduce the problem?
If possible, include a link to a program on play.golang.org.

1. Go to $GOROOT/src/cmd/6l/6.out.h
2. According to http://golang.org/doc/asm#architectures , this file contains a list of
all the assembler instructions Go recognizes.
3. No fused-multiply-add instructions are in the list.

Please provide any additional information below.

Using FMA allows for faster and more accurate versions of several algorithms. My
current main use case is polynomial evaluation (Horner's scheme and Estrin's scheme are
both built on FMA), but another very important one is matrix multiplication. Adding at
least a subset of the FMA instructions available on newer AMD and Intel CPUs (available
on PowerPC since at least the 600 series) would make it easier for performance-minded
coders and compiler optimization writers to produce faster code.
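
As an illustration of the polynomial case, Horner's scheme maps directly onto one fused multiply-add per coefficient. The Go sketch below assumes `math.FMA` from the standard library (added in Go 1.14, well after this issue was filed); `hornerFMA` is a hypothetical name used only for this example.

```go
package main

import (
	"fmt"
	"math"
)

// hornerFMA evaluates c[0] + c[1]*x + ... + c[n-1]*x^(n-1) using
// Horner's scheme, with one fused multiply-add per coefficient.
// The name is illustrative; it is not part of any library.
func hornerFMA(c []float64, x float64) float64 {
	if len(c) == 0 {
		return 0
	}
	acc := c[len(c)-1]
	for i := len(c) - 2; i >= 0; i-- {
		// acc*x + c[i], rounded once instead of twice.
		acc = math.FMA(acc, x, c[i])
	}
	return acc
}

func main() {
	// 1 + 2x + 3x^2 at x = 2
	fmt.Println(hornerFMA([]float64{1, 2, 3}, 2))
}
```

Each loop iteration depends on the previous one, so the benefit here is mainly the single rounding per step; Estrin's scheme trades extra multiplications for more independent FMAs.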

For instance, a more accurate version of the update in the Mandelbrot set is:
if the starting point is Cr + i*Ci (and halfCi = 0.5 * Ci)
Zr, Zi = fma( Zr - Zi, Zr + Zi, Cr ), 2 * fma( Zr, Zi, halfCi )

The update to Zr is likely slower than what is presently done for The Computer
Language Benchmarks Game, but an fma based update to Zi would definitely speed things up.
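
A minimal Go sketch of the update above, assuming `math.FMA` from the standard library (which did not exist when this issue was filed). `mandelStep` is a hypothetical helper name, and `halfCi` is 0.5 * Ci as defined above.

```go
package main

import (
	"fmt"
	"math"
)

// mandelStep performs one Mandelbrot iteration z -> z*z + c using the
// FMA-based formulas from the issue text:
//   Zr' = (Zr - Zi)*(Zr + Zi) + Cr
//   Zi' = 2 * (Zr*Zi + halfCi), where halfCi = 0.5 * Ci
// The function name is illustrative only.
func mandelStep(zr, zi, cr, halfCi float64) (float64, float64) {
	newZr := math.FMA(zr-zi, zr+zi, cr)
	newZi := 2 * math.FMA(zr, zi, halfCi)
	return newZr, newZi
}

func main() {
	cr, ci := 0.5, 1.0
	zr, zi := mandelStep(1, 2, cr, 0.5*ci)
	fmt.Println(zr, zi)
}
```

The factoring of Zr*Zr - Zi*Zi as (Zr - Zi)*(Zr + Zi) is what allows the real part to be expressed as a single FMA.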

Granted, the speedup would only materialize when issue #4978 (
https://golang.org/issue/4978 ) is resolved, but having code in
useful libraries in place to take advantage of that optimization would be nice instead
of having to recode everything afterwards.

To that end, I would request exposing at least the following (would do it myself if I
could find an understandable listing of the opcodes and documentation on how to add them
to the compiler): 

For greatest compatibility with AMD chips, 4 operand forms ( from
http://en.wikipedia.org/wiki/X86_instruction_listings#FMA_instructions ): 
VFMADDPD, VFMADDPS, VFMADDSD, VFMADDSS.

And for Intel chips, the 3 operand forms ( from Vol. 1 14-21 of Intel's developer's
manual:
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html?iid=tech_vt_tech+64-32_manuals
):
VFMADD231PD, VFMADD231PS, VFMADD231SD, VFMADD231SS

All of the other operations differ from these only in where minus signs are applied (and,
for the 3 operand versions, in which register gets clobbered - the one I'm requesting is
the one that works best with polynomial evaluation, naturally).
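
To make the "permutations" concrete, the Go sketch below models the semantics of the Intel 3 operand forms and the sign variants in terms of `math.FMA`. It is only a model of what the instructions compute, not assembler code; the function names are hypothetical, and `a`, `b`, `c` stand for the first (destination), second, and third operands in Intel operand order.

```go
package main

import (
	"fmt"
	"math"
)

// Operand-order variants: the digits say which operands are multiplied
// and which is added. The result always overwrites operand 1 (a).
func fmadd132(a, b, c float64) float64 { return math.FMA(a, c, b) } // a = a*c + b
func fmadd213(a, b, c float64) float64 { return math.FMA(b, a, c) } // a = b*a + c
func fmadd231(a, b, c float64) float64 { return math.FMA(b, c, a) } // a = b*c + a

// Sign variants, shown for a generic product x*y and addend z.
func fmsub(x, y, z float64) float64  { return math.FMA(x, y, -z) }  //  (x*y) - z
func fnmadd(x, y, z float64) float64 { return math.FMA(-x, y, z) }  // -(x*y) + z
func fnmsub(x, y, z float64) float64 { return math.FMA(-x, y, -z) } // -(x*y) - z

func main() {
	fmt.Println(fmadd132(2, 3, 5), fmadd213(2, 3, 5), fmadd231(2, 3, 5))
	fmt.Println(fmsub(2, 3, 5), fnmadd(2, 3, 5), fnmsub(2, 3, 5))
}
```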

Solving this issue would also make issue #681 more straightforward (
https://golang.org/issue/681 ), although even having the opcodes
would be enough for that one.
@ianlancetaylor
Contributor

ianlancetaylor commented May 20, 2014

Comment 1:

Labels changed: added repo-main, release-go1.4.

@btracey
Contributor

btracey commented Jun 11, 2014

Comment 2:

Were the compiler to add and begin to support this feature, the Daxpy function in
https://github.com/gonum/blas/blob/master/goblas/level1double.go is a good place to
start.
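
For readers unfamiliar with Daxpy: it computes y = alpha*x + y element by element, which is exactly one multiply-add per element. The sketch below is a simplified stand-in, not the gonum implementation; `axpyFMA` is a hypothetical name, the stride arguments of the real BLAS routine are omitted, and `math.FMA` (Go 1.14 and later) is assumed.

```go
package main

import (
	"fmt"
	"math"
)

// axpyFMA computes y[i] = alpha*x[i] + y[i] for every element of x,
// using one fused multiply-add per element.
// y must be at least as long as x. Illustrative only; the real BLAS
// Daxpy also takes a length and strides.
func axpyFMA(alpha float64, x, y []float64) {
	for i := range x {
		y[i] = math.FMA(alpha, x[i], y[i])
	}
}

func main() {
	x := []float64{1, 2, 3}
	y := []float64{10, 20, 30}
	axpyFMA(2, x, y)
	fmt.Println(y)
}
```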

@rsc
Contributor

rsc commented Sep 15, 2014

Comment 3:

Labels changed: added release-go1.5, removed release-go1.4.

Status changed to Accepted.

@bradfitz bradfitz modified the milestone: Go1.5 Dec 16, 2014
@rsc rsc removed accepted labels Apr 14, 2015
@rsc rsc modified the milestones: Unplanned, Go1.5 Apr 26, 2015
@rsc rsc changed the title from "cmd/6a: Assembler Lacks FMA Instructions (Feature Request)" to "cmd/internal/obj/x86: add fma" Jun 8, 2015
@gopherbot
Author

gopherbot commented Jan 22, 2016

CL https://golang.org/cl/18850 mentions this issue.

gopherbot pushed a commit that referenced this issue Jan 24, 2016
Generated by x86test, from https://golang.org/cl/18842
(still in progress).

The commented out lines are either missing or misspelled
or incorrectly handled instructions.

For #4816, #8037, #13822, #14068, #14069.

Change-Id: If309310c97d9d2a3c71fc64c51d4a957e9076ab7
Reviewed-on: https://go-review.googlesource.com/18850
Reviewed-by: Rob Pike <r@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@odysseus9672

odysseus9672 commented Oct 5, 2016

Is there documentation for how to add commands to the assembler?

This documentation says that the process is straightforward, and even describes how to use unsupported instructions with known opcodes.

@randall77
Contributor

randall77 commented Oct 5, 2016

@odysseus9672 I don't think there is any documentation, but you can follow other CLs that added instructions (e.g. https://go-review.googlesource.com/c/14127/). git blame will give you a more comprehensive list.

@bradfitz bradfitz added FeatureRequest help wanted NeedsFix The path to resolution is known, but the work has not been done. labels Aug 8, 2017
@bradfitz bradfitz modified the milestones: Go1.10, Unplanned Aug 8, 2017
@bradfitz
Contributor

bradfitz commented Aug 8, 2017

@randall77, planning on doing this for Go 1.10?

@josharian
Contributor

josharian commented Aug 8, 2017

If the compiler is to emit these, I think this would require a GOAMD64, since FMA is not part of the minimum supported amd64 instruction set: #19593

@bradfitz bradfitz modified the milestones: Go1.10, Unplanned Nov 15, 2017
@quasilyte
Contributor

quasilyte commented Dec 26, 2017

> And for Intel chips, the 3 operand forms ( from Vol. 1 14-21 of Intel's developer's
> manual:
> http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html?iid=tech_vt_tech+64-32_manuals
> ):
> VFMADD231PD, VFMADD231PS, VFMADD231SD, VFMADD231SS

Implemented in https://go-review.googlesource.com/#/c/go/+/75490/.
Included in Go1.10.

@agnivade
Contributor

agnivade commented Nov 8, 2019

@randall77 - Given that we have an FMA function now, should this be closed? Automatically generating FMA instructions might require bumping our minimum architecture set.

@randall77
Contributor

randall77 commented Nov 8, 2019

Yes, I think this is just about the assembly instructions, which were added in CL 75490.

We do already generate these special assembly instructions for math.FMA on AMD64. They are guarded by a runtime cpu feature detection test.

@golang golang locked and limited conversation to collaborators Nov 7, 2020