cmd/internal/obj/x86: add fma #8037
What does 'go version' print?
go version devel +5b9ac653acf6 Mon May 19 22:57:59 2014 -0400 darwin/amd64

What steps reproduce the problem?
If possible, include a link to a program on play.golang.org.
1. Go to $GOROOT/src/cmd/6l/6.out.h.
2. According to http://golang.org/doc/asm#architectures, this file contains a list of all the assembler instructions Go recognizes.
3. No fused multiply-add (FMA) instructions are in the list.

Please provide any additional information below.

Using FMA allows faster and more accurate versions of several algorithms. My current main use case is polynomial evaluation (Horner's scheme and Estrin's scheme are both built on FMA), but another very important one is matrix multiplication. Exposing at least a subset of the FMA instructions available on newer AMD and Intel CPUs (and on PowerPC since at least the 600 series) would help performance-minded programmers and compiler-optimization writers produce faster code.

For instance, a more accurate version of the Mandelbrot set update, for the starting point Cr + i*Ci (with halfCi = 0.5 * Ci), is:

    Zr, Zi = fma(Zr - Zi, Zr + Zi, Cr), 2 * fma(Zr, Zi, halfCi)

The update to Zr is likely slower than what is currently done in The Computer Language Benchmarks Game, but an FMA-based update to Zi would definitely speed things up. Granted, the speedup would only materialize once issue #4978 (https://golang.org/issue/4978) is resolved, but it would be nice to have code in useful libraries ready to take advantage of that optimization instead of having to recode everything afterwards.

To that end, I would request exposing at least the following instructions (I would do it myself if I could find an understandable listing of the opcodes and documentation on how to add them to the compiler).

For greatest compatibility with AMD chips, the 4-operand forms (from http://en.wikipedia.org/wiki/X86_instruction_listings#FMA_instructions): VFMADDPD, VFMADDPS, VFMADDSD, VFMADDSS.
And for Intel chips, the 3-operand forms (from Vol. 1, 14-21 of Intel's developer's manual: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html?iid=tech_vt_tech+64-32_manuals): VFMADD231PD, VFMADD231PS, VFMADD231SD, VFMADD231SS. All of the other FMA operations are variants of these that differ in where minus signs are applied and, for the 3-operand forms, in which register gets overwritten; the one I'm requesting is, naturally, the one that works best for polynomial evaluation. Solving this issue would also make issue #681 (https://golang.org/issue/681) more straightforward, although even having the opcodes would be enough for that one.
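For illustration, here is a minimal sketch of the two uses described in this report, written against `math.FMA(x, y, z)`, which computes x*y + z with a single rounding. That function was added to the standard library in Go 1.14, long after this issue was filed, so it is an assumption relative to the original request. The names `hornerFMA` and `mandelEscape` are hypothetical. Whether `math.FMA` actually compiles to a VFMADD instruction depends on the target and compiler; the sketch only shows the arithmetic.

```go
package main

import (
	"fmt"
	"math"
)

// hornerFMA (hypothetical helper) evaluates
// p(x) = coef[0] + coef[1]*x + ... + coef[n-1]*x^(n-1)
// with Horner's scheme: one fused multiply-add (acc*x + coef[i], rounded
// once) per coefficient, from the highest degree down.
func hornerFMA(coef []float64, x float64) float64 {
	acc := 0.0
	for i := len(coef) - 1; i >= 0; i-- {
		acc = math.FMA(acc, x, coef[i])
	}
	return acc
}

// mandelEscape (hypothetical helper) iterates z <- z*z + c from z = 0, where
// c = cr + i*ci, and returns the iteration at which |z|^2 first exceeds 4,
// or maxIter if it never does. The update is the FMA form from the issue:
//
//	Zr' = fma(Zr - Zi, Zr + Zi, Cr) // Zr^2 - Zi^2 + Cr
//	Zi' = 2 * fma(Zr, Zi, halfCi)   // 2*Zr*Zi + Ci
func mandelEscape(cr, ci float64, maxIter int) int {
	halfCi := 0.5 * ci
	zr, zi := 0.0, 0.0
	for i := 0; i < maxIter; i++ {
		if zr*zr+zi*zi > 4 {
			return i
		}
		zr, zi = math.FMA(zr-zi, zr+zi, cr), 2*math.FMA(zr, zi, halfCi)
	}
	return maxIter
}

func main() {
	// p(x) = 1 + 2x + 3x^2 at x = 2 is 1 + 4 + 12 = 17.
	fmt.Println(hornerFMA([]float64{1, 2, 3}, 2)) // 17
	fmt.Println(mandelEscape(0, 0, 100))          // 100: c = 0 stays bounded
	fmt.Println(mandelEscape(2, 2, 100))          // 1: escapes after one update
}
```

Multiplying the second FMA by 2 is exact in binary floating point (barring overflow), so the Zi update keeps the single-rounding benefit of the fused 2*Zr*Zi + Ci.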
If the compiler adds support for this feature, the Daxpy function in https://github.com/gonum/blas/blob/master/goblas/level1double.go would be a good place to start.
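The Daxpy kernel, y = alpha*x + y, maps naturally onto an FMA that overwrites its addend (Intel's VFMADD231 forms compute dest = src2*src3 + dest). A minimal sketch, assuming `math.FMA` (Go 1.14 and later) and a simplified unit-stride signature; `daxpy` and its parameters are hypothetical and are not meant to match the API in the linked gonum file.

```go
package main

import (
	"fmt"
	"math"
)

// daxpy (hypothetical, unit-stride only) computes y[i] = alpha*x[i] + y[i]
// in place, using one fused multiply-add (single rounding) per element.
func daxpy(alpha float64, x, y []float64) {
	if len(x) != len(y) {
		panic("daxpy: length mismatch")
	}
	for i := range x {
		y[i] = math.FMA(alpha, x[i], y[i])
	}
}

func main() {
	x := []float64{1, 2, 3}
	y := []float64{10, 20, 30}
	daxpy(2, x, y)
	fmt.Println(y) // [12 24 36]
}
```

A real BLAS routine also takes explicit length and stride arguments; they are omitted here to keep the FMA pattern visible.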
Implemented in https://go-review.googlesource.com/#/c/go/+/75490/.