cmd/internal/obj/x86: add fma #8037

@gopherbot

by odysseus9672:

What does 'go version' print?
go version devel +5b9ac653acf6 Mon May 19 22:57:59 2014 -0400 darwin/amd64

What steps reproduce the problem?
If possible, include a link to a program on play.golang.org.

1. Go to $GOROOT/src/cmd/6l/6.out.h
2. According to http://golang.org/doc/asm#architectures , this file contains a list of
all the assembler instructions Go recognizes.
3. No fused-multiply-add instructions are in the list.

Please provide any additional information below.

Using FMA allows faster and more accurate versions of several algorithms. My
current main use case is polynomial evaluation (Horner's scheme and Estrin's scheme are
both built on FMA), but another very important one is matrix multiplication. Adding at
least a subset of the FMA instructions available on newer AMD and Intel CPUs (available
on PowerPC since at least the 600 series) would improve the ability of
performance-minded coders and compiler optimization writers to produce faster code.
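
To illustrate the polynomial use case, here is a minimal sketch of Horner's scheme with one fused multiply-add per coefficient. It assumes a Go toolchain that provides math.FMA (Go 1.14 or later, which did not exist when this issue was filed); the function name hornerFMA and the low-to-high coefficient order are illustrative choices, not anything from the Go tree.

```go
package main

import (
	"fmt"
	"math"
)

// hornerFMA evaluates coeffs[0] + coeffs[1]*x + ... + coeffs[n]*x^n
// with Horner's scheme. Each step is a single fused multiply-add, so the
// product acc*x is not rounded before coeffs[i] is added.
// (Hypothetical helper; assumes math.FMA from Go 1.14+.)
func hornerFMA(coeffs []float64, x float64) float64 {
	if len(coeffs) == 0 {
		return 0
	}
	acc := coeffs[len(coeffs)-1]
	for i := len(coeffs) - 2; i >= 0; i-- {
		acc = math.FMA(acc, x, coeffs[i]) // acc*x + coeffs[i], rounded once
	}
	return acc
}

func main() {
	// p(x) = 1 + 2x + 3x^2, so p(2) = 1 + 4 + 12 = 17.
	fmt.Println(hornerFMA([]float64{1, 2, 3}, 2))
}
```

Whether math.FMA compiles to a hardware instruction or falls back to a software implementation depends on the target architecture and build settings, so any speedup should be measured rather than assumed.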

For instance, if the starting point is Cr + i*Ci (and halfCi = 0.5 * Ci), a more
accurate version of the Mandelbrot set update is:
Zr, Zi = fma( Zr - Zi, Zr + Zi, Cr ), 2 * fma( Zr, Zi, halfCi )

The update to Zr is likely slower than what is presently done for The Computer
Language Benchmarks Game, but an FMA-based update to Zi would definitely speed things up.
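
The Mandelbrot update above can be sketched in Go as follows. This again assumes math.FMA (Go 1.14 or later); mandelStep is a hypothetical name, and the sketch only shows the arithmetic, not a tuned benchmark kernel.

```go
package main

import (
	"fmt"
	"math"
)

// mandelStep performs one iteration z -> z*z + c using fused multiply-adds.
// The caller passes halfCi = 0.5 * Ci so the imaginary part needs one FMA
// and one doubling. (Hypothetical helper; assumes math.FMA from Go 1.14+.)
func mandelStep(zr, zi, cr, halfCi float64) (float64, float64) {
	// Zr*Zr - Zi*Zi + Cr, written as (Zr - Zi)*(Zr + Zi) + Cr.
	newZr := math.FMA(zr-zi, zr+zi, cr)
	// 2*Zr*Zi + Ci, written as 2*(Zr*Zi + halfCi).
	newZi := 2 * math.FMA(zr, zi, halfCi)
	return newZr, newZi
}

func main() {
	// z = 1 + 2i, c = 3 + 4i: z*z + c = (-3 + 4i) + (3 + 4i) = 0 + 8i.
	zr, zi := mandelStep(1, 2, 3, 0.5*4)
	fmt.Println(zr, zi)
}
```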

Granted, the speedup would only materialize once issue #4978
(https://golang.org/issue/4978) is resolved, but having code in useful libraries
in place to take advantage of that optimization would be nicer than having to
recode everything afterwards.

To that end, I would request exposing at least the following (would do it myself if I
could find an understandable listing of the opcodes and documentation on how to add them
to the compiler): 

For greatest compatibility with AMD chips, the 4-operand forms (from
http://en.wikipedia.org/wiki/X86_instruction_listings#FMA_instructions):
VFMADDPD, VFMADDPS, VFMADDSD, VFMADDSS.

And for Intel chips, the 3-operand forms (from Vol. 1 14-21 of Intel's developer's
manual:
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html?iid=tech_vt_tech+64-32_manuals
):
VFMADD231PD, VFMADD231PS, VFMADD231SD, VFMADD231SS.

All of the other operations are these with different permutations of minus signs (and,
for the 3-operand versions, a different choice of which register gets clobbered; the one
I'm requesting is the one that works best with polynomial evaluation, naturally).

Solving this issue would also make issue #681 (https://golang.org/issue/681) more
straightforward, although even just having the opcodes would be enough for that one.
