Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fused multiple-add instruction #23342

Open
dlesnoff opened this issue Feb 22, 2024 · 6 comments
Open

Fused multiple-add instruction #23342

dlesnoff opened this issue Feb 22, 2024 · 6 comments

Comments

@dlesnoff
Copy link

Summary

I would like to add the fused-multiply-add instruction either directly to system or to std/math.
This instruction is crucial for high-performance computing. It computes a product and an addition in one cycle: fma(a, x, y) = a*x + y. Not only is this instruction faster but it also comes with less rounding errors. It is guaranteed to compute the product as if it had infinite precision.
The problem with this instruction, is that some operations should not use a fused-multiple add. Indeed, when one wants to compute the product of two complex numbers a = x + iy and b = x' + iy', the imaginary part is given by: yx' + xy'. This sum of two products can be computed by using fma in two ways, but none of them would be accurate enough, to distinguish whether the complex number product is real or complex. (See e.g. Nicolas J. Higham, Accuracy and Stability of Numerical Algorithms, 2002).
Consequently, it is up to the programmer and not the compiler to decide, given the context, whether or not a fused multiply-add should be used.
This difference of rounding with the two separate instructions enable the use of error-free transformations like the two-product error free transformation of Ogita et al (2005)

Description

Currently, I import in my projects:

proc fma(x,y,z: float32): float32 {.importc: "fmaf", header: "<math.h>".}
proc fma(x,y,z: float64): float64 {.importc: "fma", header: "<math.h>".}

A solution would be to add these imports in std/math. We would know be able to use error-free transformations without these imports:

import std/math
# Two-productFMA Ogita et al. 2005
# Expresses x*y accurately as h + l
h = x*y # Floating-point approximation of x*y
l = fma(x, y, -h) # Floating-point rounding in l

Alternatives

No response

Examples

No response

Backwards Compatibility

There might be some issues with the JS backend, as always when it comes with floating-point arithmetic support.
I propose to align with the C++ specification.

Links

https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
https://en.cppreference.com/w/cpp/numeric/math/fma
https://epubs.siam.org/doi/epdf/10.1137/030601818

@juancarlospaco
Copy link
Collaborator

For JS, would something like this be enough?

function fma(a, b, c) {
  return (a * b) + c
}

And an explanation on the documentation for JS targets.

@dlesnoff
Copy link
Author

dlesnoff commented Feb 22, 2024

A quick google search gives:
https://gist.github.com/Yaffle/fb47de4c18b63147699e0b621f1031f7
and
https://gist.github.com/munrocket/adfbca362bf710f53621e483cb208f0e
They are more accurate albeit probably much slower than a pure axpy (a*b+c) computation.
One test that illustrates the difference:

assert( 0.1 * 10 - 1 == 0)
assert( fma(0.1, 10, 1) == 5.55112e-17) # Example from en.cppreference.com

I would expect a different rounding behaviour from a fma function than from an axpy computation.

P.S: The axpy term comes from the BLAS (basic linear algebra subprogram) specification.

@juancarlospaco
Copy link
Collaborator

Then make it inside a when not defined(js): block and mention in docs is not available in JS.

@awr1
Copy link
Contributor

awr1 commented Feb 22, 2024

most recent high-perf x86 CPUs should do a uop-fuse on axpy, the FMA3 instructions really exist for increased accuracy. not really certain about ARM-land.

@ringabout
Copy link
Member

Feel free to open a pull request; see also nim-lang/RFCs#92

@dlesnoff
Copy link
Author

dlesnoff commented Mar 5, 2024

@ringabout Sorry, I do not have time to implement this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants