Added MulAdd and MulAddAssign traits #59

Merged 3 commits into rust-num:master on May 4, 2018
Conversation

@regexident (Contributor) commented Apr 9, 2018

Both f32 and f64 implement fused multiply-add, which computes (self * a) + b with only one rounding error. This produces a more accurate result with better performance than a separate multiplication operation followed by an add:

```rust
fn mul_add(self, a: f32, b: f32) -> f32
```

It is, however, not possible to make use of this in a generic context by abstracting over a trait.
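A minimal sketch of what such traits could look like (the defaulted type parameters and the associated `Output` type are one possible design, mirroring the `std::ops` operator traits):

```rust
pub trait MulAdd<A = Self, B = Self> {
    type Output;

    /// Performs the fused multiply-add operation `(self * a) + b`.
    fn mul_add(self, a: A, b: B) -> Self::Output;
}

pub trait MulAddAssign<A = Self, B = Self> {
    /// Performs the fused multiply-add assignment `*self = (*self * a) + b`.
    fn mul_add_assign(&mut self, a: A, b: B);
}
```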

My concrete use case is machine learning, gradient descent to be specific, where the core operation of updating the gradient could make use of `mul_add` for both its `weights: Vector` and its `bias: f32`:

```rust
struct Perceptron {
  weights: Vector,
  bias: f32,
}

impl MulAdd<f32, Self> for Vector {
  // ...
}

impl Perceptron {
  fn learn(&mut self, example: Vector, expected: f32, learning_rate: f32) {
    let alpha = self.error(example, expected, learning_rate);
    self.weights = example.mul_add(alpha, self.weights);
    self.bias = self.bias.mul_add(alpha, self.bias)
  }
}
```

(The actual impl of `Vector` would be generic over its value type, `Vector<T>`, thus requiring the trait.)
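A hypothetical element-wise impl for such a `Vector<T>` might look like the following sketch (`Vector` and its `elems` field are illustrative, not types from this PR):

```rust
struct Vector<T> {
    elems: Vec<T>,
}

// Hypothetical: scale each element by `a` and add the corresponding element
// of `b`, delegating to T's own MulAdd impl so that floats get the fused
// operation and other numeric types get plain `(x * a) + y`.
impl<T: MulAdd<Output = T> + Copy> MulAdd<T, Vector<T>> for Vector<T> {
    type Output = Vector<T>;

    fn mul_add(self, a: T, b: Vector<T>) -> Vector<T> {
        let elems = self
            .elems
            .into_iter()
            .zip(b.elems)
            .map(|(x, y)| x.mul_add(a, y))
            .collect();
        Vector { elems }
    }
}
```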

@regexident force-pushed the mul_add branch 6 times, most recently from 59ad2ee to 110914a on April 9, 2018.
@cuviper (Member) commented Apr 9, 2018

Is this actually useful to you for integers? Otherwise, you can just use Float::mul_add.

```rust
#[inline]
fn mul_add(self, a: Self, b: Self) -> Self::Output {
    #![allow(unconditional_recursion)]
    f64::mul_add(self, a, b)
}
```
cuviper (Member):

I believe this allow is masking a real problem. In a #![no_std] build, f64 doesn't have most of its inherent methods, which means this call will resolve back to MulAdd::mul_add again -- thus the unconditional recursion.
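Spelled out, the failure mode looks like this (an illustrative reconstruction, not the crate's exact macro expansion):

```rust
impl MulAdd for f64 {
    type Output = f64;

    #[inline]
    fn mul_add(self, a: f64, b: f64) -> f64 {
        // With std, this resolves to the inherent f64::mul_add method.
        // Under #![no_std] the inherent method doesn't exist, so the call
        // resolves to <f64 as MulAdd>::mul_add -- this very function --
        // and recurses forever.
        f64::mul_add(self, a, b)
    }
}
```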

@regexident (Contributor, Author) replied Apr 10, 2018:

Oh, `#![no_std]` is what made the builds fail on CI, yet succeed on my local machine.
Adding the `#![allow(unconditional_recursion)]` felt fishy.

How would you propose solving this? Something like this?

```rust
if cfg!(feature = "std") {
    f64::mul_add(self, a, b)
} else {
    (self * a) + b
}
```

cuviper (Member) replied:

I think it would be better to not implement the trait at all for floats in #![no_std] mode, so we don't mislead about the rounding errors. Either that, or do a manual implementation with appropriate rounding fixups, but I imagine that's non-trivial.

@regexident (Contributor, Author) commented Apr 10, 2018

Is this actually useful to you for integers? Otherwise, you can just use Float::mul_add.

It's useful for writing code that makes use of the optimization if applicable, yet remains fully generic. A particular use case is fixed-point arithmetic on T: Real + MulAdd as a convenient drop-in replacement for T: Float + MulAdd.
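For instance, a hypothetical helper written once against the trait serves floats, integers, and fixed-point types alike:

```rust
// Hypothetical: computes `a * x + y` for any type implementing MulAdd,
// using a fused operation wherever the implementation provides one.
fn axpy<T: MulAdd<Output = T>>(a: T, x: T, y: T) -> T {
    a.mul_add(x, y)
}
```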

I'm currently working, on and off, on making japaric/fpa usable as a proper replacement for Float on embedded/no_std targets without an FPU.

I'd like to be able to write algebraic code that makes full use of all available optimized code paths if applicable, yet remains code-compatible with environments of reduced sophistication.

@vks (Contributor) commented Apr 10, 2018

This produces a more accurate result with better performance than a separate multiplication operation followed by an add

Unfortunately, in my experience this is not necessarily true. For f64 I don't see a performance difference on my machine, and for f32 it is slower without target-cpu=native and faster with.

@regexident (Contributor, Author) replied:

Unfortunately, in my experience this is not necessarily true.

This does not invalidate the desire to have a way to generically express `fn mul_add` through a trait, though, does it?

@vks (Contributor) commented Apr 10, 2018

No, performance is not really related to the changes in this PR. What might be problematic is that a * b + c is less precise than a.mul_add(b, c) and will yield different results. I'm not sure how big of a problem that is in practice, but it should probably be documented.

Also see rust-lang/rust#44805:

It's not just about exact results, it's also about reasoning about how inexact the result can get, and having particular behavior if an argument or the intermediate product is non-finite. For an example of the latter, consider fma(MAX_FLT, MAX_FLT, NEG_INFINITY) (evaluates to -inf) vs (MAX_FLT * MAX_FLT) + NEG_INFINITY (evaluates to NaN).
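That behavioral difference is easy to demonstrate with std's inherent method (a small self-contained example):

```rust
fn main() {
    // Fused: MAX * MAX is kept exact internally, with no intermediate
    // rounding or overflow, so adding -inf yields -inf.
    let fused = f64::MAX.mul_add(f64::MAX, f64::NEG_INFINITY);
    // Separate: MAX * MAX overflows to +inf first, and +inf + -inf is NaN.
    let separate = (f64::MAX * f64::MAX) + f64::NEG_INFINITY;
    println!("fused:    {}", fused);    // -inf
    println!("separate: {}", separate); // NaN
}
```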

@regexident (Contributor, Author) replied:

No, performance is not really related to the changes in this PR. What might be problematic is that a * b + c is less precise than a.mul_add(b, c) and will yield different results. I'm not sure how big of a problem that is in practice, but it should probably be documented.

Good point, I'll gladly add a mention of this to the documentation. :)

Any further feedback? What needs to be done to proceed? Do we want to proceed?

@cuviper (Member) commented Apr 17, 2018

Is this actually useful to you for integers? Otherwise, you can just use Float::mul_add.

It's useful for writing code that makes use of the optimization if applicable, yet remains fully generic. A particular use case is fixed-point arithmetic on T: Real + MulAdd as a convenient drop-in replacement for T: Float + MulAdd.

Note there's Real::mul_add too.

Supporting the no_std fpa crate is a more compelling example though, since Float and Real are only available in std builds.

I still think we should not have a (self * a) + b fallback for no_std floats -- I'd rather not implement the trait for no_std floats at all if we can't meet the same rounding accuracy. The performance is secondary, maybe not even worth bringing up.

@regexident (Contributor, Author) commented Apr 18, 2018

I just moved the impls of MulAdd/MulAddAssign for f32/f64 behind the #[cfg(feature = "std")] feature guard with commit 28be885.
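In outline, the gated impl has this shape (a sketch of the change, not the exact macro from commit 28be885):

```rust
#[cfg(feature = "std")]
impl MulAdd for f64 {
    type Output = f64;

    #[inline]
    fn mul_add(self, a: f64, b: f64) -> f64 {
        // The inherent method only exists with std, so the whole impl is
        // gated rather than silently falling back to (self * a) + b.
        f64::mul_add(self, a, b)
    }
}
```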

```rust
let x: $t = 3.4;
let b: $t = 5.6;

let abs_difference = (m.mul_add(x, b) - (m*x + b)).abs();
```
cuviper (Member):

I'm surprised this test didn't fail on no_std, but I think it's because libtest links to libstd, so the compiler still has the inherent impls of f32/f64::mul_add available. Please change these tests to explicitly call MulAdd::mul_add(m, x, b), and then this test will need a cfg gate.
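A reworked test along the lines requested might look like this sketch (concrete f64 instead of the macro's `$t`, with an assumed tolerance):

```rust
#[test]
#[cfg(feature = "std")]
fn mul_add_f64() {
    let m: f64 = 1.2;
    let x: f64 = 3.4;
    let b: f64 = 5.6;

    // Call the trait method explicitly so the test exercises
    // MulAdd::mul_add itself rather than the inherent f64 method.
    let abs_difference = (MulAdd::mul_add(m, x, b) - (m * x + b)).abs();
    assert!(abs_difference < 1e-10);
}
```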

@cuviper (Member) commented May 4, 2018

bors r+

bors bot added a commit that referenced this pull request May 4, 2018
59: Added `MulAdd` and `MulAddAssign` traits r=cuviper a=regexident

Co-authored-by: Vincent Esche <regexident@gmail.com>
Co-authored-by: Josh Stone <cuviper@gmail.com>
@bors bot commented May 4, 2018

Build succeeded

bors bot merged commit 0d35803 into rust-num:master on May 4, 2018.
@regexident deleted the mul_add branch on May 5, 2018.