Added MulAdd and MulAddAssign traits #59

Merged 3 commits into rust-num:master on May 4, 2018
Conversation

@regexident (Contributor) commented Apr 9, 2018

Both f32 and f64 implement fused multiply-add, which computes (self * a) + b with only one rounding error. This produces a more accurate result with better performance than a separate multiplication operation followed by an add:

```rust
fn mul_add(self, a: f32, b: f32) -> f32
```

It is, however, not possible to make use of this in a generic context by abstracting over a trait.
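A minimal sketch of what such traits could look like (the defaulted type parameters and the associated `Output` type are one possible design, mirroring the `std::ops` operator traits):

```rust
pub trait MulAdd<A = Self, B = Self> {
    type Output;

    /// Performs the fused multiply-add operation `(self * a) + b`.
    fn mul_add(self, a: A, b: B) -> Self::Output;
}

pub trait MulAddAssign<A = Self, B = Self> {
    /// Performs the fused multiply-add assignment `*self = (*self * a) + b`.
    fn mul_add_assign(&mut self, a: A, b: B);
}
```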

My concrete use case is machine learning, gradient descent to be specific, where the core operation of updating the gradient could make use of `mul_add` for both its `weights: Vector` and its `bias: f32`:

```rust
struct Perceptron {
  weights: Vector,
  bias: f32,
}

impl MulAdd<f32, Self> for Vector {
  // ...
}

impl Perceptron {
  fn learn(&mut self, example: Vector, expected: f32, learning_rate: f32) {
    let alpha = self.error(example, expected, learning_rate);
    self.weights = example.mul_add(alpha, self.weights);
    self.bias = self.bias.mul_add(alpha, self.bias)
  }
}
```

(The actual impl of `Vector` would be generic over its value type, `Vector<T>`, thus requiring the trait.)
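A hypothetical element-wise impl for such a `Vector<T>` might look like the following sketch (`Vector` and its `elems` field are illustrative, not types from this PR):

```rust
struct Vector<T> {
    elems: Vec<T>,
}

// Hypothetical: scale each element by `a` and add the corresponding element
// of `b`, delegating to T's own MulAdd impl so that floats get the fused
// operation and other numeric types get plain `(x * a) + y`.
impl<T: MulAdd<Output = T> + Copy> MulAdd<T, Vector<T>> for Vector<T> {
    type Output = Vector<T>;

    fn mul_add(self, a: T, b: Vector<T>) -> Vector<T> {
        let elems = self
            .elems
            .into_iter()
            .zip(b.elems)
            .map(|(x, y)| x.mul_add(a, y))
            .collect();
        Vector { elems }
    }
}
```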

@regexident force-pushed the mul_add branch 6 times, most recently from 59ad2ee to 110914a on April 9, 2018.
@cuviper (Member) commented Apr 9, 2018

Is this actually useful to you for integers? Otherwise, you can just use Float::mul_add.

```rust
#[inline]
fn mul_add(self, a: Self, b: Self) -> Self::Output {
    #![allow(unconditional_recursion)]
    f64::mul_add(self, a, b)
}
```
cuviper (Member):

I believe this allow is masking a real problem. In a #![no_std] build, f64 doesn't have most of its inherent methods, which means this call will resolve back to MulAdd::mul_add again -- thus the unconditional recursion.
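Spelled out, the failure mode looks like this (an illustrative reconstruction, not the crate's exact macro expansion):

```rust
impl MulAdd for f64 {
    type Output = f64;

    #[inline]
    fn mul_add(self, a: f64, b: f64) -> f64 {
        // With std, this resolves to the inherent f64::mul_add method.
        // Under #![no_std] the inherent method doesn't exist, so the call
        // resolves to <f64 as MulAdd>::mul_add -- this very function --
        // and recurses forever.
        f64::mul_add(self, a, b)
    }
}
```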

@regexident (Contributor, Author) replied Apr 10, 2018:

Oh, `#![no_std]` is what made the builds fail on CI, yet succeed on my local machine.
Adding the `#![allow(unconditional_recursion)]` felt fishy.

How would you propose solving this? Something like this?

```rust
if cfg!(feature = "std") {
    f64::mul_add(self, a, b)
} else {
    (self * a) + b
}
```

cuviper (Member) replied:

I think it would be better to not implement the trait at all for floats in #![no_std] mode, so we don't mislead about the rounding errors. Either that, or do a manual implementation with appropriate rounding fixups, but I imagine that's non-trivial.

@regexident (Contributor, Author) commented Apr 10, 2018

Is this actually useful to you for integers? Otherwise, you can just use Float::mul_add.

It's useful for writing code that makes use of the optimization if applicable, yet remains fully generic. A particular use case is fixed-point arithmetic on T: Real + MulAdd as a convenient drop-in replacement for T: Float + MulAdd.
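For instance, a hypothetical helper written once against the trait serves floats, integers, and fixed-point types alike:

```rust
// Hypothetical: computes `a * x + y` for any type implementing MulAdd,
// using a fused operation wherever the implementation provides one.
fn axpy<T: MulAdd<Output = T>>(a: T, x: T, y: T) -> T {
    a.mul_add(x, y)
}
```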

I'm currently working, on and off, on making japaric/fpa usable as a proper replacement for Float on embedded/no_std targets without an FPU.

I'd like to be able to write algebraic code that makes full use of all available optimized code paths if applicable, yet remains code-compatible with environments of reduced sophistication.

@vks (Contributor) commented Apr 10, 2018

This produces a more accurate result with better performance than a separate multiplication operation followed by an add

Unfortunately, in my experience this is not necessarily true. For f64 I don't see a performance difference on my machine, and for f32 it is slower without target-cpu=native and faster with.

@regexident (Contributor, Author) replied:

Unfortunately, in my experience this is not necessarily true.

This does not invalidate the desire to have a way to generically express `fn mul_add` through a trait, though, does it?

@vks (Contributor) commented Apr 10, 2018

No, performance is not really related to the changes in this PR. What might be problematic is that a * b + c is less precise than a.mul_add(b, c) and will yield different results. I'm not sure how big of a problem that is in practice, but it should probably be documented.

Also see rust-lang/rust#44805:

It's not just about exact results, it's also about reasoning about how inexact the result can get, and having particular behavior if an argument or the intermediate product is non-finite. For an example of the latter, consider fma(MAX_FLT, MAX_FLT, NEG_INFINITY) (evaluates to -inf) vs (MAX_FLT * MAX_FLT) + NEG_INFINITY (evaluates to NaN).
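That behavioral difference is easy to demonstrate with std's inherent method (a small self-contained example):

```rust
fn main() {
    // Fused: MAX * MAX is kept exact internally, with no intermediate
    // rounding or overflow, so adding -inf yields -inf.
    let fused = f64::MAX.mul_add(f64::MAX, f64::NEG_INFINITY);
    // Separate: MAX * MAX overflows to +inf first, and +inf + -inf is NaN.
    let separate = (f64::MAX * f64::MAX) + f64::NEG_INFINITY;
    println!("fused:    {}", fused);    // -inf
    println!("separate: {}", separate); // NaN
}
```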

@regexident (Contributor, Author) replied:

No, performance is not really related to the changes in this PR. What might be problematic is that a * b + c is less precise than a.mul_add(b, c) and will yield different results. I'm not sure how big of a problem that is in practice, but it should probably be documented.

Good point, I'll gladly add a mention of this to the documentation. :)

Any further feedback? What needs to be done to proceed? Do we want to proceed?

@cuviper (Member) commented Apr 17, 2018

Is this actually useful to you for integers? Otherwise, you can just use Float::mul_add.

It's useful for writing code that makes use of the optimization if applicable, yet remains fully generic. A particular use case is fixed-point arithmetic on T: Real + MulAdd as a convenient drop-in replacement for T: Float + MulAdd.

Note there's Real::mul_add too.

Supporting the no_std fpa crate is a more compelling example though, since Float and Real are only available in std builds.

I still think we should not have a (self * a) + b fallback for no_std floats -- I'd rather not implement the trait for no_std floats at all if we can't meet the same rounding accuracy. The performance is secondary, maybe not even worth bringing up.

@regexident (Contributor, Author) commented Apr 18, 2018

I just moved the impls of MulAdd/MulAddAssign for f32/f64 behind the #[cfg(feature = "std")] feature guard with commit 28be885.
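In outline, the gated impl has this shape (a sketch of the change, not the exact macro from commit 28be885):

```rust
#[cfg(feature = "std")]
impl MulAdd for f64 {
    type Output = f64;

    #[inline]
    fn mul_add(self, a: f64, b: f64) -> f64 {
        // The inherent method only exists with std, so the whole impl is
        // gated rather than silently falling back to (self * a) + b.
        f64::mul_add(self, a, b)
    }
}
```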

```rust
let x: $t = 3.4;
let b: $t = 5.6;

let abs_difference = (m.mul_add(x, b) - (m*x + b)).abs();
```
cuviper (Member):

I'm surprised this test didn't fail on no_std, but I think it's because libtest links to libstd, so the compiler still has the inherent impls of f32/f64::mul_add available. Please change these tests to explicitly call MulAdd::mul_add(m, x, b), and then this test will need a cfg gate.
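A reworked test along the lines requested might look like this sketch (concrete f64 instead of the macro's `$t`, with an assumed tolerance):

```rust
#[test]
#[cfg(feature = "std")]
fn mul_add_f64() {
    let m: f64 = 1.2;
    let x: f64 = 3.4;
    let b: f64 = 5.6;

    // Call the trait method explicitly so the test exercises
    // MulAdd::mul_add itself rather than the inherent f64 method.
    let abs_difference = (MulAdd::mul_add(m, x, b) - (m * x + b)).abs();
    assert!(abs_difference < 1e-10);
}
```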

@cuviper (Member) commented May 4, 2018

bors r+

bors bot added a commit that referenced this pull request May 4, 2018
59: Added `MulAdd` and `MulAddAssign` traits r=cuviper a=regexident

Co-authored-by: Vincent Esche <regexident@gmail.com>
Co-authored-by: Josh Stone <cuviper@gmail.com>
@bors bot commented May 4, 2018

Build succeeded

bors bot merged commit 0d35803 into rust-num:master on May 4, 2018.
@regexident deleted the mul_add branch on May 5, 2018.