Add vmla_* / vmlal_* #955

Open
wants to merge 1 commit into
base: master

Conversation

makotokato
Contributor

Add vector multiply-accumulate intrinsics.

@rust-highfive

r? @Amanieu

(rust_highfive has picked a reviewer for you, use r? to override)

@Amanieu
Member

Amanieu commented Nov 20, 2020

VMLA is not a fused multiply-add. It involves an intermediate rounding step, just as if you had used two separate operations. Only VFMA is a fused multiply-add.
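To make the rounding difference concrete, here is a minimal scalar sketch in plain Rust. It does not use the NEON intrinsics themselves. The function names `mul_then_add`, `fused_mul_add`, and `demo_inputs` are hypothetical, and the inputs are chosen only so that the two results differ.

```rust
/// Unfused: the product is rounded to f64 before the addition (two roundings).
pub fn mul_then_add(a: f64, b: f64, c: f64) -> f64 {
    a * b + c
}

/// Fused: `f64::mul_add` computes a * b + c with a single rounding at the end.
pub fn fused_mul_add(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c)
}

/// Hypothetical inputs (x, c) for which x * x + c differs between the two forms.
pub fn demo_inputs() -> (f64, f64) {
    let eps = 1.0 / (1u64 << 30) as f64; // 2^-30, exactly representable
    let x = 1.0 + eps; // exact product x * x is 1 + 2^-29 + 2^-60
    let c = -(1.0 + 2.0 * eps); // -(1 + 2^-29)
    (x, c)
}
```

With these inputs the unfused form loses the 2^-60 term when the product is rounded, so it yields zero, while the fused form keeps it.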

@Amanieu
Member

Amanieu commented Dec 1, 2020

Ping. The vmla_* functions need to use simd_add and simd_mul. The vfma_* functions need to use simd_fma.

#[target_feature(enable = "neon")]
#[cfg_attr(test, assert_instr(fmla))]
pub unsafe fn vmla_f64(a: float64x1_t, b: float64x1_t, c: float64x1_t) -> float64x1_t {
simd_fma(b, c, a)
Member

You can use the mul_add method on f64.

Contributor Author

Sorry for the delay, and thank you for the review. I will update this PR.

@makotokato
Contributor Author

An unrelated CI job is failing due to OOM...

@Amanieu
Member

Amanieu commented Jan 30, 2021

CI should be fixed now.

However, you still haven't fixed the definitions of the functions:

  • vfma_f{32,64} should always do a fused multiply-add (vfma on ARM, fmadd/fmla on AArch64).
  • vmla_f{32,64} should always do a separate multiply and add (vmla on ARM, fmul + fadd on AArch64).

The vfma_f{32,64} functions are also available on 32-bit ARM if the vfpv4 feature is enabled.

Have a look at this: https://godbolt.org/z/34Wo5n
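The distinction between the two families can be modelled lane by lane in portable Rust. This is only a sketch. It uses plain arrays in place of `float32x2_t`, and scalar `*`, `+`, and `f32::mul_add` in place of the `simd_mul`, `simd_add`, and `simd_fma` intrinsics named earlier in the review. The `*_model` function names are hypothetical.

```rust
/// Hypothetical scalar model of `vmla_f32`: each lane computes a + b * c,
/// with the product rounded to f32 before the addition.
pub fn vmla_f32_model(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> [f32; 2] {
    [a[0] + b[0] * c[0], a[1] + b[1] * c[1]]
}

/// Hypothetical scalar model of `vfma_f32`: each lane computes a + b * c
/// as a fused operation with a single rounding.
pub fn vfma_f32_model(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> [f32; 2] {
    [b[0].mul_add(c[0], a[0]), b[1].mul_add(c[1], a[1])]
}
```

Both models agree on lanes where the product is exactly representable and differ only where the intermediate rounding discards low-order bits, which is why the two intrinsic families cannot share one implementation.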

@bors
Contributor

bors commented Mar 9, 2021

The latest upstream changes (presumably 594ff85) made this pull request unmergeable. Please resolve the merge conflicts.

4 participants