Add fused multiply-add vector intrinsics #32066

Merged
merged 3 commits into from
Mar 7, 2016

ruuda
Contributor

@ruuda ruuda commented Mar 5, 2016

This adds support for fused multiply-add and multiply-subtract vector intrinsics for 128- and 256-bit vectors of f32 and f64. These correspond to the intrinsics listed in the Intel intrinsic reference (https://software.intel.com/en-us/node/523929), except for the _ss and _sd variants. The intrinsics added are:

  • fmadd
  • fmaddsub
  • fmsub
  • fmsubadd
  • fnmadd
  • fnmsub

The “fma” target feature must be enabled by passing -C target-feature=+fma to rustc when using these; otherwise LLVM will complain.

I verified locally that x86_mm256_fmadd_ps and x86_mm256_fmsub_ps work.
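As a rough illustration of the target-feature requirement, here is a sketch that enables FMA for a single function and checks for it at run time instead of passing the flag for the whole crate. Note the assumptions: it uses the stable `std::arch` API of current Rust (`_mm256_fmadd_ps`, `#[target_feature]`, `is_x86_feature_detected!`), not the unstable `x86_mm256_fmadd_ps` platform intrinsic this PR adds, and the names `fmadd8` and `mul_add8` are hypothetical.

```rust
// Assumption: stable `std::arch` API, not the platform-intrinsic interface
// from this PR. `fmadd8` and `mul_add8` are hypothetical helper names.

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx,fma")]
unsafe fn fmadd8(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    use std::arch::x86_64::*;
    unsafe {
        // Unaligned loads of eight f32 lanes each.
        let va = _mm256_loadu_ps(a.as_ptr());
        let vb = _mm256_loadu_ps(b.as_ptr());
        let vc = _mm256_loadu_ps(c.as_ptr());
        // Fused (a * b) + c on all eight lanes.
        let vr = _mm256_fmadd_ps(va, vb, vc);
        let mut out = [0.0f32; 8];
        _mm256_storeu_ps(out.as_mut_ptr(), vr);
        out
    }
}

/// Computes (a * b) + c per lane, using the FMA instruction when the CPU
/// supports it and a scalar fallback otherwise.
pub fn mul_add8(a: [f32; 8], b: [f32; 8], c: [f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") && is_x86_feature_detected!("fma") {
            // Safe to call: the required features were just detected.
            return unsafe { fmadd8(a, b, c) };
        }
    }
    // Portable fallback with the same single-rounding semantics.
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        out[i] = a[i].mul_add(b[i], c[i]);
    }
    out
}
```

Gating on run-time detection keeps the binary usable on CPUs without FMA, whereas enabling the feature globally with -C target-feature=+fma assumes every target machine supports it.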

This defines the following intrinsics for 128- and 256-bit vectors of f32
and f64:

 * `fmadd`
 * `fmaddsub`
 * `fmsub`
 * `fmsubadd`
 * `fnmadd`
 * `fnmsub`

The `_sd` and `_ss` variants are not included yet.

Intel intrinsic reference: https://software.intel.com/en-us/node/523929

The intrinsics there are listed under AVX2, but in the Intel Intrinsic
Guide they are part of the "FMA" technology, and LLVM puts them under
FMA, not AVX2.

The file the generator produces had been modified directly; instead, the
generator should have been modified and the file regenerated. This merges the
modifications into the template in the generator.

The exact command used was:

    $ cd src/etc/platform-intrinsics/x86
    $ python2 ../generator.py --format compiler-defs -i info.json   \
      sse.json sse2.json sse3.json ssse3.json sse41.json sse42.json \
      avx.json avx2.json fma.json                                   \
      > ../../../librustc_platform_intrinsics/x86.rs
@ruuda
Contributor Author

ruuda commented Mar 5, 2016

cc @huonw: you wrote most of the other intrinsics and SIMD support. Could you please have a look if you have the time?

@alexcrichton
Member

@bors: r+ a409076

Thanks!

@alexcrichton alexcrichton self-assigned this Mar 6, 2016
bors added a commit that referenced this pull request Mar 7, 2016
@bors
Contributor

bors commented Mar 7, 2016

⌛ Testing commit a409076 with merge 6d262db...

@bors bors merged commit a409076 into rust-lang:master Mar 7, 2016
@gnzlbg
Contributor

gnzlbg commented Mar 15, 2016

Is there a reason why 512-bit FMA wasn't included? IIRC LLVM also has support for these.

@ruuda
Contributor Author

ruuda commented Mar 15, 2016

512-bit FMA instructions are part of AVX-512, not of the FMA extension. I don’t have a CPU that supports AVX-512.

@ruuda ruuda deleted the fma branch November 30, 2016 09:59