Tracking issue for stable SIMD in Rust #48556

Open · alexcrichton opened this issue Feb 26, 2018 · 74 comments

@alexcrichton (Member) commented Feb 26, 2018

This is a tracking issue for RFC 2325, adding SIMD support to stable Rust. There are a number of components here, including:

The initial implementation of this is being added in #48513 and the next steps would be:


Known issues

@scottmcm (Member) commented Feb 26, 2018

My one request for the bikeshed (which the current PR already does and may be obvious, but I'll write it down anyway): Please ensure they're not all in the same module as things like undefined_behaviour and [un]likely, so that those rust-defined things don't get lost in the sea of vendor intrinsics.

@cuviper (Member) commented Feb 26, 2018

What will be the story for external LLVM? (lacking MCSubtargetInfo::getFeatureTable())

@alexcrichton (Member, Author) commented Feb 26, 2018

@scottmcm certainly! I'd imagine that if we ever stabilized Rust-related intrinsics they'd not go into the same module (they probably wouldn't even be platform-specific).

@cuviper currently it's an unresolved question, so if it doesn't get fixed it means that using an external LLVM would basically mean that #[cfg(target_feature = ...)] would always expand to false (or the equivalent thereof).

@hanna-kruppe (Contributor) commented Feb 26, 2018

I'd imagine that if we ever stabilized Rust-related intrinsics they'd not go into the same module (they probably wouldn't even be platform-specific).

One option raised in the RFC thread (that I personally quite like) was stabilizing std::intrinsics (only the module), keeping the stable Rust intrinsics in that module (they can already be imported from that location due to a long-standing bug in stability checking), and putting these new platform-specific intrinsics in submodules. IIUC this would also satisfy @scottmcm's request.

To be explicit, under that plan the rustdoc page for std::intrinsics would look like this:


Modules

  • x86_64
  • arm
  • ...

Functions

  • copy
  • copy_nonoverlapping
  • drop_in_place
  • ...

@alexcrichton (Member, Author) commented Mar 3, 2018

Another naming idea I've just had. Right now the feature detection macro is is_target_feature_enabled!, but since it's so target-specific it may be more apt to call it is_x86_target_feature_enabled!. That would make it a pain to call on x86/x86_64, though, which could be a bummer.

@nox (Contributor) commented Mar 5, 2018

Why keep all the leading underscores for the intrinsics? Surely even if we keep the same names the vendors chose, we can still remove those characters, right?

@BurntSushi (Member) commented Mar 5, 2018

The point is to expose vendor APIs. The vendor APIs have underscores. Therefore, ours do too.

@nox (Contributor) commented Mar 5, 2018

It is debatable whether those underscores are actually part of the name. They only have one because C has no modules or namespacing, AFAICT.

@nox (Contributor) commented Mar 5, 2018

I would be happy to drop the topic if it was discussed at length already, but I couldn't find any discussion specific to the leading underscores.

@BurntSushi (Member) commented Mar 5, 2018

@nox rust-lang/stdarch#212 --- My comment above is basically a summary of that. I probably won't say much else on the topic.

@Centril (Contributor) commented Mar 5, 2018

@nox, @BurntSushi Continuing the discussion from there... since it hasn't been mentioned before:

Leading _ for identifiers in Rust often means "this is not important", so just taking the names directly from the vendors may wrongly give this impression.

@alexcrichton (Member, Author) commented Mar 5, 2018

@nox @Centril the recurring theme of stabilizing SIMD in Rust is "it's not our job to make this nice". Any attempt made to make SIMD different than what the vendors define has ended with uncomfortable questions and intrinsics that end up being left out. To that end the driving force for SIMD intrinsics in Rust is to get anything compiling on stable.

Crates like faster are explicitly targeted at making SIMD usage easy, fast, and ergonomic. The standard library's intrinsics are not intended to be widely used, nor used for "intro-level" problems. Leveraging the SIMD intrinsics is quite unsafe (due to target feature detection/selection) and can come at a high cost if used incorrectly.

Overall, again, the goal is to not enable ergonomic SIMD in Rust right now, but any SIMD in Rust. Following exactly what the vendors say is the easiest way for us to guarantee that all Rust programs will always have access to vendor intrinsics.

@hanna-kruppe (Contributor) commented Mar 5, 2018

I agree that the leading underscores are a C artifact, not a vendor choice (the C standard reserves identifiers of this form, so that's what C compilers use for intrinsics). Removing them is neither "trying to make it nicer/more ergonomic" (it's really only a minor aesthetic difference) nor does it involve any per-intrinsic judgement calls. It's a dead-simple mechanical translation for a difference in language rules, almost as much as __m128 _mm_foo(); is mechanically translated to fn _mm_foo() -> __m128;.

@alexcrichton (Member, Author) commented Mar 5, 2018

@rkruppe do we have a rock solid guarantee that no vendor will ever in the future add the same name without underscores?

@Centril (Contributor) commented Mar 5, 2018

@alexcrichton

@rkruppe do we have a rock solid guarantee that no vendor will ever in the future add the same name without underscores?

Can't speak for CPU vendors, but the probability seems very, very low. Why would they add an intrinsic where the difference is only an underscore? Further, as Rust's influence grows, they might not do this simply because of Rust.

@hanna-kruppe (Contributor) commented Mar 5, 2018

A name like mm_foo (no leading underscore at all) is not reserved in the C language, so it can't be used for compiler-supplied extensions without breaking legal C programs. There are a few theoretical possibilities for a vendor to nevertheless create intrinsics without leading underscores:

  • they could expose it only in C++ (with namespacing) -- or, for that matter, another language that isn't C
  • they could break legal C programs (very unlikely, and I'll eat my hat if GCC or Clang developers accept this)
  • A future version of C adds some way of doing namespacing, and people start using it for intrinsics

All extremely unlikely. The first one seems like the only one that doesn't sound like science fiction to me, and if that happens we'd have other problems anyway (such intrinsics may use function overloading and other features Rust doesn't have).

@alexreg (Contributor) commented Mar 5, 2018

It is debatable that those underscores are actually part of the name. They only have one because C has no modules and namespacing, AFAICT.

This. The whole point is that the underscore-leading names were chosen specifically so as not to clash with user-defined functions. That means vendors should never use non-underscore names; it would be against well-established C conventions. Hence, we should just rename them to follow Rust conventions, with no real chance of a name clash in the future, provided the vendors stay sane and respect C conventions.

@alexcrichton (Member, Author) commented Mar 5, 2018

@Centril "probability seems very very low" is what I would say as well, but we're talking about stability of functions in the standard library, so "low probability" won't cut it unfortunately.

@rkruppe I definitely agree, yeah, but "extremely unlikely" to me says "follow the vendor spec to the letter and we can figure out ergonomics later".

@alexcrichton (Member, Author) commented Mar 5, 2018

Another point worth mentioning for staying exactly to the upstream spec is that I believe it actually boosts learnability. You'll have instant familiarity with any SIMD/intrinsic code written in C, of which there's already quite a lot!

If we stray from the beaten path then we'll have to have a section of the documentation which is very clear about defining the mappings between intrinsic names and what we actually expose in Rust.

@pythoneer (Contributor) commented Mar 5, 2018

I don't think renaming (removing the leading underscore or any other alteration) is useful. It is simply not the goal and only introduces pain points. I cannot think of a reason other than "I like that more" to justify it. It only introduces the possibility of naming clashes, and "very very unlikely" is not convincing when we can prevent clashes 100% by not renaming at all.

I think it's best to follow the vendor naming scheme as closely as possible, and I think we should even break compatibility if we ever introduce an error in the "public API", rather than doing some renaming like _mm_intr_a to _mm_intr_a2 and starting to diverge from the exact naming scheme introduced by the vendor.

@nox (Contributor) commented Mar 5, 2018

@alexcrichton But as @rkruppe said, removing the leading underscore isn't about ergonomics, it's about not porting C defects to Rust blindly.

@nox (Contributor) commented Mar 5, 2018

Sorry for the double post, but I also want to add that arguing that a vendor may release an unprefixed intrinsic with the same name as a prefixed one is to me as hypothetical as arguing that bool may not be a single byte on some platform we would like to support.

@pythoneer (Contributor) commented Mar 5, 2018

@nox but why stop at the _? We could also fully rename the functions with ps and pd into f32 and f64, which would be something "more Rust". It's somewhat arbitrary to just remove the leading underscore. And we could argue back and forth about what is ergonomics and what isn't, but I don't think there is a clear line everybody would agree on.

@nox (Contributor) commented Mar 5, 2018

@pythoneer Because the name is what the vendor decided, with a leading underscore because of nondescript limitations of C.

@pythoneer (Contributor) commented Mar 5, 2018

@nox and the explicit goal of stdsimd is to expose this (however defective) vendor-defined interface.

@alexreg (Contributor) commented Mar 5, 2018

@nox and the explicit goal of stdsimd is to expose this (however defective) vendor-defined interface.

Interface, sure, but not necessarily the naming conventions!

@bors bors closed this in #49664 Apr 17, 2018

@gnzlbg (Contributor) commented May 8, 2018

I don't know if it's too late to still tune things here, but the original RFC had two features that were changed during the discussion over there:

  • the submitted RFC put all intrinsics in std::arch::*, the revised RFC in std::arch::{arch_name}.
  • the submitted RFC used is_feature_detected! for run-time feature detection, the revised RFC uses is_{arch_name}_feature_detected!

The RFC was accepted before those changes were made. The changes were made in the RFC at the end of February, implemented at the beginning of March, and the FCP went through mid-April. Right now we have ~2 months of experience with these changes.

In any case, going through the RFC, I cannot pinpoint any concrete argument about why:

  • the intrinsics of each architecture should be in a different std::arch::{arch_name} module,
  • the architecture name should be part of the is_..._feature_detected! macros.

In particular, std::arch only contains one single module, the one for the current architecture, and that's it. Also, only one is_..._feature_detected! macro is re-exported, the one for the current architecture.

These last-minute changes make it more painful than necessary to write code even for x86, where one has to:

#[target_feature(enable = "sse3")]
unsafe fn foo() {
    #[cfg(target_arch = "x86")] use core::arch::x86::*;
    #[cfg(target_arch = "x86_64")] use core::arch::x86_64::*;
    /* ... */
}

all over the place, or once at the top level to avoid repeating it everywhere. Things don't get better when targeting multiple architectures. What before was horrible:

#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = "sse4.2"))]
#[cfg_attr(any(target_arch = "arm", target_arch = "aarch64"), target_feature(enable = "neon"))]
unsafe fn foo() {
    use core::arch::*;

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] {
        if is_feature_detected!("avx2") { ... } else { ... }
    }
    #[cfg(any(target_arch = "arm", target_arch = "aarch64"))] {
        if is_feature_detected!("crypto") { ... } else { ... }
    }
}

now is worse:

#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = "sse4.2"))]
#[cfg_attr(any(target_arch = "arm", target_arch = "aarch64"), target_feature(enable = "neon"))]
unsafe fn foo() {
    #[cfg(target_arch = "x86")] use core::arch::x86::*;
    #[cfg(target_arch = "x86_64")] use core::arch::x86_64::*;
    #[cfg(target_arch = "arm")] use core::arch::arm::*;
    #[cfg(target_arch = "aarch64")] use core::arch::aarch64::*;

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] {
        if is_x86_feature_detected!("avx2") { ... } else { ... }
    }
    #[cfg(target_arch = "arm")] {
        if is_arm_feature_detected!("crypto") { ... } else { ... }
    }
    #[cfg(target_arch = "aarch64")] {
        if is_aarch64_feature_detected!("crypto") { ... } else { ... }
    }
}

This is particularly worrying if we want to add new "feature sets" for ergonomics, like simd128 and simd256, since before the changes the above would just become:

#[target_feature(enable = "simd128")]
unsafe fn foo() {
    use core::arch::*;
    if is_feature_detected!("crypto") { ... } else { ... }
}

I remember that to me they sounded like a potentially good idea back then, so I did not give them more thought (I was more in the "I want SIMD now" mood). But now that the love story has faded and I've had the chance to use them a couple of times, I've clashed against them every single time:

Anyways, can somebody summarize why those two changes were a good idea?

In particular for the first change of putting the intrinsics in std::arch::{arch_name}, AFAIK we are never going to add more modules to std::arch because that would mean that the current code is being compiled for two archs at the same time, and in that case, one arch shouldn't be able to access the intrinsics of the other anyways. For the run-time feature detection macros, the benefits are smaller (but still there), since each arch has different intrinsics. But one idiom I would like to use is:

#[cfg(target_arch = "arm")]
#[target_feature(enable = "simd128")]
unsafe fn bar() { ... }

#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "simd128")]
unsafe fn bar() { ... }

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "simd128")]
unsafe fn bar() { ... }

fn foo() {
   if is_feature_detected!("simd128") { bar() } else { fallback() }
}

and the named macros wouldn't allow that.


There are two backwards-compatible ways of fixing this:

  • re-exporting all of std::arch::{arch_name}::* via, e.g., std::arch::current::*
  • adding a is_feature_detected!("...") macro that dispatches to the named ones depending on the architecture.

So I don't think we should block landing this on these ergonomic issues. In any case, I don't feel I understand the real reasons behind the change, so maybe adding these conveniences defeats their purpose.


cc @alexcrichton @rkruppe @eddyb @hsivonen @BurntSushi @Ericson2314 (those who had opinions about this in the RFC)

@alexcrichton (Member, Author) commented May 8, 2018

@gnzlbg this was something I forgot about in the original RFC, personally. In the standard library, anything that isn't portable stylistically requires the "non-portable part of it" to appear in the path you use it from. For example, Windows-specific functionality is at std::os::windows. Following suit, it was natural to place architecture-specific SIMD intrinsics in submodules of std::arch as a warning that what you're using is indeed not portable and specific to only one platform.

The name of the macro follows the same rationale, ensuring that you aren't tricked into thinking it can be invoked in a portable context, but rather explicitly stating that it's not portable.

@parched (Contributor) commented May 9, 2018

In the standard library anything that isn't portable currently stylistically requires the "non portable part of it" to appear in the path you use it. For example Windows-specific functionality is at std::os::windows. Following suit for SIMD, architecture-specific intrinsics, was natural to place in submodules of std::arch as a warning that what you're using is indeed not portable and specific to only one platform.

Is this something that will be covered with the new portability lint? Also, by that rationale, should everything in std::arch be in target feature submodules?

@alexcrichton (Member, Author) commented May 9, 2018

@parched ideally, yes! If that exists we could perhaps consider moving everything wholesale to different modules.

@gnzlbg (Contributor) commented May 9, 2018

we could perhaps consider moving everything wholesale to different modules.

For x86/x86_64 this should be easily doable since we already do this internally in stdsimd. For other platforms we can do this on a best-effort basis.

@vks (Contributor) commented May 23, 2018

core::simd::FromBits still points to this issue. Shouldn't it point to an open issue?

@gnzlbg (Contributor) commented May 29, 2018

So, should we make the changes (add is_x86_64_feature_detected, expose the feature submodules instead of all intrinsics directly, ...)? We don't have much time left if we want to, and I could do this on Friday this week.

@alexcrichton (Member, Author) commented May 29, 2018

Er, sorry, I misread, I think. I do not think we should change anything. Perhaps one day intrinsics can live directly in std::arch and be easier to use with the portability lint, but we don't have the portability lint yet.

@xacrimon commented Aug 6, 2020

Is there any word on when we can stabilize intrinsics like https://doc.rust-lang.org/core/arch/x86_64/fn.cmpxchg16b.html ?
I am running into some issues implementing some lock-free algorithms without it.

@comex (Contributor) commented Aug 6, 2020

Would stabilizing AtomicU128 (theoretically tracked in #32976) satisfy your use case, or is there some reason you specifically need the x86 intrinsic?

@xacrimon commented Aug 6, 2020

That would do it, as long as it has weak compare-and-exchange or compare-and-swap. I really just need a 128-bit compare-and-swap to fit a pointer and a refcount. How is that implemented on archs like SPARC and PPC that don't support it natively? LL/SC?

@Amanieu (Contributor) commented Aug 6, 2020

AtomicU128 will only be available on targets that support it. AFAIK that's only x86_64 and AArch64.

@xacrimon commented Aug 6, 2020

Ah, it could theoretically be implemented with double-width LL/SC on other architectures, I think. Is that a possible thing to do?

@Amanieu (Contributor) commented Aug 6, 2020

Only AArch64 has 2x64-bit LL/SC.

@aloucks (Contributor) commented Aug 27, 2020

Are the half-precision x86/x86_64 functions intended to remain unstable? The compiler errors and the documentation point to this issue, but it was closed quite a while ago along with the stabilization PR.

EDIT: I also noticed that the f16c feature isn't reported in CARGO_CFG_TARGET_FEATURE in the stable compiler when it's explicitly requested: RUSTFLAGS="-C target-cpu=x86-64 -C target-feature=+sse3,+sse4.1,+avx,+f16c" cargo test. However, it does show up in nightly.

@Amanieu (Contributor) commented Sep 1, 2020

I think someone just needs to send a stabilization PR for that feature. But first we need to ensure that all the intrinsics covered by the f16c feature are properly implemented.

@novacrazy commented Nov 1, 2020

Any updates on stabilizing the F16C instructions?

@Amanieu (Contributor) commented Dec 11, 2020

@novacrazy I don't think there's anything blocking F16C intrinsics, feel free to send a stabilization PR for them.

@frewsxcv (Member) commented Dec 18, 2020

There are four occurrences of #[unstable(feature = "stdsimd", issue = "48556")] in the codebase (this issue number is 48556). This seems to conflict with the fact that this issue is closed. Should these occurrences be referencing a different issue? See also: #76412

@Amanieu (Contributor) commented Dec 24, 2020

I'm going to reopen this issue. SIMD was only stabilized on x86/x86_64, not on other architectures.

@Amanieu Amanieu reopened this Dec 24, 2020

@jhpratt (Contributor) commented Jan 15, 2021

I believe the FCP label should be removed — that was for something nearly three years ago.
