
Stable SIMD in Rust #2325

Merged
merged 10 commits on Feb 26, 2018

Conversation

@alexcrichton (Member) commented Feb 7, 2018

The purpose of this RFC is to provide a framework for SIMD to be used on stable
Rust. It proposes stabilizing x86-specific vendor intrinsics, but includes the
scaffolding for other platforms as well as a future portable SIMD design (to be
fleshed out in another RFC).

Rendered


@alexcrichton alexcrichton self-assigned this Feb 7, 2018

@alexcrichton alexcrichton added the T-libs label Feb 7, 2018

@aturon (Member) commented Feb 7, 2018

Awesome work, @alexcrichton! And thanks go to @BurntSushi and @gnzlbg for a ton of work as well.

One minor question: it's implied, but not really stated (that I saw) that the vendor module is meant to encompass all architecture intrinsics, not just SIMD-related ones. Is that right? I wonder if arch might be more clear.

@steveklabnik (Member) commented Feb 7, 2018

Seconded, this is extremely exciting!

I wonder if arch might be more clear.

Please excuse the possibly extremely under-informed question here: are these intrinsics inherent to the platform, or to the company? That is, are there Intel intrinsics that AMD doesn't offer, and vice-versa? If so, then vendor seems appropriate. If not, then arch does.

My current understanding is that this is vendor-specific, but I could be wrong. The above is how I'd think about it, though.

@gnzlbg (Contributor) commented Feb 7, 2018

That is, are there Intel intrinsics that AMD doesn't offer, and vice-versa?

Yes, there are. Some examples would be SSE4a, ABM, and TBM. These intrinsics are exposed by stdsimd, are available on some AMD CPUs, but not on any Intel ones.


When [inspecting the assembly][asm1] you notice that rustc is making use of the
`%xmmN` registers which you've read is related to SSE on your CPU. You know,

@sfackler (Member), Feb 7, 2018:

Nit: xmm or ymm?

@BurntSushi (Member), Feb 7, 2018:

xmm. ymm are 256 bit registers, and in this particular example, there are no target features enabled, so on x86_64 it will be limited to SSE2 instructions, which use 128 bit registers (xmm).

The example below enables the avx2 feature, which permits AVX2 instructions, which use 256 bit registers (ymm).
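To make the xmm-width point concrete, here is a small compiling sketch. Hedged: it uses the later-stabilized `std::arch::x86_64` names rather than anything in this RFC's text, and `add16` is an invented name for illustration.

```rust
// 128-bit (xmm-width) lane-wise add via SSE2, which is part of the
// x86_64 baseline, so no feature detection is needed. 256-bit
// (ymm-width) operations such as _mm256_add_epi8 would additionally
// require the avx2 feature, as the RFC's next example shows.
#[cfg(target_arch = "x86_64")]
fn add16(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    use std::arch::x86_64::*;
    let mut out = [0u8; 16];
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_add_epi8(va, vb));
    }
    out
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    assert_eq!(add16([1; 16], [2; 16]), [3; 16]);
    println!("xmm add ok");
}
```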

present on one platform may not be present on another.

The contents of the `vendor` modules are defined by, well, vendors! For example
Intel has an [intrinsics guide][intr-guide] which will serve as a guideline for

@steveklabnik (Member), Feb 7, 2018:

link target is missing here


For example most Intel intrinsics start with `_mm_` or `_mm256_` for 128 and
256-bit registers. While perhaps unergonomic, we'll be sticking to what Intel
says. Note that all intrinsics will also be `unsafe`, according to [RFC 2045].

@steveklabnik (Member), Feb 7, 2018:

link target is missing here


There are a number of intrinsics on x86 (and other) platforms that require their
arguments to be constants rather than decided at runtime. For example
[`_mm_insert_pi16`] requires its third argument to be a constant value where

@steveklabnik (Member), Feb 7, 2018:

this isn't linked, i'm not sure if it was intended to


Over the years quite a few iterations have happened for SIMD in Rust. This RFC
draws from as many of those as it can and attempts to strike a balance between
exposing functionality whiel still allowing us to implement everything in a

@steveklabnik (Member), Feb 7, 2018:

typo: while

SIMD arguments are passed across boundaries and whatnot.

Again though, note that this section is largely an implementation detail of SIMD
in Rust today, though it's enabling the usage of SIMD effectively without a lot of

@alexcrichton (Member, Author) commented Feb 7, 2018

@aturon and @steveklabnik excellent questions! I think @steveklabnik's point though hit the nail on the head in that I think we should rename to arch. To clarify though, @aturon you're correct in that this module is intended to encompass all vendor intrinsics, not just those related to SIMD. (SIMD is just the banner feature to catch everyone's attention!)

@steveklabnik you're also right though in that, for example, our x86 and x86_64 target_arch targets actually have two vendors, Intel and AMD. As @gnzlbg pointed out there's AMD-specific intrinsics and I'm sure there's Intel-specific intrinsics as well. The intention is that this module we're adding to libstd would contain everything (both Intel and AMD) and you'd do runtime/compile time dispatch to figure out which ones you can call.

I think that pushes me towards arch as the entire module is conditional over purely target_arch I believe, and not actually the vendors (with multiple vendors in there). @gnzlbg what do you think?

@steveklabnik (Member) commented Feb 7, 2018

The intention is that this module we're adding to libstd would contain everything (both Intel and AMD) and you'd do runtime/compile time dispatch to figure out which ones you can call.

arch sounds great to me, then.

[rfc2212]: https://github.com/rust-lang/rfcs/pull/2212

```rust
pub unsafe fn foo(a: &[u8], b: &[u8], c: &mut [u8]) {
```

@fbstj (Contributor), Feb 7, 2018:

is this function supposed to be still unsafe?

@gbutler69 commented Feb 7, 2018

Why would this function be "unsafe" (as marked)?

```rust
pub unsafe fn foo(a: &[u8], b: &[u8], c: &mut [u8]) {
    // Note that this `unsafe` block is safe because we're testing
    // that the `avx2` feature is indeed available on our CPU.
    if cfg_feature_enabled!("avx2") {
        unsafe { foo_avx2(a, b, c) }
    } else {
        foo_fallback(a, b, c)
    }
}
```

Is it not the case that this function is "safe" because you've wrapped the call to an unsafe function in an "unsafe" block (with a check to ensure the contract is met), and that "foo_fallback" would be a "safe" function (not relying on SIMD/intrinsics, but a plain-old CPU implementation of the necessary functionality)?

@alexcrichton (Member, Author) commented Feb 7, 2018

@gbutler69 gah oops! Looks like you and @fbstj found the same error at the same time, that was just a mistake on my part! In that section the foo function is indeed safe, it's just the AVX2-enabled one that's unsafe.

@gbutler69 commented Feb 7, 2018

Also, does:

```rust
unsafe fn foo_fallback(a: &[u8], b: &[u8], c: &mut [u8]) {
    for ((a, b), c) in a.iter().zip(b).zip(c) {
        *c = *a + *b;
    }
}
```

need to be marked as "unsafe"? I would think not.

@alexcrichton (Member, Author) commented Feb 7, 2018

@gbutler69 correct! That was also erroneously tagged as unsafe, but should be fixed now!
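For readers following along, here is a compiling sketch of the corrected example. Hedged: it uses the names as later stabilized (`std::arch`, `is_x86_feature_detected!`) in place of the RFC-era `cfg_feature_enabled!`, and the AVX2 body simply reuses the fallback to stay short.

```rust
// Sketch of the corrected example: the dispatching wrapper and the
// fallback are both safe; only the AVX2-specialized function is unsafe.
pub fn foo(a: &[u8], b: &[u8], c: &mut [u8]) {
    #[cfg(target_arch = "x86_64")]
    {
        // This `unsafe` call is sound because we first verify at
        // runtime that the `avx2` feature is available on this CPU.
        if is_x86_feature_detected!("avx2") {
            return unsafe { foo_avx2(a, b, c) };
        }
    }
    foo_fallback(a, b, c)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn foo_avx2(a: &[u8], b: &[u8], c: &mut [u8]) {
    // A real implementation would use 256-bit intrinsics here; the
    // fallback body is reused to keep the sketch short.
    foo_fallback(a, b, c)
}

fn foo_fallback(a: &[u8], b: &[u8], c: &mut [u8]) {
    for ((a, b), c) in a.iter().zip(b).zip(c) {
        *c = *a + *b;
    }
}

fn main() {
    let (a, b, mut c) = ([1u8, 2, 3, 4], [10u8, 20, 30, 40], [0u8; 4]);
    foo(&a, &b, &mut c);
    assert_eq!(c, [11, 22, 33, 44]);
    println!("dispatched ok");
}
```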

via:

```rust
#[cfg(target_feature = "avx)]
```

@fbstj (Contributor), Feb 7, 2018:

missing "

```rust
// implementation that can use `avx`
}
#[cfg(not(target_feature = "avx))]
```

@fbstj (Contributor), Feb 7, 2018:

missing "

@gbutler69, Feb 7, 2018:

@alexcrichton - Good to know that I'm actually starting to understand Rust code from reading these RFCs. I really appreciate all the work that those like you put into these RFCs for those of us still learning Rust.
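For reference, here is a compiling version of the `cfg(target_feature)` excerpt above, with the missing quotes restored. `which_impl` is an invented name for illustration.

```rust
// Corrected sketch of the compile-time dispatch pattern: exactly one
// of the two definitions survives compilation, chosen by the static
// target configuration rather than any runtime check.
#[cfg(target_feature = "avx")]
fn which_impl() -> &'static str {
    // implementation that can use `avx`
    "avx"
}

#[cfg(not(target_feature = "avx"))]
fn which_impl() -> &'static str {
    // fallback implementation
    "fallback"
}

fn main() {
    // Resolved statically at compile time: no CPU detection happens here.
    println!("selected: {}", which_impl());
}
```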

@bill-myers commented Feb 7, 2018

I think there may be a learnability problem by putting these into libstd.

The issue is that users are not supposed to directly use those intrinsics since they are untyped (using __m128i instead of the proper one among u8x16, u16x8, u32x4, u64x2 and u128) and unnecessarily platform-dependent (you should not have to call an Intel-specific function just to get a vectorized reciprocal square root, for instance), which means they should instead use some crate from crates.io that provides a proper API.

However, by putting them into std::arch, people may be led to believe that they are supposed to use them directly.

So, I'd suggest adding a new library in the rustc repository called "intrinsics" instead of putting this into std::arch, documenting that the user is not supposed to use it directly, and ideally suggesting what crates to use instead.
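To illustrate the "untyped" point above with a hedged sketch (assumes x86_64 and uses the later-stabilized `std::arch` names; `reinterpret_demo` is an invented name): `__m128i` carries no lane type, so a value built from 8-bit lanes can be fed straight into a 32-bit-lane operation without any type error.

```rust
#[cfg(target_arch = "x86_64")]
fn reinterpret_demo() -> [i32; 4] {
    use std::arch::x86_64::*;
    unsafe {
        // Sixteen 8-bit lanes of -1, i.e. all 128 bits set. The type of
        // `__m128i` does not record that this was built from i8 lanes...
        let a = _mm_set1_epi8(-1);
        // ...so it flows directly into a 32-bit-lane add: each
        // 0xFFFF_FFFF lane wraps to 0 when 1 is added.
        let b = _mm_add_epi32(a, _mm_set1_epi32(1));
        let mut out = [0i32; 4];
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, b);
        out
    }
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    assert_eq!(reinterpret_demo(), [0; 4]);
    println!("untyped lanes demo done");
}
```

A typed layer such as `u8x16`/`u32x4` would have turned that silent reinterpretation into a compile error, which is exactly the gap the crates-on-top are expected to fill.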

@alexcrichton (Member, Author) commented Feb 7, 2018

@bill-myers good points! It's definitely true that the main thrust of this RFC is not to empower all users of Rust to use SIMD; the APIs are abysmal from that standpoint! Rather, the motivation here is to empower anyone to use explicit SIMD on stable Rust. I'd expect that once we cross that threshold, crates like faster are going to fill in the gap and make the experience of using SIMD much nicer. For now, though, we need to enable crates like that to build on stable!

With that mindset I think we'll definitely want to mention in the documentation quite thoroughly that these are very raw functions to use and you need to be quite careful and mindful when using them. The decision for location in the standard library, though, is done out of necessity rather than desire. All of these intrinsics are heavily coupled to a compiler backend (aka LLVM) and so there's really no way that they could be exported in a crate on crates.io where the guarantee is that such a crate would compile and work across many rustc versions. Inclusion in the standard library means that we have the freedom to continue to upgrade LLVM (and maybe even other backends one day) while providing a stable API surface area for all these intrinsics.

In other words this all leads me to the conclusions of:

  • We must enable these intrinsics on stable (but it's not required at this time to make them "nice")
  • We must place the intrinsics in the standard library (as that's "allowed to use unstable features on stable rust")

The final choice of module in the standard library I think is certainly fine to bikeshed. You're right in that something like arch is very "sweet" and sounds like you should be using it. Plus we may even want other things to go in "arch" one day! (maybe).

Do you have thoughts though on what a better name for such a module would be?

@eddyb (Member) commented Feb 7, 2018

How about std::arch::intrinsics?

EDIT: bonus: std::arch could have various constants (or even types) about the hardware platform.

Despite the CI infrastructure of the `stdsimd` crate it seems inevitable that
we'll get an intrinsic wrong at some point. What do we do in a situation like
that? This situation is somewhat analogous to the `libc` crate but there you can
fix the problem downstream (just have a corrected type/definition) for for

@fbstj (Contributor), Feb 7, 2018:

double "for"

@Lokathor commented Feb 7, 2018

It would, however, be an error to write this on x86 cpus: cfg_feature_enabled!("neon"); //~ ERROR: neon is an ARM feature, not x86

Compile time error, correct?

@eddyb (Member) commented Feb 13, 2018

Oh awkward, the module itself is unstable, but you can still directly import stable things inside of it :(

Known (sad) bug, I think it's been bugging @petrochenkov for years now.

@alexcrichton (Member, Author) commented Feb 13, 2018

Thinking about it now, the current "portability speed bump" in the standard library is something like std::os::unix, where it has "unix" in the name to let you know that you're using something specific to a particular platform. While this may not always be around, it may be good to continue that in the near term (and then move later if we can). In that sense, including x86 or x86_64 in the name may be a good idea.

That may lead to either std::arch::x86 or std::intrinsics::x86, and I'd personally be fine with either.

@sfackler (Member) commented Feb 13, 2018

I think it'd be fine to deprecate but leave those couple of stable functions in place if we want to coopt the intrinsics module, though.

@rkruppe (Member) commented Feb 13, 2018

Known (sad) bug, I think it's been bugging @petrochenkov for years now.

While the bug is sad, in this case it doesn't really bother me. std::intrinsics is a natural module name and a natural location for these intrinsics. Putting them in std::intrinsics::rustc would be no better, arguably worse. IMO we should have stabilized the module ages ago.

@Lokathor commented Feb 13, 2018

People keep linking to my post but they maybe don't seem to be using the same take-away that I intended, so I'll try to make that more clear:

  • Going by arch isn't detailed enough, but calling them all intrinsics is even less detailed than that.
  • We can name the module any dumb name at the top level, but we should then also have sub-modules for each major feature set. This is basically how the Intel Intrinsics Guide actually presents the docs to people, for example.
  • Bonus For The Future: if we ever present LLVM intrinsics at the Rust level, we can put them under std::intrinsics::llvm51 and so forth according to the LLVM version, and the [un]stable guarantee can be "this module is linked to the LLVM version 5.1 intrinsics", the same way SIMD features are linked to doing exactly whatever Intel (or whichever manufacturer) says.
@eddyb (Member) commented Feb 13, 2018

if we ever present llvm intrinsics to the rust level

Initial reaction is "please no", but at least having the version in there seems very sensible.

@alexcrichton (Member, Author) commented Feb 14, 2018

I don't personally think we should have something like std::...::sse2 for all the various target_feature directives. I believe this isn't how C is organized (everything's in one namespace, albeit different header files) and it places too much onus on us to figure out how to organize everything. Dumping everything into one namespace is what vendors (aka Intel) currently expect, so I think it's what we should do as well.

I also agree with @eddyb, let's not go crazy and expose LLVM intrinsics, this RFC is purely for SIMD/vendor intrinsics.

@Lokathor commented Feb 14, 2018

The LLVM comment at the end was pure speculation, I'm not trying to double the scope of the RFC. However, when it comes to the stability conversation I think it's best to keep in mind not just the next 6 months, but the next 60 months, and hopefully the next 600 months even.

@alexcrichton You say there's some sort of "organization onus" that's put on us if we break it up by target feature, but I don't understand what that onus would be. Every intrinsic that exists already exists in connection with a specific target feature. The entire RFC is predicated on the idea of these features. Every single intrinsic call is cfg-attribute gated so that it can't exist in a build targeting the wrong CPU and/or feature set. All of the organization is done for us by that alone.

Further, if you check the Intrinsics Guide you will note that each intrinsic feature set actually is exported by a different header. The fact that there's also a header to "grab everything" means that we just need a re-export module for folks who really do want it all dumped on them at once. However, the functionality itself would still live in different sub-modules.

I'm all for modules being "however big they need to be", but if every intrinsic now and forever lives in a single module, it's just too much. The RFC itself already quotes the count at "thousands", and admits that there's so much code we might have to relax the normal stability guarantee and fix a bug here and there, simply because there's so much code that we don't fully trust ourselves to get it all correct the first time out.

@alexcrichton (Member, Author) commented Feb 14, 2018

@Lokathor the organization onus is that we are the ones deciding to do so. Vendors are not designing intrinsics to be placed in modules (so it seems) but rather to get #include'd in C where everything is in one namespace. How would we assign modules to intrinsics that require multiple cpu features? What if Intel doesn't list a CPU feature for an intrinsic? Do we make our best guess?

Again the intention of this RFC is to expose anything at all on stable, not make it a 100% high quality interface for general consumption. It's expected something like that will show up on crates.io, not in libstd itself.

To that end we're looking for the lowest possible overhead to add these sorts of intrinsics to the standard library. Placing them in one module is not only easy for us to do but it also matches what vendors are expecting. This also should extend naturally to future architectures as well, which aren't currently bound in stdsimd.

@gnzlbg (Contributor) commented Feb 14, 2018

I don't like the separation into "feature" submodules either. For x86 it is already problematic due to some intrinsics requiring multiple features. Also, whether an intrinsic is available might depend on whether you are targeting a 32-bit or 64-bit x86 platform, and other platform characteristics might apply as well (e.g. i586 vs i686). So getting this right is a lot of work, and after doing this work we really don't have any reliable way of knowing whether what we did is correct.

And x86 is the nice platform. For ARM we have "some" level of separation in stdsimd, but with arm vs armv7 vs aarch64 vs +/-thumb-mode vs +/- hard-float vs ... I don't even know if there is a separation into submodules that actually makes sense. We have tried to do so in stdsimd to reduce the amount of cfg macros we need internally, but while in x86 this more or less works, for ARM it is still a big unknown.

@alexcrichton one thing we could do is just expose the same headers that C does in submodules, for example:

  • std::arch::immintrin: for Intel Multi Media Intrinsics, equivalent to #include <immintrin.h>
  • std::arch::ammintrin: for AMD Multi Media Intrinsics, equivalent to #include <ammintrin.h>
  • std::arch::cpuid: for x86 cpuid intrinsics
  • ... etc; we mirror the public headers exposed by gcc and clang, see here: https://github.com/llvm-mirror/clang/tree/master/lib/Headers (note that most of those headers are not public, e.g., all the avx headers can only be included via #include <immintrin.h> and not directly).

While we could do this, I find the reasons to do so weak at best. On a platform where an x86 intrinsic is not available, users will get a compiler error if they try to import the intrinsic. On a platform where the intrinsic is available, whether it can be called depends on the machine the binary will run on. Users that want to ensure portability should be using cfg macros anyway to do either compile-time or run-time feature detection. Something like the portability RFC is a better solution than submodules here, in particular when submodule names can be misleading.


@alexcrichton it was mentioned in the internal threads that instead of using attributes + functions, const generics, etc. to declare intrinsics that take compile-time constants as arguments, we could just be using macros. With macros 2.0 users should be able to import these macros from std::arch, and if we ever get language support for compile-time constant function arguments, functions with the same name can be added alongside the macros and the macros deprecated. This alternative should maybe be mentioned in the RFC.
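A hedged sketch of the macro alternative mentioned above: a `macro_rules!` macro can demand a literal for its "immediate" argument, so the constant is known at expansion time and could be forwarded to an intrinsic that requires one. `shuffle4` and `IMM` are invented names for illustration, not real std::arch API; the body emulates a 4-lane shuffle in plain Rust rather than calling an intrinsic.

```rust
// A macro can require its control argument to be a literal, which is
// what intrinsics like _mm_insert_pi16 need, and which ordinary
// function arguments could not express before const generics.
macro_rules! shuffle4 {
    ($v:expr, $imm:literal) => {{
        // `$imm` must be a literal, so it is a compile-time constant.
        const IMM: u8 = $imm;
        let v: [u32; 4] = $v;
        [
            v[(IMM & 0b11) as usize],
            v[((IMM >> 2) & 0b11) as usize],
            v[((IMM >> 4) & 0b11) as usize],
            v[((IMM >> 6) & 0b11) as usize],
        ]
    }};
}

fn main() {
    let v = [10u32, 11, 12, 13];
    // 0b00_01_10_11 selects lanes 3, 2, 1, 0 in order.
    assert_eq!(shuffle4!(v, 0b00_01_10_11), [13, 12, 11, 10]);
    println!("shuffle ok");
}
```

If the language later gains const function arguments, a function of the same name could be added alongside and the macro deprecated, which is exactly the migration gnzlbg describes.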

@alexcrichton (Member, Author) commented Feb 14, 2018

@gnzlbg yeah I think organizing by header is possible but I don't think it buys us much. Sort of like libc I don't think we should organize by header but rather dump everything into the same namespace (which is what happens in C).

It's true yeah we could use macros (that's what C does I believe), but I'd personally prefer to stick to the solution we've got so far which with enough const machinery I think could one day be stable.

@rfcbot commented Feb 14, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

The `#[cfg]` attribute and `cfg!` macro statically resolve and **do not do
runtime dispatch**. Tweaking these functions is currently done via the `-C
target-feature` flag to the compiler. This flag to the compiler accepts a
similar set of strings to the ones specified above and is already "stable".

@parched, Feb 14, 2018:

What about #[target_feature]? Does this RFC define that it doesn't affect #[cfg] or does it leave it undecided? CC rust-lang/rust#42515

@alexcrichton (Member, Author), Feb 14, 2018:

I believe that's going to continue to be an open bug
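The static-vs-runtime distinction in the excerpt above can be sketched as follows. Hedged: `is_x86_feature_detected!` is the stabilized descendant of the RFC-era `cfg_feature_enabled!`, and `avx2_status` is an invented name for illustration.

```rust
// cfg!(target_feature = "avx2") is resolved statically from the target
// and -C target-feature flags, while is_x86_feature_detected! queries
// the CPU at runtime, so the two can disagree for the same binary.
#[cfg(target_arch = "x86_64")]
fn avx2_status() -> (bool, bool) {
    // (compile-time baseline, runtime detection)
    (
        cfg!(target_feature = "avx2"),
        is_x86_feature_detected!("avx2"),
    )
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        let (compile_time, run_time) = avx2_status();
        // A binary built without -C target-feature=+avx2 but run on an
        // AVX2-capable CPU sees `false` here at compile time and `true`
        // at runtime.
        println!("compile-time avx2: {compile_time}, run-time avx2: {run_time}");
        // The static baseline can never exceed what the running CPU
        // actually has (assuming the binary runs at all).
        assert!(run_time || !compile_time);
    }
}
```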

@JulianBirch commented Feb 19, 2018

This is a minor point, and I'm not even a Rust developer so this may be way off base, but I'm wondering whether you'd be better off having specific types for "Intel defined types". So void * is always voidstar and long long is always longlong rather than remembering which native type is mapped.

This was referenced Feb 24, 2018
@rfcbot commented Feb 24, 2018

The final comment period is now complete.

@alexcrichton alexcrichton referenced this pull request Feb 26, 2018
@alexcrichton (Member, Author) commented Feb 26, 2018

Ok! Now that FCP has elapsed it looks like nothing major has come up. I think there's still an open question as to where to place this in the standard library, but I'm going to merge this and we can of course continue to bikeshed on the tracking issue!
