Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upSIMD groundwork #1199
Conversation
huonw
added some commits
Jul 7, 2015
This comment has been minimized.
This comment has been minimized.
eternaleye
commented
Jul 10, 2015
|
X-posting my late comment from the pre-RFC: What I was saying was a bit more than that.
And yeah, I edited "alignment voodoo" into my post before you mentioned that Anyway, the result of the above is that one only really needs two changes to the compiler:
But as far as benefits, this avoids a large mass of worryingly magical (regarding parameter types) intrinsics being added to the compiler. |
nikomatsakis
added
the
T-lang
label
Jul 10, 2015
This comment has been minimized.
This comment has been minimized.
Unfortunately if we want to get SIMD on stable Rust this would necessitate a stable
I believe a core aspect of this RFC is that it's stabilizing the absolute bare minimum of what the compiler needs to support SIMD. If we add in a few language traits and other various types here and there it's more surface area that will have to be stabilized. If this stuff can be built externally in a library, that'd be great! This RFC, however, is just focused on the compiler support. |
This comment has been minimized.
This comment has been minimized.
eternaleye
commented
Jul 10, 2015
Sure! I just think that the way it makes the intrinsics unify types with sufficiently similar layouts under Rust currently has a nominal type system rather than a structural one; some subset of intrinsics magically ignoring that feels very questionable to me. |
This comment has been minimized.
This comment has been minimized.
Hm, maybe I haven't been clear. They're not really ignoring it. Suppose we have: #[repr(simd)]
struct A(f64, f64);
#[repr(simd)]
struct B(f64, f64);
extern {
fn some_simd_intrinsic(x: A);
}It's not legal to call (That said, it's slightly different, since the compiler doesn't enforce that arguments have the right layout for C functions (has to be done by the programmer), whereas the SIMD intrinsics do have layout-enforcement.) |
This comment has been minimized.
This comment has been minimized.
eternaleye
commented
Jul 10, 2015
|
Ah, that does make rather more sense, and is not the impression I had gotten. |
This comment has been minimized.
This comment has been minimized.
main--
commented
Jul 11, 2015
The Motivation section mentions how this RFC aims to provide just some ground-work on top of which nice SIMD functionality could be built. While builtin arithmetic, shuffles etc for
I have a bad feeling about this. A regular method call shouldn't require parameters to be compile time constants. Using generics to express this requirement as shown here depends on #1062, but it would be a much cleaner solution.
Wouldn't a compile error be nicer here? That aside, when reading this RFC I had the same thought as @eternaleye: Why implement compiler magic when these functions could be implemented in plain rust with Additionally, inline asm allows the programmer to influence things like instruction scheduling and register allocation (within the asm section), in case the compiler is doing a bad job in that regard. So I'd suggest solving the "Operations" section of the RFC in a way that doesn't require any compiler changes (at least not specifically for SIMD). I'm unsure about what the People often argue that vectorization is best left to the compiler and as Rust uses LLVM, many simple cases can benefit from great optimizations. But unfortunately optimizers aren't perfect and simply unable to handle sufficiently complex code, so obviously an explicit way for programmers to express how they want code to be vectorized is necessary. Somewhere above, I suggested removing even those basic operations from So to summarize, I'm in favor of simplifying/breaking up the "types" and the "operations" sections of the RFC by building on top of much less specific compiler features and pushing as much work as possible into the to-be-written SIMD crate (and moving the "platform detection" section into its own RFC). This feels appropriate as the RFC's intention was just basic groundwork. Or in other words: I'm arguing that "the absolute bare minimum of what the compiler needs to support SIMD" is zero. The missing parts aren't necessarily SIMD-specific. |
This comment has been minimized.
This comment has been minimized.
eternaleye
commented
Jul 11, 2015
@huonw, did I miss anything? ISTR you mentioning it didn't validate things like that they need to be homogenous, and that such things would be left up to |
This comment has been minimized.
This comment has been minimized.
|
Arithmetic identity-based optimizations apply just as well to SIMD operations as they do to non-SIMD operations. By using the intrinsics as opposed to inline asm we give LLVM the ability to do those optimizations. |
This comment has been minimized.
This comment has been minimized.
main--
commented
Jul 11, 2015
|
@pcwalton Oh, I didn't know that! Yes, that's a big advantage of the intrinsics then. |
This comment has been minimized.
This comment has been minimized.
|
Concerning the structural typing when importing the intrinsics: Please be careful that this does not end up allowing people to "peek through" private abstractions of data-types. That would be a horrible mess of a safety issue. Essentially, such merely structural typing should only be allowed if the module had access to all the fields of the type anyway: Either because they were all public (all the way down), or because the type was defined in the same module. |
This comment has been minimized.
This comment has been minimized.
|
Just a note to all the people that keep mentioning inline assembly: |
nrc
assigned
huonw
Jul 13, 2015
This comment has been minimized.
This comment has been minimized.
I agree that it isn't totally necessary to actually use the arithmetic operators: we could instead use a generic intrinsic similar to the comparison operators. However, I think it is important we do more than the platform intrinsics: LLVM (and compilers in general) knows more about its internal For shuffles, the optimisation applies: the compiler can e.g. simplify a sequence of shuffles into a single one. Also. the RFC discusses this. One point in it is the compiler synthesizing an optimal (/close to optimal) sequence of instructions for an arbitrary shuffle, instead of forcing the programmer to think about doing that themselves.
This isn't a regular method call: intrinsics are special in many ways. Note that my solution on #1062 that you link to just calls the intrinsic. This is the low-level API, people generally won't be calling the intrinsics directly.
Yes, sort of. However, using the trick mentioned in #1062 would result in very poor error messages, since the shuffle order may be passed through multiple layers of generic function calls possibly in external crates, meaning the out-of-bounds error is generated deep inside code that the programmer didn't write.
As other have said,
Neither of these apply to this: the API is essentially exposing individual CPU instructions, i.e. each These reasons apply if one was, say, writing an entire inner loop as a single
Yes.
It sort-of does, but in a very very restricted way, that's already possible with
Tuples and arrays don't have the right low-level details.
|
This comment has been minimized.
This comment has been minimized.
Transmute requires unsafe. It shouldn't be possible for safe code to violate abstraction boundaries. What you are proposing (if I follow your RFC correctly) is essentially that The restriction that the type used for the intrinsic has to be defined in the same module, shouldn't be a problem for the implementations you envision (with some crate(s) taking care of providing a descent abstraction) , right? |
This comment has been minimized.
This comment has been minimized.
|
Hm, I misunderstood what you were talking about. I'm unsure what the problematic situation you're envisioning could be. Is it something like: Crate (I'm not against the privacy restriction, I'm just trying to understand the motivation more concretely.) |
This comment has been minimized.
This comment has been minimized.
Is a lang item the best intuition here? Wouldn't a closer analogy be
Would it be possible to just make this best-effort, and fall back to laying the type out normally if it doesn't meet the SIMD requirements (perhaps with a warning)? That seems cleaner than implicit requirements at the code generator level, which feels like a contract or layering violation (if it passes the typechecker, it should compile), and perhaps more in the spirit of
I wonder if the fact that the borrow checker enforces limits on observability wouldn't actually let us support the interior reference-taking, just in a less efficient way, by first copying the field out onto the stack when taking a shared In this way it would be even more truly the case that
I might be more comfortable with this if you had to explicitly write |
This comment has been minimized.
This comment has been minimized.
This essentially means its much less of an issue than I thought.
Which part is unclear - whether these rules allow code to get around the restriction that usually apply to private fields, or whether getting access to private fields is an issue? The answer to the latter is that it violates parametricity - for now I'll just assume that as accepted. Please tell me if I should elaborate on that. Unsafe code (through transmute) can violate parametricity anyways, but I would still prefer if no additional violations would be introduced. Regarding the first part, let me try to come up with some examples. I assume "A::Simd" is some other crate's |
This comment has been minimized.
This comment has been minimized.
|
Ah, I think you might be working with the same misunderstanding as @eternaleye, that the intrinsics can be called with any type that matches structurally, it's just that they can be declared with any type that matches structurally. See #1199 (comment) . (I 100% agree that being able to write a transmute via SIMD intrinsics would be unfortunate.) |
This comment has been minimized.
This comment has been minimized.
|
I had that misunderstanding at first, but the post above was written without that assumption. The first one, regarding add, can't I declare this with |
This comment has been minimized.
This comment has been minimized.
I agree that
I think we can relax this in future if we find the RFC doesn't work well in practice. (It's part of the reason I proposed a hard error.)
Interesting idea, however it seems relatively complicated, and not worth it for SIMD: efficient SIMD code won't be handling/mutating individual elements like this very much.
Another alternative is just providing arithmetic intrinsics.
Oh, I see. It seems sensible to disallow it. I.e. have nominal equality constraints within a definition. And for the Shuffle is generic because it can be used with literally any SIMD type, i.e. it's not restricted to some subset of types with the same structure. |
huonw
added some commits
Aug 12, 2015
This comment has been minimized.
This comment has been minimized.
|
Updated. Highlights:
|
bors
added a commit
to rust-lang/rust
that referenced
this pull request
Aug 15, 2015
bors
added a commit
to rust-lang/rust
that referenced
this pull request
Aug 15, 2015
bors
added a commit
to rust-lang/rust
that referenced
this pull request
Aug 15, 2015
bors
added a commit
to rust-lang/rust
that referenced
this pull request
Aug 17, 2015
bors
added a commit
to rust-lang/rust
that referenced
this pull request
Aug 17, 2015
ranma42
reviewed
Aug 18, 2015
| shuffles are exposed generally: intrinsics that represent arbitrary | ||
| shuffles. | ||
|
|
||
| This may violate the "one instruction per instrinsic" principal |
This comment has been minimized.
This comment has been minimized.
durka
referenced this pull request
Aug 27, 2015
Closed
Anomalous unstable feature annotations #28037
This comment has been minimized.
This comment has been minimized.
|
Hear ye, hear ye. This RFC is entering final comment period. |
nikomatsakis
added
the
final-comment-period
label
Sep 4, 2015
This comment has been minimized.
This comment has been minimized.
|
What's the basic plan with respect to stabilization of these features? As long as the idea is just to get these into the unstable compiler, so that the implementation PR can land and we can start experimenting more seriously with higher-level APIs, further evolving the lower-level ones, and whatever else, that's completely cool. But the number of places (there's a few) where we're currently forced to say "enforcing this in the type system is difficult-to-impossible right now, so let's punt it to the backend" still bothers me if/when it's something we'd be committing to support as a stable feature forever (or until 2.0, anyways). |
This comment has been minimized.
This comment has been minimized.
|
@glaebhoerl I think there is no plan to stabilize until experience has been gained, but of course the further we go down this road, the less likely we'll back up and start from another. |
This comment has been minimized.
This comment has been minimized.
|
Basically, I suspect the approach I'd prefer is that once we have this infrastructure, and we've used it to gain some experience and to figure out the best way to formulate higher-level "type-safe" (i.e. without checks deferred to codegen) abstractions for SIMD (various |
This comment has been minimized.
This comment has been minimized.
|
To be clear, I agree with you Gabor -- I'd like to gain experience in what On Thu, Sep 10, 2015 at 5:45 PM, Gábor Lehel notifications@github.com
|
This comment has been minimized.
This comment has been minimized.
|
Huzzah! The language subteam has decided to accept this RFC. |
nikomatsakis
merged commit 67fea6e
into
rust-lang:master
Sep 18, 2015
This comment has been minimized.
This comment has been minimized.
|
Tracking issue is rust-lang/rust#27731 |
huonw commentedJul 10, 2015
•
edited by mbrubeck
Rendered.