Do not use SIMD instructions on i686 #31110
rust-highfive assigned nikomatsakis Jan 22, 2016
|
(rust_highfive has picked a reviewer for you, use r? to override) |
ranma42 referenced this pull request Jan 22, 2016
Closed: rustc crashing over illegal instruction #14441
MagaTailor commented Jan 22, 2016
|
Clearly you weren't willing to understand the issue and went ahead with this blunt PR. Even though it was I who kept bugging the team about ISA compatibility, I never dreamed of getting everyone off an I'll address the purely technical part then:
|
|
While "i686" is the wrong name for the target currently so-called (32-bit but with reasonably modern instruction set extensions), that target is important and should be the default. SSE2 is very widely available and has several benefits (smaller and more efficient copying of mid-sized structs, more predictable and generally more accurate float arithmetic, the ability to benefit from autovectorization, and possibly more that I'm forgetting). Certainly a pre-P4 target should exist, but I don't think it should be the default. Perhaps the current "i686" target should be renamed, but this is tricky and will require a long grace window before flipping the switch on the old ("i686") name. |
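Editorial sketch, not part of the thread: the question of whether a target "assumes SSE2" can be inspected from inside a program in current Rust. The block below is a minimal illustration, assuming a recent stable toolchain; the helper name `sse2_assumed_at_compile_time` is hypothetical and does not appear anywhere in this PR.

```rust
/// Reports whether the compiler was allowed to assume SSE2 for this build.
/// `cfg!(target_feature = "sse2")` is resolved at compile time from the
/// target's default features plus any `-C target-cpu` / `-C target-feature`
/// flags, so it reflects what the generated code may rely on, not what the
/// machine running the binary actually supports.
fn sse2_assumed_at_compile_time() -> bool {
    cfg!(target_feature = "sse2")
}

fn main() {
    println!("target architecture: {}", std::env::consts::ARCH);
    println!(
        "SSE2 assumed at compile time: {}",
        sse2_assumed_at_compile_time()
    );
}
```

If this reports `true` for a 32-bit x86 build, the binary may contain SSE2 instructions and can fault with an illegal instruction on a pre-P4 CPU, which is the failure discussed in #14441.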
MagaTailor commented Jan 22, 2016
|
@brson In conjunction with adjustable |
|
@petevine I would be very surprised if LLVM changed the definition of i686. Let me state it again: LLVM/Clang does not assume that i686 is What is the problem with a non-SSE2 compiler/library? @rkruppe In #14441 I also suggested an alternative, that is splitting the 32-bit x86 targets in |
MagaTailor commented Jan 22, 2016
|
Hell, what fun arguing the opposite! There were no erroneous assumptions and no clang factor so you've created a strawman. Theirs was a conscious decision on the rust team's part, one that I personally didn't like for the sole reason of not being able to build from source on older machines (stage0 snapshots are P4 too). And the fact neither the downloads page, nor the rust build system warn you about it! |
|
@petevine I find it much more exhausting to read your rather heated replies than arguing this topic, and I don't like the topic very much. I have no moderation duties or powers, I'm just asking as another lowly contributor: Please be more charitable. @ranma42 As you point out, there is no serious evaluation of the performance (that I'm aware of). But I do see good reasons to assume there will be measurable performance impact on several kinds of code for reasons outlined above. The reason I am suggesting a long-ish grace window (by that I mean a release cycle or two) is twofold:
|
|
I don't feel qualified to review or not review this PR. @alexcrichton or @brson seem like more logical choices. Would any of you like to volunteer? |
MagaTailor commented Jan 22, 2016
|
@rkruppe The worry about possible performance regression is justified (most vectorisation opportunities would be lost). In general purpose code (like rustc) I would expect minor changes, but vectorisation-intensive libraries (something like BLAS) would probably show some changes. I will try to do some benchmarking on the rust build itself (compiler + libraries) and on the shootout (and I would be willing to try other benchmarks, if anybody can suggest some that would be particularly significant). I would expect that this change should not have a visible fallout except on tools relying on the specific opcodes emitted by rustc... but it's better safe than sorry, so this is definitely a change whose impact I would like to discuss, evaluate and test extensively before it is applied. |
brson assigned brson and unassigned nikomatsakis Jan 22, 2016
|
Thanks for the PR @ranma42. This is a tricky question that keeps coming up. There are a few factors at play that I'm aware of.
One thing I'm not clear on: Distros use compatible i686 code for the binaries they distribute but does gcc emit better code when run by users? @anguslee @sylvestre how is Debian now getting around this problem of rustc i686-unknown-linux-gnu using the wrong defaults for your system? Regardless of whether we change this triple it seems the need to tweak cpu settings comes up often enough that there should be a more convenient way to do it from Cargo (cc @alexcrichton). cc @dotdash since you touched this last. |
brson referenced this pull request Jan 22, 2016
Closed: default flexible targets path to /etc/rustc/ #31117
|
@brson yeah I definitely agree that this sort of minor configuration tweak comes up quite often. I think that your I've never really known what |
|
Regarding |
|
I put some of the outputs of GCC and Clang (versions 4.8.4 and 3.4, from an Ubuntu LTS 14.04.3 VM) for different options in this gist (nb: the versions I tested are quite old, it might make sense to check the latest gcc and Clang compilers, but I had this VM ready for testing). Clang defaults to generating code for more modern processors, but GCC seems more conservative about the instruction set used by default (no SIMD) and even with @brson "Does gcc emit better code when run by users?" |
|
AFAICT, rustc behaves just like clang. Using The observation in #14441 that the LLVM binaries don't use SSE instructions likely stems from the fact that LLVM was compiled with gcc, which does default to i686 compatibility. Building LLVM with clang results in code that does use SSE instructions. Also, on my Debian system, and and libc++ from the i386 pool contains SSE instructions, so a rustc built against that wouldn't work on a non-SSE machine either. Given all that, I think a step in the right direction here would be to have a less obscure way than using Whether or not we want to provide a "true" i686 rustc build, either as the default or in addition to what we have now, I don't know. If someone could do some benchmarks, that would be nice. If nobody volunteers, maybe I can do it sometime next week. I think it would be mostly interesting to have rustc bootstrapped for i686, but the benchmark built with a P4 target CPU, to actually see how much performance is lost by having the distribution target older CPUs as a baseline. @petevine you said that you have a solution ready, could you elaborate on that? I probably missed a number of details here. |
MagaTailor commented Jan 24, 2016
|
@dotdash I wasn't going to bother but yours is the first fully competent post in this thread so I must oblige! The solution, considering there are probably fewer than a dozen people still using non-SSE2 machines and Rust (myself included), should be a source-only one:
Apart from that, even if the status quo remains, the downloads page should make it clear SSE2 is required and in case someone starts a naive source build, the snapshot should get tested immediately and not after 5 hours of building LLVM. I tried Golang not long ago and that's how you bootstrap an i386 ( |
When @pnkfelix was calling you out for being rude, this is what he was talking about. I know that you are frustrated. Please stop taking it out on others. It's not appropriate here. |
MagaTailor commented Jan 24, 2016
|
You're probably right - the jab wasn't necessary. Apologies everyone! Once again, great post @dotdash! and @nikomatsakis, that's some wisdom straight from Pirkei Avot! |
|
@dotdash I confirm that rebuilding Rust (and its LLVM repo) on the same machine with Clang results in LLVM binaries which use SSE. We might want to ensure that bootstrapping from gcc and Clang results in equivalent binaries (from the point of view of target instruction set). As you mentioned, it would be very convenient if there was a way to pass the appropriate flags to all of the components. It would make it easier to support older targets, but it would also be useful if somebody wanted to sacrifice compatibility in order to make use of the newest operations available on their machine (basically by building everything with I installed Debian jessie i386: the libc++1 package includes SIMD instructions, but other binaries, including libstdc++ and Clang, seem to use x87 instructions and no SIMD operations at all. I wonder if this is intentional or just a consequence of the fact that the libc++1 package is built with clang. |
Isn't this not just a problem of bootstrapping though? The compiled rustc is going to proceed to output unusable binaries when used afterward. |
Presumably this would actually permanently override the target spec for i686 targets, so that any time you compiled with one, one would get no SSE2 instructions.
@dotdash I think there is a difference though, in that rustc understands target-triples (I'm guessing gcc/clang don't accept them directly but not sure), and when you pass an 'i686' triple to rustc you might reasonably expect it to behave like |
MagaTailor commented Jan 27, 2016
|
@brson Indeed, the idea is to leave SSE (1) on the table for the most usable cpus, namely P3's and Athlons, without having to resort to BTW, the fastest possible 32-bit code (even on a P4) would probably be achieved this way in LLVM:
|
The compiled rustc will produce usable binaries when used with an appropriate
Using I'm not saying that this is necessarily the right way to do things, but given how old those CPUs are, defaulting to a more "recent" set of features and requiring developers targeting old hardware to explicitly specify their target seems like a reasonable choice to me. |
alexcrichton added the T-tools label Jan 27, 2016
MagaTailor referenced this pull request Feb 9, 2016
Closed: x86(64) runtime performance irregularities #31503
|
I suggest we do this:
|
|
@alexcrichton Why just @brson I am afraid that such naming might be misleading. If I understood it correctly, you are suggesting that the I think the Clang defaults are not particularly good choices and I would rather use the more conservative (and compatible) defaults of the GNU compilers/toolchain. For the record, even Clang follows the gcc conventions in some cases, such as Android, while In addition to compatibility concerns, keeping If GNU conventions (use most generic CPU for target arch) are considered impractical, I would try to go for the ones proposed by Clang. If changing the existing targets is unfeasible, neither GNU nor Clang triples can be used and we are bound to define a new (and incompatible, In any case, I would at least try to ensure that a way to build a non-SSE version (or whatever is needed by distros) is well-known and tested. This is desirable anyway if distros start packaging Additionally, I would love if there was some more information about the behaviour of |
MagaTailor commented Feb 11, 2016
|
@alexcrichton @brson Once there, to have the ecosystem ready, a few additions like this one alexcrichton/curl-rust@a1e76ec will be necessary. From my experience with the new |
@ranma42 OK, good points. Is there another solution you like other than removing sse from the i686 targets that allows people to create compilers to target true i686es without patching the source?
@petevine How does mod.rs need to be updated, and which mod.rs? Right now @alexcrichton and I do not want to change the code generation of snapshots. Mostly because we're moving away from snapshots for bootstrapping and toward official releases. We're hoping that those that need these compilers, like distros, will be ok with building them themselves from a machine that can run sse. |
MagaTailor commented Feb 11, 2016
|
@brson There's definitely going to be no problem/slowdown using the |
|
@ranma42 I was thinking that for now we can probably just add From what you're saying, though, the difference of Also, with regards to mirroring clang and where all this came from, you're definitely more than welcome to add some documentation! I suspect the workflow for the initial integration of this change looked like:
Which I think may help explain why our defaults may differ from Clang in a few places (but they probably shouldn't). Does that make sense? |
|
@petevine yes, to officially support a new triple like this we would need to produce both nightly and snapshot compilers, but we currently don't produce nightlies beyond tier 1 platforms (which this wouldn't be initially), so we would probably support community-built snapshots/nightlies in the near future for any new target added. |
MagaTailor commented Feb 12, 2016
|
@alexcrichton |
Sure :) I like @dotdash's suggestion of having a unified way (configure argument?) to pass the appropriate flags when building each component. Actually, it looks like a good idea independently of this issue.
Among those supported by rust,
The data for clang has been collected running
I will start by adding comments in the code which provide the same information I collected in the table above. Providing this information to users through something like "-###" would be awesome, but it certainly involves much more significant changes.
Yes, that looks like what happened. |
|
I'd be fine merging a PR to align all our targets with whatever the Clang defaults are (e.g. fix the discrepancies you've found here). Unfortunately that still doesn't quite solve the initial problem in this PR (there's no target for no-sse instructions), but I guess if we soup up the build system somehow we could fix that. |
ranma42 commented Jan 22, 2016
SIMD instructions are not available on all of i686 processors and
cause programs to terminate on illegal instruction on older
processors.
Clang defaults to compiling for pentium4 when targeting a generic 32-bit x86 architecture. This was used as a precedent for rustc, but I believe some details were missed when doing so:
I think we should ensure that it is actually possible to target plain i686 with rust.
I expect this to come up with distros which target i686.
This should fix #14441 (so far I only verified with objdump that SIMD instructions are not used).
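Editorial sketch, not part of the PR: besides inspecting the emitted code with objdump, a program can ask the CPU at run time whether SSE2 is present and fall back to a scalar path when it is not. The block below is a minimal illustration in current Rust using the standard `is_x86_feature_detected!` macro, which postdates this thread; the function names `sse2_available_at_runtime` and `pick_code_path` are hypothetical.

```rust
/// On x86 and x86_64, query the CPU at run time for SSE2 support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn sse2_available_at_runtime() -> bool {
    is_x86_feature_detected!("sse2")
}

/// On every other architecture SSE2 does not exist, so report `false`.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn sse2_available_at_runtime() -> bool {
    false
}

/// Chooses a label for the code path a library could dispatch to.
/// A real library would call an SSE2-specific routine here instead.
fn pick_code_path() -> &'static str {
    if sse2_available_at_runtime() {
        "sse2"
    } else {
        "x87/scalar"
    }
}

fn main() {
    println!("selected code path: {}", pick_code_path());
}
```

This kind of dispatch only helps if the rest of the binary, including the standard library it links against, was itself built without assuming SSE2. Otherwise the illegal instruction can occur before the check runs, which is why the target defaults discussed in this thread still matter.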