Build the compiler with -Ctarget-cpu=x86-64-v2 #79043

Closed

est31
Member

@est31 est31 commented Nov 14, 2020

This PR instructs rustbuild to compile the compiler with the -Ctarget-cpu=x86-64-v2 option enabled, in the hope of getting some optimization gains from autovectorization.

The PR also adds support for the x86-64-v{2,3,4} target CPUs by backporting an LLVM 12.0 commit.

I'm opening this to get a perf run to gauge the potential speedups on the rustc side. The LLVM side isn't built with the option enabled, as that would require clang 12.0 or manual enabling of the target features corresponding to the target CPU.

If the perf run shows nice improvements, we can talk about how to get this to users. It can't just be enabled unconditionally for everyone, as it would break for users of older CPUs. It's a similar question to #59667.
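
For readers who want to try the flag on their own crate, here is a minimal sketch of passing `-Ctarget-cpu` to a cargo build through the `RUSTFLAGS` environment variable. This is not the rustbuild change made in this PR; the helper names `rustflags_with_target_cpu` and `cargo_build_for` are hypothetical and exist only for illustration.

```rust
use std::process::Command;

/// Append `-Ctarget-cpu=<cpu>` to whatever flags are already present.
/// (Hypothetical helper, not part of rustbuild.)
pub fn rustflags_with_target_cpu(existing: &str, cpu: &str) -> String {
    let flag = format!("-Ctarget-cpu={}", cpu);
    let existing = existing.trim();
    if existing.is_empty() {
        flag
    } else {
        format!("{} {}", existing, flag)
    }
}

/// Prepare (but do not run) a `cargo build --release` invocation whose
/// RUSTFLAGS request the given target CPU, e.g. "x86-64-v2".
pub fn cargo_build_for(cpu: &str) -> Command {
    // Keep any flags the caller's environment already sets.
    let existing = std::env::var("RUSTFLAGS").unwrap_or_default();
    let mut cmd = Command::new("cargo");
    cmd.args(&["build", "--release"])
        .env("RUSTFLAGS", rustflags_with_target_cpu(&existing, cpu));
    cmd
}
```

Calling `cargo_build_for("x86-64-v2").status()` would then run the build. As the CI log further down shows, an LLVM that predates the backport does not recognize the `x86-64-v2` name.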

@rust-highfive
Collaborator

r? @Mark-Simulacrum

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive
Collaborator

⚠️ Warning ⚠️

  • These commits modify submodules.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 14, 2020
@rust-log-analyzer
Collaborator

The job x86_64-gnu-llvm-8 of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
[command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
##[endgroup]
##[group]Fetching the repository
[command]/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --progress --no-recurse-submodules --depth=2 origin +bcf86308065550bb70968c6a63fd1ec3ad683328:refs/remotes/pull/79043/merge
---
   Compiling typenum v1.12.0
   Compiling version_check v0.9.1
   Compiling hashbrown v0.9.0
   Compiling getrandom v0.1.14
'x86-64-v2' is not a recognized processor for this target (ignoring processor)
   Compiling either v1.6.0
'x86-64-v2' is not a recognized processor for this target (ignoring processor)
LLVM ERROR: 64-bit code requested on a subtarget that doesn't support it!
error: could not compile `scopeguard`

To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
warning: build failed, waiting for other jobs to finish...
'x86-64-v2' is not a recognized processor for this target (ignoring processor)
LLVM ERROR: 64-bit code requested on a subtarget that doesn't support it!
'x86-64-v2' is not a recognized processor for this target (ignoring processor)
LLVM ERROR: 64-bit code requested on a subtarget that doesn't support it!
command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "16" "--release" "--locked" "--color" "always" "--features" " llvm" "--manifest-path" "/checkout/compiler/rustc/Cargo.toml" "--message-format" "json-render-diagnostics"
expected success, got: exit code: 101
failed to run: /checkout/obj/build/bootstrap/debug/bootstrap --stage 2 test --exclude src/tools/tidy
Build completed unsuccessfully in 0:07:11

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @rust-lang/infra. (Feature Requests)

@Mark-Simulacrum
Member

@bors try @rust-timer queue

One thought though is that it probably makes more sense to go for -v4 or whatever a 3600X Ryzen corresponds to, to get a sense of maximum benefits from this kind of optimization. We can try that after we get an idea of what -v2 gives us, though.

@rust-timer
Collaborator

Awaiting bors try build completion

@bors
Contributor

bors commented Nov 14, 2020

⌛ Trying commit 17b2818ca70ddad8c0a06ec63a96d0a1b58ec65d with merge 714dade877e50c10697708c0c2a77840ba58b69a...

@est31
Member Author

est31 commented Nov 14, 2020

@Mark-Simulacrum wow that was a quick try issuance :). good point about the -v4. Should I change it? I wasn't sure which CPU the CI env uses. Apparently it's this one (extracted from the try build):

processor	: 15
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
stepping	: 4
microcode	: 0xffffffff
cpu MHz		: 2095.247
cache size	: 36608 KB
physical id	: 0
siblings	: 16
core id		: 15
cpu cores	: 16
apicid		: 15
initial apicid	: 15
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 4190.49
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

According to a rough glance at the tables, this corresponds to -v4.

More generally, figuring out which target CPU level a given CPU supports is probably another problem we need to solve :). I'm not entirely sure what the failure mode is if one tries to run a binary compiled for -v4 on a CPU that doesn't support it.
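
One way to check a level by hand is to compare the `flags` line of `/proc/cpuinfo` against the feature lists of the x86-64 microarchitecture levels. The sketch below does that. It rests on several assumptions: the function name `x86_64_level` is made up for illustration, flag names are the Linux spellings (`pni` for SSE3, `abm` for LZCNT, `cx16` for CMPXCHG16B), the baseline level is assumed rather than checked, and `xsave` stands in for OSXSAVE, which cpuinfo does not list separately.

```rust
use std::collections::HashSet;

// Feature flags required by each level, spelled as in /proc/cpuinfo on Linux.
const V2: &[&str] = &["cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"];
const V3: &[&str] = &["avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"];
const V4: &[&str] = &["avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"];

/// Return the highest x86-64 level (1 to 4) whose flags all appear in the
/// given whitespace-separated flag list. The baseline (level 1) is assumed.
pub fn x86_64_level(flags_line: &str) -> u8 {
    let flags: HashSet<&str> = flags_line.split_whitespace().collect();
    let has_all = |needed: &[&str]| needed.iter().all(|f| flags.contains(f));

    // Each level includes all lower levels, so stop at the first gap.
    if !has_all(V2) {
        return 1;
    }
    if !has_all(V3) {
        return 2;
    }
    if !has_all(V4) {
        return 3;
    }
    4
}
```

Fed the `flags` line of the Xeon Platinum 8171M above, this returns 4, which matches the rough glance at the tables. On the failure mode: a binary that executes an instruction the CPU lacks typically dies with an illegal-instruction fault (SIGILL on Linux) at the point where that instruction is first reached, not at startup.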

@bors
Contributor

bors commented Nov 14, 2020

☀️ Try build successful - checks-actions
Build commit: 714dade877e50c10697708c0c2a77840ba58b69a (714dade877e50c10697708c0c2a77840ba58b69a)

@rust-timer
Collaborator

Queued 714dade877e50c10697708c0c2a77840ba58b69a with parent 66c1309, future comparison URL.

@Mark-Simulacrum
Member

Ultimately I would expect that if it builds at all in CI it's probably fine; I suspect that both CI and the 3600X we use on perf are sufficiently modern for -v4 (but who knows, I think 3600X doesn't support AVX512 for example?). If you want to switch this to v4 I can queue that as well.

@mati865
Contributor

mati865 commented Nov 14, 2020

Zen 2 (e.g. Ryzen 3600X) is x86-64-v3, which could still be beneficial over -v2 if LLVM uses the new instructions for hashing.

@Mark-Simulacrum
Member

Ah, ok, then we can't check v4 on current perf but we can still check v3.

@rust-log-analyzer
Collaborator

The job mingw-check of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
[command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
##[endgroup]
##[group]Fetching the repository
[command]/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --progress --no-recurse-submodules --depth=2 origin +c6d985906f40216fddc21a88ac0fb6a2259281db:refs/remotes/pull/79043/merge
---
configure: rust.channel         := nightly
configure: rust.debug-assertions := True
configure: llvm.assertions      := True
configure: dist.missing-tools   := True
configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
configure: writing `config.toml` in current directory
configure: 
configure: run `python /checkout/x.py --help`
configure: 
---
Diff in /checkout/src/bootstrap/compile.rs at line 522:
     }
 }
 
-pub fn rustc_cargo(builder: &Builder<'_>, cargo: &mut Cargo, target: TargetSelection, compiler: Compiler) {
+pub fn rustc_cargo(
+    builder: &Builder<'_>,
+    cargo: &mut Cargo,
Running `"/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/rustfmt" "--config-path" "/checkout" "--edition" "2018" "--unstable-features" "--skip-children" "--check" "/checkout/src/bootstrap/compile.rs"` failed.
If you're running `tidy`, try again with `--bless`. Or, if you just want to format code, run `./x.py fmt` instead.
+    compiler: Compiler,
+) {
     cargo
     cargo
         .arg("--features")
         .arg(builder.rustc_features())
Diff in /checkout/src/bootstrap/compile.rs at line 531:
     rustc_cargo_env(builder, cargo, target, compiler);
 
 
-pub fn rustc_cargo_env(builder: &Builder<'_>, cargo: &mut Cargo, target: TargetSelection, compiler: Compiler) {
+pub fn rustc_cargo_env(
+    builder: &Builder<'_>,
+    cargo: &mut Cargo,
+    compiler: Compiler,
+) {
+) {
     // Set some configuration variables picked up by build scripts and
     // the compiler alike
failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test --stage 2 src/tools/tidy
Build completed unsuccessfully in 0:00:14
== clock drift check ==
  local time: Sat Nov 14 15:53:43 UTC 2020

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @rust-lang/infra. (Feature Requests)

@Mark-Simulacrum
Member

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion

@bors
Contributor

bors commented Nov 14, 2020

⌛ Trying commit c20c5ca with merge 08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a...

@bors
Contributor

bors commented Nov 14, 2020

☀️ Try build successful - checks-actions
Build commit: 08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a (08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a)

@Mark-Simulacrum
Member

Mark-Simulacrum commented Nov 14, 2020

It looks like we need to wait for the current build to finish before starting a new one. I'm not sure why that is limited in the db, so the constraint can probably be removed in the future.

@rust-timer
Collaborator

Finished benchmarking try commit (714dade877e50c10697708c0c2a77840ba58b69a): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-perf

@Mark-Simulacrum
Member

@rust-timer build 08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a

@rust-timer
Collaborator

Queued 08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a with parent 30e49a9, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking try commit (08c4dbcd0f8da8bc09173074ab9eedcaa8336d8a): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-perf

@Mark-Simulacrum
Member

V3 looks like a pretty significant wall time loss, V2 looks like a slight win (or perhaps lost in the noise). My guess is that this is not worth it at this time.

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 14, 2020
@est31
Member Author

est31 commented Nov 15, 2020

Yeah, there are some single-digit improvements in instruction counts (up to -2.5% in inflate-check full for v3, up to -6.6% in keccak-debug full), but I guess that's what instruction set extensions are about :). But if you click on the passes overview, the summary in fact shows a regression, not an improvement. Different measurement methods? Most of the time it shows a slight regression in the time delta. In fact, the pass overview is quite useless, as the passes are all over the place: some improve, others regress, sometimes quite heavily. Tons of noise there. I had wanted to identify passes where the instruction count was heavily reduced, to check whether they might be a good place for vectorization, but the pass overview is useless for that :).

I think what the compiler is doing doesn't lend itself that well to being sped up by target extensions because mostly they are about bulk processing of data.
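
To illustrate the distinction, here is a small sketch contrasting the two kinds of code. Both functions and the `Expr` type are made up for this example and are not taken from rustc. The first is an element-wise loop over contiguous slices, the shape an autovectorizer can typically turn into SIMD instructions. The second walks a boxed tree with a data-dependent branch at every node, which is closer to what much of a compiler does and gives wider vector units little to work with.

```rust
/// Bulk processing: the same operation applied to every element of
/// contiguous slices. Loops of this shape are typical candidates for
/// autovectorization.
pub fn add_slices(a: &[u32], b: &[u32], out: &mut [u32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x.wrapping_add(*y);
    }
}

/// A tiny expression tree, standing in for compiler-style data structures.
pub enum Expr {
    Lit(u32),
    Add(Box<Expr>, Box<Expr>),
    Neg(Box<Expr>),
}

/// Pointer chasing: each step depends on a branch and a heap indirection,
/// so there is no bulk data for SIMD instructions to process.
pub fn eval(e: &Expr) -> u32 {
    match e {
        Expr::Lit(n) => *n,
        Expr::Add(l, r) => eval(l).wrapping_add(eval(r)),
        Expr::Neg(x) => eval(x).wrapping_neg(),
    }
}
```

Whether a particular loop is actually vectorized depends on the optimization level and the LLVM version, so treat this as an illustration of the argument and not as a measurement.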

Maybe it's more power efficient now, maybe not, but even if it is, that's likely not enough to warrant further investigation.

On the bright side, optimizing LLVM is still left unexplored. Also, the LLVM commit I backported is only half of the story: LLVM 12.0 will also gain the ability to tune for CPUs, like gcc's -mtune. The resulting code will still run on older CPUs, but instructions are emitted in a way that runs faster on newer ones. Originally the commit I backported built on that feature to also set default tunings for v2, v3, and v4, but I removed that part to avoid having to backport the -mtune changes as well. So maybe we can repeat this test in the future, once LLVM 12.0 is around, with the proper version of the commit. Maybe one can also experiment with CPU tuning by then. I think that's best done once LLVM 12.0 is out and rustc uses it. Even better if the CI uses it as the compiler for the native LLVM.

For now though, closing.
