New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc produces indeterministic asm #57041

Open
bmwiedemann opened this Issue Dec 21, 2018 · 13 comments

Comments

Projects
None yet
6 participants
@bmwiedemann
Copy link

bmwiedemann commented Dec 21, 2018

While working on reproducible builds for openSUSE, I found that
the svgcleaner and exa rust packages had variations in machine code.

For exa, I traced this down to variations from git2-rs that is embedded there.
From that I reduced the code as much as I could, so that now there are only 183 lines of rust left.

Steps to Reproduce:

git clone https://github.com/bmwiedemann/git2-rs
cd git2-rs
git checkout race
./build.sh
for i in $(seq 20) ; do ./build.sh ; done | sort | uniq -c

Actual Result:

  8 26e2151e7f3593339976a02365fb0608  lib.s
 12 ad32d7431b7c1edb363d1f3238a3d0f5  lib.s

Expected Result:
rustc should always produce the same asm output

--- lib.s	2018-12-21 15:18:39.671245106 +0100
+++ lib.s-20181221	2018-12-21 15:18:28.115357401 +0100
@@ -450,8 +450,9 @@
 	movaps	%xmm1, _ZN3lib5panic10LAST_ERROR7__getit5__KEY17h3776bf008a82988eE@DTPOFF+16(%rax)
 	movl	$1, %ecx
 	movq	%rcx, %xmm1
-	cmpq	$0, _ZN3lib5panic10LAST_ERROR7__getit5__KEY17h3776bf008a82988eE@DTPOFF(%rax)
+	movq	_ZN3lib5panic10LAST_ERROR7__getit5__KEY17h3776bf008a82988eE@DTPOFF(%rax), %rcx
 	movdqa	%xmm1, _ZN3lib5panic10LAST_ERROR7__getit5__KEY17h3776bf008a82988eE@DTPOFF(%rax)
+	testq	%rcx, %rcx
 	je	.LBB14_9
 	movq	%xmm0, %r14
 	testq	%r14, %r14

I have identified 6 places that can be dropped and make the output reproducible:

https://github.com/bmwiedemann/git2-rs/blob/race/src/panic.rs#L11
https://github.com/bmwiedemann/git2-rs/blob/race/src/lib.rs#L5
https://github.com/bmwiedemann/git2-rs/blob/race/src/lib.rs#L13 ++
https://github.com/bmwiedemann/git2-rs/blob/race/src/call.rs#L10 +4
https://github.com/bmwiedemann/git2-rs/blob/race/src/index.rs#L39 +8
https://github.com/bmwiedemann/git2-rs/blob/race/src/index.rs#L64 ++

This bug might be related to bug #50556, but that one is closed and this bug is still reproducible with rust-1.31

I checked that llvm-bc and llvm-ir output are reproducible even when asm is not.

@bmwiedemann

This comment has been minimized.

Copy link

bmwiedemann commented Jan 2, 2019

Spent some more time on this and did

for i in $(seq 20) ; do
  echo 10000 > /proc/sys/kernel/ns_last_pid
  taskset 1 setarch `arch` -R strace -s 4096 -o rustc$i.strace -f ./build.sh
done

This can create more reproducible results as long as the system is idle.
Switching firefox tabs created enough load for rustc to start varying results.
strace showed that the main rustc process spawns 4 threads, where thread 3 is short-lived and thread 4 is writing the lib.s

diffing strace outputs showed that the 'idle' result correlated with more, smaller mprotect calls and the 'busy' result correlated with fewer, larger mprotect calls.
There was also a difference in the 'brk' pattern in the main rustc process.

All this indicates that there is some raciness in the communication between the involved threads and this is where the non-determinism comes from.

rustc-strace-diff.txt

@jonas-schievink

This comment has been minimized.

Copy link
Member

jonas-schievink commented Jan 2, 2019

Have you found any difference in the emitted LLVM IR?

@bmwiedemann

This comment has been minimized.

Copy link

bmwiedemann commented Jan 2, 2019

@jonas-schievink no, LLVM IR is always deterministic, as mentioned in the last line of the original report.

@bmwiedemann

This comment has been minimized.

Copy link

bmwiedemann commented Jan 2, 2019

Because of my very limited rust-foo I would appreciate help in further reducing the 183 lines of .rs files into an even smaller reproducer so that it becomes more visible what matters in there.

@jonas-schievink

This comment has been minimized.

Copy link
Member

jonas-schievink commented Jan 2, 2019

@bmwiedemann In that case, can you reproduce this by running llc on the emitted IR? If so, this is an LLVM bug.

All this indicates that there is some raciness in the communication between the involved threads and this is where the non-determinism comes from.

Does it make a difference if you pass -Ccodegen-units=1 or -Ccodegen-units=2 to the compiler? It really shouldn't matter here though, since the difference seems to be in instruction selection.

(this does indeed look like the same problem as #50556)

@bmwiedemann

This comment has been minimized.

Copy link

bmwiedemann commented Jan 3, 2019

I changed my build.sh to have --emit=asm,llvm-ir,llvm-bc.
Then running llc on lib.ll or lib.bc` indeed produced varying lib.s files.

I also tried -Ccodegen-units=$N and 1 behaved pretty much like before but 2 produced 2 output files that did not seem to vary for some reason.

I'm on llvm-5,0,1 atm. llvm-6.0.1 is also bad, but llc from llvm-7.0.0 seems to be good.
So rebuilding our rust with libllvm7 should help solve this.
Or we find the relevant llvm commit and backport it to the stable branches.

@bmwiedemann

This comment has been minimized.

Copy link

bmwiedemann commented Jan 3, 2019

I now ran my autoreduce script on lib.ll and shrank it to 176 lines:
https://github.com/bmwiedemann/git2-rs/blob/race/lib.ll
dropping line 140, 129, 107, 104, 84, 82, 80, 78, 76, 74, 70, 68, 66, or 44 make the result reproducible.

@bmwiedemann

This comment has been minimized.

Copy link

bmwiedemann commented Jan 9, 2019

In llvm I ran a git bisect between release_60 and release_70 and this turned up this fixing commit:
llvm-mirror/llvm@8957ed6
unfortunately it is hard to cherry-pick, as it requires 4+ other commits

@memoryruins

This comment has been minimized.

Copy link
Contributor

memoryruins commented Jan 18, 2019

cc #34902

@nikic

This comment has been minimized.

Copy link
Contributor

nikic commented Jan 18, 2019

Is there anything actionable to do here on the side of rustc? Per your investigation this issue has already been fixed in LLVM 7.0.

@sanxiyn

This comment has been minimized.

Copy link
Member

sanxiyn commented Jan 18, 2019

I think this can be closed, once someone checks the original reproduction with a nightly build.

@bmwiedemann

This comment has been minimized.

Copy link

bmwiedemann commented Feb 1, 2019

Found another patch that makes llc determistic with lib.ll from the full git-rs
llvm-mirror/llvm@b2c8f01
do you think, llvm guys could backport those to 6.x or is that a waste of time?

@nikic

This comment has been minimized.

Copy link
Contributor

nikic commented Feb 1, 2019

@bmwiedemann LLVM has short release cycles and LLVM 6 is long since dead. Not even LLVM 7 will receive further backports (there will only be one more release with a single change, that was not included in LLVM 7.0.1 because it is ABI breaking).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment