Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reproducible builds: non-deterministic use of cmpq #50556

Closed
kpcyrd opened this issue May 9, 2018 · 12 comments
Closed

reproducible builds: non-deterministic use of cmpq #50556

kpcyrd opened this issue May 9, 2018 · 12 comments
Labels
A-codegen Area: Code generation C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@kpcyrd
Copy link

kpcyrd commented May 9, 2018

$ rustup run nightly rustc --version
rustc 1.27.0-nightly (428ea5f6b 2018-05-06)
$ rustup run nightly cargo --version
cargo 1.27.0-nightly (af3f1cd29 2018-05-03)

I had issues building a deterministic binary for one of my projects, this is the reprotest command I ran:

reprotest -vv --vary=-all --store-dir=artifacts/ --source-pattern 'Cargo.* src/' '
    CARGO_HOME="'$HOME'/.cargo" RUSTUP_HOME='"$HOME/.rustup"' \
        RUSTFLAGS="--remap-path-prefix=$HOME=/remap-home --remap-path-prefix=$PWD=/remap-pwd -Ccodegen-units=1" \
        cargo +nightly build --release --verbose --locked' \
    target/release/badtouch

by setting -Ccodegen-units=1 I could considerably reduce the size of the diffoscope output (let's ignore this for now), but the build was still non-deterministic, the root cause being 2/3 instructions:

First build

    -   c1f5f:  48 83 b8 e0 fd ff ff     cmpq   $0x0,-0x220(%rax)
    -   c1f66:  00
    -   c1f67:  66 0f 7f 80 e0 fd ff    movdqa %xmm0,-0x220(%rax)
    -   c1f6e:  ff

Second build

    +   c1f5f:  48 8b 88 e0 fd ff ff     mov    -0x220(%rax),%rcx
    +   c1f66:  66 0f 7f 80 e0 fd ff    movdqa %xmm0,-0x220(%rax)
    +   c1f6d:  ff
    +   c1f6e:  48 85 c9               test   %rcx,%rcx

This causes a bunch of other differences since the first version is two bytes shorter.

I'm not sure if this decision is made by llvm or rustc, but it seems this happens in a non-deterministic way.

The project I'm using to test this is rather large, sadly I can't provide an isolated example. It also seems this doesn't affect every project.

Screenshot (easier to read)

diffoscope output

cc: @infinity0

@bmwiedemann
Copy link

I'm seeing the same cmpq diff with rust-1.26.2 on openSUSE with several of our packages:

  • cargo
  • exa
  • librsvg
  • svgcleaner

looking closer at exa

        movdqa %xmm1,%fs:0xffffffffffffff90
        mov    $something,%eax
        movq   %rax,%xmm1
-       cmpq   $something,%fs:0xffffffffffffff80
+       mov    %fs:0xffffffffffffff80,%rax
        movdqa %xmm1,%fs:0xffffffffffffff80
+       test   %rax,%rax
        je     <_ZN3std9panicking20rust_panic_with_hook17h33ee6001a18cb890E + ofs>

and again with 3exa4main
the *svg* packages have it with the rust_panic_with_hook
cargo has it before _ZN4git25panic5check17ha7ce59d73233c9dcE
but as it is random, it could be that there are other places that don't always differ.

That is some pretty obvious pattern.
Has anything be done for this issue yet?

@bmwiedemann
Copy link

maybe related:
random ordering of processing of libs - does cargo use randomized hashes? And is it possible to use a fixed has seed from an environment variable?

e.g. for exa

  + cargo build --release -j1
     Compiling winapi-build v0.1.1
-    Compiling num-traits v0.1.40
     Compiling libc v0.2.30
-    Compiling gcc v0.3.53
-    Compiling rustc-serialize v0.3.24
-    Compiling pkg-config v0.3.9
+    Compiling num-traits v0.1.40
     Compiling winapi v0.2.8
+    Compiling pkg-config v0.3.9
+    Compiling rustc-serialize v0.3.24
+    Compiling gcc v0.3.53
     Compiling matches v0.1.6
+    Compiling unicode-normalization v0.1.5
     Compiling unicode-width v0.1.4
     Compiling nom v1.2.4
-    Compiling unicode-normalization v0.1.5
-    Compiling utf8-ranges v0.1.3
     Compiling percent-encoding v1.0.0
+    Compiling utf8-ranges v0.1.3
     Compiling regex-syntax v0.3.9
     Compiling byteorder v0.4.2
-    Compiling log v0.3.8
     Compiling bitflags v0.9.1
+    Compiling log v0.3.8
     Compiling getopts v0.2.14
-    Compiling ansi_term v0.8.0
-    Compiling lazy_static v0.2.8
+    Compiling glob v0.2.11
     Compiling natord v1.0.9
+    Compiling ansi_term v0.8.0
     Compiling scoped_threadpool v0.1.7
-    Compiling glob v0.2.11
+    Compiling lazy_static v0.2.8
     Compiling kernel32-sys v0.2.2
-    Compiling num-integer v0.1.35
-    Compiling number_prefix v0.2.7
     Compiling rand v0.3.16
     Compiling memchr v0.1.11
     Compiling locale v0.2.2
-    Compiling term_size v0.3.0
     Compiling users v0.5.3
+    Compiling term_size v0.3.0
     Compiling num_cpus v1.6.2
-    Compiling cmake v0.1.25
+    Compiling num-integer v0.1.35
+    Compiling number_prefix v0.2.7
     Compiling num-complex v0.1.40
     Compiling libz-sys v1.0.16
+    Compiling cmake v0.1.25
     Compiling unicode-bidi v0.3.4
     Compiling pad v0.1.4
     Compiling term_grid v0.1.6
     Compiling iso8601 v0.1.1
-    Compiling num-iter v0.1.34
-    Compiling num-bigint v0.1.40
     Compiling aho-corasick v0.5.3
+    Compiling num-bigint v0.1.40
+    Compiling num-iter v0.1.34
     Compiling libgit2-sys v0.6.14
     Compiling idna v0.1.4
     Compiling num-rational v0.1.39

but OTOH each position in the list varies only by a bit, so maybe it is using threads internally even on a 1-core VM and does something racy.

@kennytm
Copy link
Member

kennytm commented Jul 25, 2018

@bmwiedemann Yes cargo uses a HashMap in its dependency queue. The Rust standard library uses a TLS random seed in a HashMap.

@bmwiedemann
Copy link

@kennytm is there a way to get cargo to use a constant order? even if it is just to see if it makes a difference...

@jonas-schievink
Copy link
Contributor

If it's enough to compile the final crate, you can grab the rustc invocation using "cargo build -v" and rerun that

@kpcyrd
Copy link
Author

kpcyrd commented Jul 27, 2018

@bmwiedemann the order of the crates doesn't matter, I currently have reprotest tests pass for a different binary that also compiles its dependencies out-of-order, it seems this is not the source of the problem.

https://travis-ci.org/kpcyrd/sniffglue/jobs/407405847

@infinity0
Copy link
Contributor

The rust compiler uses a deterministic hash table, I believe it's called FnvHashmap or something similar. Any non-deterministic hashtables should be changed to use this one instead.

@infinity0
Copy link
Contributor

See other discussion in #34902

@XAMPPRocky XAMPPRocky added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-codegen Area: Code generation labels Oct 2, 2018
@mnd
Copy link
Contributor

mnd commented Oct 6, 2018

I looking at way to make rust 1.27.2 package build reproducible in GuixSD distro and I got same issue. It's constantly happen in rlib files that was build for rustc-1.27.0-src/src/vendor/git2 and rustc-1.27.0-src/src/vendor/curl. This issue ends up as non-reproducible cargo binary.

For build I used external pre-build llvm 6.0.1 package. Rust 1.26.2 build is reproducible with same configurations.

For build I use next config.toml and next environment environment.txt

Sources was built with next commands:

./x.py build
./x.py build src/tools/cargo
./x.py install
# modify "prefix" in config.toml to install cargo application to different package
./x.py install cargo

As result I got different code for curl::panic::catch in src/vendor/curl, for git2::packbuilder::PackBuilder::insert_tree, git2::patch::Patch::num_lines_in_hunk, git2::patch::Patch::line_in_hunk, git2::patch::Patch::print, git2::patch::Patch::to_buf in src/vendor/git2 and for some parts of src/librustc code.

Do you know way how I can try to debug it or if there was fixes for this behavior in newer versions of Rust?

@mnd
Copy link
Contributor

mnd commented Oct 12, 2018

Small update: I see same issue during rust 1.25.0 compilation with llvm 6.0.1. Both 1.25.0 and 1.27.2 rust releases compiled in reproducible manner with llvm 3.9.1. Rust 1.26.2, 1.28.0, 1.29.1 compilation reproducible with llvm 6.0.1.

@kpcyrd
Copy link
Author

kpcyrd commented Oct 30, 2018

I've tried again using rust 1.30.0 and my initial test case now seems to work. I'm trying to increase variations next to check if my project is fully reproducible now and I'm also going to try with exa, which @bmwiedemann reported having the same issue.

@kpcyrd kpcyrd closed this as completed Nov 6, 2018
@bmwiedemann
Copy link

I'm still getting such diffs in exa-0.8.0 with rust-1.30.0 in openSUSE, even when varying as little as possible:

        movdqa %xmm1,%fs:0xffffffffffffff70
        mov    $something,%eax
        movq   %rax,%xmm1
-       mov    %fs:0xffffffffffffff60,%rax
+       cmpq   $something,%fs:0xffffffffffffff60
        movdqa %xmm1,%fs:0xffffffffffffff60
-       test   %rax,%rax
        je     <_ZN3std9panicking20rust_panic_with_hook17h40c40e76fede9cc3E + ofs>
        movq   %xmm0,%rbx
        test   %rbx,%rbx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants