Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serde compilation is not deterministic #46846

Closed
johnklai1 opened this issue Dec 19, 2017 · 3 comments · Fixed by #46918
Closed

serde compilation is not deterministic #46846

johnklai1 opened this issue Dec 19, 2017 · 3 comments · Fixed by #46918

Comments

@johnklai1
Copy link

I was looking into why sccache was missing more than expected, and it looks like serde_derive compiles are not deterministic.

I tried this code on OSX 10.12 (Xcode 7.3.1)

> cd serde
> git checkout 0b89bc920e6052c05483c8b56661e859caae9d25
> cargo build
> md5 target/debug/deps/libserde_derive-*.dylib
(output: MD5 (target/debug/deps/libserde_derive-c403614b534b4e94.dylib) = be85fd8981e5bb5aa789a56d6f9ae4da)
> rm target/debug/deps/libserde_derive-*.dylib
> cargo build
> md5 target/debug/deps/libserde_derive-*.dylib
(output: MD5 (target/debug/deps/libserde_derive-c403614b534b4e94.dylib) = 5fc806ed2f22bdf7e9c3b0b018ca2671)

I expected that the MD5s would be identical after deleting and re-running cargo build. I ran with just the underlying rustc command, and that also generated different files each time. Here is the underlying rustc command:

rustc --crate-name serde_derive serde_derive/src/lib.rs --crate-type proc-macro --emit=dep-info,link -C prefer-dynamic -C debuginfo=2 --cfg 'feature="default"' --cfg 'feature="deserialize_from"' -C metadata=c403614b534b4e94 -C extra-filename=-c403614b534b4e94 --out-dir /Users/jklai/serde/target/debug/deps -L dependency=/Users/jklai/serde/target/debug/deps --extern syn=/Users/jklai/serde/target/debug/deps/libsyn-b667971f1cdadd98.rlib --extern quote=/Users/jklai/serde/target/debug/deps/libquote-34dfebedcb580111.rlib --extern serde_derive_internals=/Users/jklai/serde/target/debug/deps/libserde_derive_internals-3788ef8c57ed9e45.rlib

I have also tried this on Windows and Linux, and it also generates different serde_derive libraries when rebuilding.

Meta

rustc --version --verbose: rustc 1.24.0-nightly (dc39c31 2017-12-17)

@alexcrichton
Copy link
Member

Looking into this a bit. I had to run the build a few times but eventually was able to reproduce as well (on stable, but nightly also seems affected)

I found that the synom dependency was the one changing (that rlib was the only thing changing in the input to the linker). So I checked out dtolnay/syn@5e42499 and built just the synom crate. Turns out two objects files were changing:

  • synom-3d068837ed89ca46.synom15.rust-cgu.o
  • synom-3d068837ed89ca46.synom9.rust-cgu.o

Emitting the LLVM IR for all the stages yields two diffs:

Looks like those objects files may have been swapped!

cc @michaelwoerister, is this perhaps related to the partitioning code? Can you think of anything that may cause two codegen units to be swapped between two compilations?

I also have a sinking feeling this may have been introduced with the thinlto work...

@michaelwoerister
Copy link
Member

Maybe this code can introduce instabilities if there are codegen units with the same number of trans items in them?:

fn merge_codegen_units<'tcx>(initial_partitioning: &mut PreInliningPartitioning<'tcx>,
target_cgu_count: usize,
crate_name: &str) {
assert!(target_cgu_count >= 1);
let codegen_units = &mut initial_partitioning.codegen_units;
// Merge the two smallest codegen units until the target size is reached.
// Note that "size" is estimated here rather inaccurately as the number of
// translation items in a given unit. This could be improved on.
while codegen_units.len() > target_cgu_count {
// Sort small cgus to the back
codegen_units.sort_by_key(|cgu| -(cgu.items().len() as i64));
let mut smallest = codegen_units.pop().unwrap();
let second_smallest = codegen_units.last_mut().unwrap();
for (k, v) in smallest.items_mut().drain() {
second_smallest.items_mut().insert(k, v);
}
}
for (index, cgu) in codegen_units.iter_mut().enumerate() {
cgu.set_name(numbered_codegen_unit_name(crate_name, index));
}
}

@alexcrichton
Copy link
Member

A minimal reproduction of this locally for me is:

pub fn foo(bar: &str) {
    bar.find('\n');
}

When compiled with rustc foo.rs --crate-type lib it'll oscillate a bit. There's definitely two codegen units getting swapped. Gonna dig more to see what codegen units are oscillating.

@alexcrichton alexcrichton reopened this Dec 21, 2017
alexcrichton added a commit to alexcrichton/rust that referenced this issue Dec 21, 2017
This commit fixes some nondeterminism in compilation when using multiple codegen
units. The algorithm for splitting codegen units currently takes the
otherwise-would-be-for-incremental partitioning and then continuously merges the
two smallest codegen units until the desired number of codegen units are
reached.

We want to be sure to merge the same codegen units each time a compilation is
run but there's some subtle reorderings amongst all the items which was causing
this step to be slightly buggy. Notably this step involves sorting codegen units
by size, but if two codegen units had the same size they would appear in
different locations in the list each time.

This commit fixes this issue by sorting codegen units by name before doing the
loop to merge the two smallest. This means that we've got a deterministic
order going in and since we're using a stable sort this should mean that we're
always now getting a deterministic merging of codegen units.

Closes rust-lang#46846
kennytm added a commit to kennytm/rust that referenced this issue Dec 21, 2017
…lwoerister

rustc: Sort CGUs before merging

This commit fixes some nondeterminism in compilation when using multiple codegen
units. The algorithm for splitting codegen units currently takes the
otherwise-would-be-for-incremental partitioning and then continuously merges the
two smallest codegen units until the desired number of codegen units are
reached.

We want to be sure to merge the same codegen units each time a compilation is
run but there's some subtle reorderings amongst all the items which was causing
this step to be slightly buggy. Notably this step involves sorting codegen units
by size, but if two codegen units had the same size they would appear in
different locations in the list each time.

This commit fixes this issue by sorting codegen units by name before doing the
loop to merge the two smallest. This means that we've got a deterministic
order going in and since we're using a stable sort this should mean that we're
always now getting a deterministic merging of codegen units.

Closes rust-lang#46846
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants