Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.30 beta] Test suite of the jemalloc-ctl crate is failing #54478

Closed
pietroalbini opened this issue Sep 22, 2018 · 19 comments
Closed

[1.30 beta] Test suite of the jemalloc-ctl crate is failing #54478

pietroalbini opened this issue Sep 22, 2018 · 19 comments
Assignees
Labels
A-allocators Area: Custom and system allocators C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Milestone

Comments

@pietroalbini
Copy link
Member

This is not a spurious failure (the test output is consistent between runs), but we need to investigate why this is happening.

@pietroalbini pietroalbini added I-nominated regression-from-stable-to-beta Performance or correctness regression from stable to beta. C-bug Category: This is a bug. labels Sep 22, 2018
@pietroalbini pietroalbini added this to the 1.30 milestone Sep 22, 2018
@sfackler
Copy link
Member

Those errors seem to imply that global allocator registration may have broken?

@sfackler
Copy link
Member

It seems like it may specifically be a problem with rustdoc tests - a normal library test passes as expected.

@pnkfelix pnkfelix self-assigned this Sep 27, 2018
@pnkfelix
Copy link
Member

pnkfelix commented Oct 8, 2018

Bisection over nightly builds indicates that bug was injected between nightly-2018-08-06-x86_64-unknown-linux-gnu (73c7873) and nightly-2018-08-14-x86_64-unknown-linux-gnu (d5a448b).

In the details block are the bors commits from that range:

% git log --author bors 73c7873..d5a448b --format=oneline

@pnkfelix
Copy link
Member

pnkfelix commented Oct 9, 2018

Okay I bisected this and identified the bug as being injected by #52993

@pnkfelix
Copy link
Member

pnkfelix commented Oct 9, 2018

Adding @alexcrichton to the assignee list in case they have some special insight into how best to address it. (But, as I noted on #52993, I am hoping we can get away with just undoing one small part of that PR, rather than backing out the whole thing, which seems like it might cause more trouble than it would resolve)

@pnkfelix pnkfelix added A-allocators Area: Custom and system allocators T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Oct 9, 2018
@pnkfelix
Copy link
Member

pnkfelix commented Oct 9, 2018

(also, tagging with both T-compiler and T-libs for now, though either team is free to untag the other if they want to take responsibility for resolving this.)

@alexcrichton
Copy link
Member

Thanks for the cc @pnkfelix! The issue here can be reduced to the following:

use std::alloc::*;

#[global_allocator]
static ALLOC: A = A;

static mut HIT: bool = false;

struct A;

unsafe impl GlobalAlloc for A {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        HIT = true;
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout);
    }
}

fn main() {
    assert!(unsafe { HIT });
}

compiled and run as...

$ rustc +stable -C prefer-dynamic x.rs && rustup run stable ./x
$ rustc +beta -C prefer-dynamic x.rs && rustup run beta ./x
thread 'main' panicked at 'assertion failed: unsafe { HIT }', x.rs:21:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

So specifically the bug here comes with the -C prefer-dynamic flag which, by default, rustdoc passes when compiling tests (which I'm surprised has lasted this long!). This is actually a bug that we accepted this code before! There are two active allocators here, one in the dynamic standard library and also one in the test that we're compiling. I... thought that rustc would reject this but apparently not!

The test actually has always failed on Windows (where the symbols don't trump one another). The regression here is Linux (and probably OSX?) specific. I think the "real fix" here is to not compile tests dynamically but compile them the same way as basically 99% of the compiles in Rust, statically.

@alexcrichton
Copy link
Member

Er sorry, I misspoke! Turns out I saw the regression on Linux and not on Windows because during process startup we allocate memory but don't allocate memory on Windows (wow!). If you amend the program to do Box::new(3); just before the assert then it fails on Windows as well.

Digging more into this it's still, however, a bug that the compiler ever accepted this code. For example consider:

use std::alloc::*;
use std::process::Command;
use std::env;

#[global_allocator]
static ALLOC: A = A;

static mut HITS: usize = 0;

struct A;

unsafe impl GlobalAlloc for A {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        HITS += 1;
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout);
    }
}

fn main() {
    unsafe {
        let start = HITS;
        Box::new(2); // our own crate makes an allocation
        assert_eq!(HITS, start + 1);
        drop(env::var("X")); // force libstd itself to make an allocation
        assert!(HITS > start + 1);
    }
}

This test will pass with and without -C prefer-dynamic on stable Rust on Linux, but it fails on Windows with -C prefer-dynamic.

What's happening here is that on stable Rust the executable, with -C prefer-dynamic, contains an exported __rust_alloc symbol. The dynamic standard library, however, also has a __rust_alloc symbol. We strive in rustc to make this never happen, we don't want these sort of symbol conflicts due to their platform-specific behavior. As seen here on Linux the executable effectively shadows the standard library, meaning there's actually two jemalloc allocators here (one in the executable and one in the standard library) and it just magically avoids the standard library's jemalloc allocator.

On Windows, however, symbols can't shadow one another so you've actually got two allocators here. The executable has its own allocator as does the standard library, which can very quickly lead to segfaults as memory is passed between them (if they're not both wired up to System like I have above).

All in all tl;dr; the compiler should have rejected this test previously due to rustdoc compiling tests dynamically. We should enact two changes from this discovery:

  • The compiler should be fixed to actually reject this situation where there's two allocators linked into one program.
  • We should update rustdoc to compile tests statically, not dynamically.

The first one is... "just sort of somewhere in the compiler add a check" and the second one is right here.

@pnkfelix
Copy link
Member

pnkfelix commented Oct 9, 2018

@alexcrichton hmm...

I'm curious whether either of the changes you describe should be targetted toward the beta channel (because this is a beta regression).

I suspect the change to the compiler to detect two allocators linked into one program is not a high priority item for beta.

The change to rustdoc ... seems like it should go into the beta (in order to address regressions like the one flagged here on the jemalloc-ctl crate) ... but I worry that its such a severe change that it is more risky to put into beta than to just let the regressions happen to jemalloc-ctl and other crates doing similar things in their rustdoc....

I suppose the best thing would be to do a crater run dedicated to just the rustdoc change and see what comes of that.

@pnkfelix
Copy link
Member

pnkfelix commented Oct 9, 2018

Also, talk about an ancient bit of code:

let sessopts = @session::options {
binary: @"rustdoctest",
maybe_sysroot: Some(@os::self_exe_path().unwrap().dir_path()),
addl_lib_search_paths: @mut libs,
outputs: ~[session::OutputExecutable],
debugging_opts: session::prefer_dynamic,
.. (*session::basic_options()).clone()
};

@pietroalbini
Copy link
Member Author

I suppose the best thing would be to do a crater run dedicated to just the rustdoc change and see what comes of that.

We need to be fast to do that though, full Crater run now take up to 5 days.

@pnkfelix
Copy link
Member

pnkfelix commented Oct 9, 2018

@pietroalbini let me see if I can get a PR up

@pnkfelix
Copy link
Member

pnkfelix commented Oct 9, 2018

Filed as PR #54939.

I Put it up now because I don't know if my local test run will finish before I hit the sack.

Also, that PR does not yet have a unit test for the regression.

But it should be enough to throw into crater, I hope

bors added a commit that referenced this issue Oct 9, 2018
…c-tests, r=<try>

[WIP] rustdoc: don't prefer dynamic linking in doc tests

This is an attempt to address the regression in #54478

This may be a case where the cure is worse than the disease, at least in the short term...

cc @alexcrichton
@alexcrichton
Copy link
Member

@pnkfelix right yeah changing the compiler to have new errors shouldn't be backported, and I would also be very wary of backporting changes to rustdoc --test, which have historically been notoriously brittle. A crater run looks to be a good start!

(sorry I was waiting for a local build to see the viability of a PR but kept forgetting to go back to it)

@pnkfelix pnkfelix added the T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. label Oct 16, 2018
@pnkfelix pnkfelix removed T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Oct 16, 2018
bors added a commit that referenced this issue Oct 17, 2018
…c-tests, r=QuietMisdreavus

rustdoc: don't prefer dynamic linking in doc tests

This is an attempt to address the regression in #54478

This may be a case where the cure is worse than the disease, at least in the short term...

cc @alexcrichton
@pietroalbini
Copy link
Member Author

Fixed and backported, should work in the next beta release.

@pnkfelix
Copy link
Member

I hate to ask, but: @pietroalbini , I think we still need a regression test, right?

@pietroalbini
Copy link
Member Author

Ugh, there was no regression test in that PR.

@pietroalbini pietroalbini reopened this Oct 19, 2018
@pnkfelix pnkfelix added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Oct 19, 2018
@pnkfelix
Copy link
Member

(But the lack of a regression test does not need to block the release, of course)

@pietroalbini
Copy link
Member Author

Manually checked the issue is actually fixed, just to be sure.

pnkfelix added a commit to pnkfelix/rust that referenced this issue Oct 24, 2018
I confirmed that it fails on:
rustdoc 1.30.0-beta.12 (96a2298 2018-10-04)
and passes on:
rustdoc 1.31.0-nightly (f99911a 2018-10-23)
kennytm added a commit to kennytm/rust that referenced this issue Oct 25, 2018
…t-jemalloc-ctl, r=nikomatsakis

Regression test for rust-lang#54478.

This is a regression test for rust-lang#54478.

I confirmed that it fails on:
rustdoc 1.30.0-beta.12 (96a2298 2018-10-04)
and passes on:
rustdoc 1.31.0-nightly (f99911a 2018-10-23)

Fix rust-lang#54478
pietroalbini added a commit to pietroalbini/rust that referenced this issue Oct 25, 2018
…t-jemalloc-ctl, r=nikomatsakis

Regression test for rust-lang#54478.

This is a regression test for rust-lang#54478.

I confirmed that it fails on:
rustdoc 1.30.0-beta.12 (96a2298 2018-10-04)
and passes on:
rustdoc 1.31.0-nightly (f99911a 2018-10-23)

Fix rust-lang#54478
bors added a commit that referenced this issue Oct 25, 2018
Rollup of 22 pull requests

Successful merges:

 - #53507 (Add doc for impl From for Waker)
 - #53931 (Gradually expanding libstd's keyword documentation)
 - #54965 (update tcp stream documentation)
 - #54977 (Accept `Option<Box<$t:ty>>` in macro argument)
 - #55138 (in which unused-parens suggestions heed what the user actually wrote)
 - #55173 (Suggest appropriate syntax on missing lifetime specifier in return type)
 - #55200 (Documents `From` implementations for `Stdio`)
 - #55245 (submodules: update clippy from 5afdf8b to b1d0343)
 - #55247 (Clarified code example in char primitive doc)
 - #55251 (Fix a typo in the documentation of RangeInclusive)
 - #55253 (only issue "variant of the expected type" suggestion for enums)
 - #55254 (Correct trailing ellipsis in name_from_pat)
 - #55269 (fix typos in various places)
 - #55282 (Remove redundant clone)
 - #55285 (Do some copy editing on the release notes)
 - #55291 (Update stdsimd submodule)
 - #55296 (Set RUST_BACKTRACE=0 for rustdoc-ui/failed-doctest-output.rs)
 - #55306 (Regression test for #54478.)
 - #55328 (Fix doc for new copysign functions)
 - #55340 (Operands no longer appear in places)
 - #55345 (Remove is_null)
 - #55348 (Update RELEASES.md after destabilization of non_modrs_mods)

Failed merges:

r? @ghost
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-allocators Area: Custom and system allocators C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants