Coredump when using latest nightly rustc #37105
Thanks for the report @BusyJay! Is there a set of steps to reproduce the crash you're seeing locally? Also, is this on Linux? |
Yes, it's on Linux. It takes a few steps to reproduce the crash; we will provide them via a demo project later. |
The steps to reproduce the crash may not be easy:
Download official binaries
You can use
Clone TiKV and build
The
Run PD
Run TiKV
Run TiDB
Run Test
Unzip bank test bank.zip and run
Then wait a long time... If you have any problem, please tell me. |
(Shot in the dark) Does this reproduce on |
@siddontang Is there any way this crash can be reduced to a smaller test case? It's going to be quite tricky to track down otherwise. Do you have instructions for reproducing the whole thing from source, without a binary download? If parts of it are not open source we may be able to arrange to debug it privately, but in any case a smaller test case would help immensely. |
@eddyb: Doesn't that nightly already default to orbit? As far as I can tell the switch happened between |
@TimNN Ah you are indeed correct. Well, that's one down, 99 potential causes to go. |
Everyone involved in this thread: if you haven't already, check out http://rr-project.org/. |
Hi @brson, we tried to reproduce this coredump with a simple test, but failed. 😭 PD and TiDB are both written in Go, and are both open source under Apache-2, so you can use the
If you want to build them yourself, you must install Go 1.6+ first (https://golang.org/doc/install), then:
git clone https://github.com/pingcap/tidb.git $GOPATH/src/github.com/pingcap/tidb
cd $GOPATH/src/github.com/pingcap/tidb
make
# tidb-server is installed in the $GOPATH/src/github.com/pingcap/tidb/bin directory
git clone https://github.com/pingcap/pd.git $GOPATH/src/github.com/pingcap/pd
cd $GOPATH/src/github.com/pingcap/pd
make
# pd-server is installed in the $GOPATH/src/github.com/pingcap/pd/bin directory |
@siddontang what is the source to the bank program? The zip file only has an executable. |
@siddontang I cannot reproduce the scenario; when I try to run the https://gist.github.com/pnkfelix/4c87f20badee2c5110c23005984830cd and the
(The other two services keep running...) I second @eddyb's suggestion of trying to use |
Hi @pnkfelix, in the bank case, we will create at least Of course, you can use another concurrency in bank like |
Thanks for the new info @siddontang. @pnkfelix maybe you can try again? |
Hi @pnkfelix, can you reproduce it? |
@rust-lang/compiler this needs a P- tag. |
triage: P-high In particular, we should figure out if we can reproduce this or not! |
Just tested with rr, but got an unexpected error. I will retry it later once it's resolved. |
rr issue is resolved. |
Yes, and I have been testing it for more than 12 hours, and it still has not crashed. I guess that rr emulating a single-core machine just makes it very slow. |
It's looking like we're unlikely to solve this before release. |
In @rust-lang/compiler meeting, discussed. We're basically still having trouble reproducing the bug (ideally in rr). No real status update. |
@pnkfelix maybe we can reduce this to P-medium until it's clear there's a bug to tackle on the rust side here. |
triage: P-medium Seeing as how we have not been able to reproduce, and we've basically stalled out, we're going to downgrade this in priority. @BusyJay, please let us know current status (is this still reproducing for you outside rr?) and if there is anything we can do to help track it down. |
Is this now stable-to-stable? |
Sorry that we don't have enough time to reproduce it. We tested it three weeks ago and this issue still existed. We can test it again after you release the newest nightly version :-) |
I'm sorry that we can't reproduce it either. =( Please do give it another try so at least we know if it is still a problem! |
Have we at least narrowed this down to a specific nightly where the problem seems to occur? It seems like it is not due to the switch to MIR, right? |
Could this be some sort of data race? Being unable to reduce it under rr sounds like a race condition. Can you try ThreadSanitizer? |
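For illustration only, here is a minimal sketch of the kind of reduced, multi-threaded test that could be run under a race detector such as ThreadSanitizer. It hammers the allocator from several threads and checks each buffer before it is dropped. Everything in it is an assumption made for the example: `stress_alloc`, the worker count, and the buffer sizes are hypothetical, it is not TiKV code, and it is not known to trigger the crash reported in this issue.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stress loop (not TiKV code): each worker repeatedly
/// allocates, checks, and drops a heap buffer while bumping a shared counter.
/// Returns the total number of completed allocations across all workers.
fn stress_alloc(workers: usize, iterations: usize) -> usize {
    let completed = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let completed = Arc::clone(&completed);
            thread::spawn(move || {
                for i in 0..iterations {
                    // Vary the length so the allocator sees several size classes.
                    let len = 1 + (id + i) % 64;
                    let buf: Vec<u8> = vec![id as u8; len];
                    // Verify the contents before the buffer is freed.
                    assert!(buf.iter().all(|&b| b == id as u8));
                    completed.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    // Joining every worker makes all counter updates visible to this thread.
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    completed.load(Ordering::Relaxed)
}

fn main() {
    let total = stress_alloc(8, 10_000);
    println!("completed {} allocations", total);
}
```

Because this sketch uses only safe Rust, a clean run is the expected outcome. Its value is as a starting skeleton: pieces of the real workload (especially anything built on unsafe code) can be moved into the worker loop one at a time until the failure appears.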
We did some tests and couldn't reproduce it with the newest Rust + newest TiKV, but to our surprise, we can reproduce it with the newest Rust + old TiKV (2016-08-21 version). Our Rust version is:
We don't know why yet. Maybe the changes in TiKV avoid the trigger condition for the core dump, or the problem still exists and we just haven't hit it. For now we have decided to use the newest Rust for TiKV, and if we hit this core dump later, we will update the issue. |
@siddontang ok, well, I'm glad you're not hitting the issue anymore, but I wish we had a better handle on what the problem is exactly. Of course, it is also possible that the bug is in fact in TiKV (or some other package featuring unsafe code), so it's quite likely that the problem is indeed fixed by a newer version. |
I'm going to downgrade to P-low until we have more data. |
triage: P-low |
Sadly, we hit the coredump again with the newest rustc + newest TiKV. 😭 We will try to reproduce it again.
|
@siddontang argh, sorry to hear that. :( |
A strange update: after we merged tikv/tikv#1512, we found that using the newest Rust is OK. We ran many tests for a long time and the core dump didn't happen, so we guess this PR fixes the problem, but we don't know why. Could you help us find the reason? We used nightly-2016-08-06 before, so I think the bug was introduced after this version. |
Thanks for the continued investigations and update, @siddontang. |
unassigning self. I'm not sure we can reasonably expect to determine the underlying problem that has either been fixed or masked, since as far as I can tell, no one working on the rustc compiler has locally reproduced the problem. |
It’s been over a year since any update, and almost two years since this issue was originally reported. I’m going to go ahead and close this. @BusyJay, if you have a way to reproduce and still care about this, please let me know! |
Hi, recently we upgraded our rustc compiler to the latest nightly version, but the compiled binary core dumps quickly under stress tests. A few stacks can be found in tikv/tikv#1144. But when we downgrade rustc to rustc 1.12.0-nightly (b30eff7ba 2016-08-05), the binary works just fine.
The stacks look weird, because the segmentation fault happens in liballoc, but we don't manage memory ourselves. We are guessing that there might be some problem in the versions later than rustc 1.12.0-nightly (b30eff7ba 2016-08-05). Could you please help us check it out? Thanks!