regression: use-after-free sanitizer error in dtest #1072
Comments
Is this on head or on the side branch with Glauber's version? On Mon, Mar 21, 2016 at 3:13 PM, nyh notifications@github.com wrote:
|
On Mon, Mar 21, 2016 at 4:28 PM, slivne notifications@github.com wrote:
Glauber's version. |
I just checked, and I get the same sanitizer failure with "--smp 1". It's not specific to smp 2. |
We found that it's a regression via Jenkins: build http://jenkins.cloudius-systems.com:8080/job/urchin-dtest/label=monster,mode=debug,smp=1/795/ has this test passing, and build http://jenkins.cloudius-systems.com:8080/job/urchin-dtest/label=monster,mode=debug,smp=1/796/ has this test failing. It's not related to Glauber's branch. |
I'm bisecting this now. As usual, the compilation is excruciatingly slow (I only set up one compilation machine - I need to set up the second one to get a little more speed....) |
@slivne @glommer the result of the bisection is:
So somehow, while this patch fixed one problem, it created another one - apparently starting a call to lower_bound() and not keeping the sstable alive until it completes. I'll investigate. |
On Mon, Mar 21, 2016 at 11:57 AM, nyh notifications@github.com wrote:
Thanks Nadav.
|
In one of the runs with smp==1, I was lucky to get much more information about the use-after-free bug:
|
Perhaps 'pc' should be captured by reference here (and similarly for end):
auto start = [this, range, schema, pc] {
return range.start() ? (range.start()->is_inclusive()
? lower_bound(schema, range.start()->value(), pc)
: upper_bound(schema, range.start()->value(), pc))
: make_ready_future<uint64_t>(0);
};
|
Exactly :-) I found the same thing.... |
Commit 6a3872b fixed some use-after-free bugs but introduced a new one because of a typo: Instead of capturing a reference to the long-living io-class object, as all the code does, one place in the code accidentally captured a *copy* of this object. This copy had a very temporary life, and when a reference to that *copy* was passed to sstable reading code which assumed that it lives at least as long as the read call, a use-after-free resulted.
Fixes #1072
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1458595629-9314-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit 2eb0627)
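To make the copy-versus-reference capture described in the commit message concrete, here is a minimal standalone C++ sketch. It is not the Scylla or Seastar code: `io_priority_class`, `pending_read`, `start_read`, `still_refers_to` and the two `read_with_*` helpers are all hypothetical stand-ins invented for illustration. Instead of actually dereferencing a dead object (which would be undefined behavior), the sketch only records the address the callee was handed and checks afterwards whether it is the address of the long-lived object.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the long-lived I/O priority class object.
struct io_priority_class {
    int id;
};

// Hypothetical stand-in for an in-flight read. It records where the priority
// class it was given lives; the address is stored as an integer and is never
// dereferenced, so the sketch itself stays free of undefined behavior.
struct pending_read {
    std::uintptr_t pc_address;
};

// Plays the role of lower_bound()/upper_bound(): takes the priority class by
// reference and assumes it outlives the read.
inline pending_read start_read(const io_priority_class& pc) {
    return pending_read{reinterpret_cast<std::uintptr_t>(&pc)};
}

// True if the read was handed the given object itself rather than a copy.
inline bool still_refers_to(const pending_read& r, const io_priority_class& pc) {
    return r.pc_address == reinterpret_cast<std::uintptr_t>(&pc);
}

// Buggy pattern: `pc` is captured by value, so the lambda owns a private copy
// and start_read() sees the address of that copy. The copy is destroyed
// together with the lambda when this function returns.
inline pending_read read_with_copy_capture(const io_priority_class& pc) {
    auto start = [pc] { return start_read(pc); };
    return start();
}

// Fixed pattern: `pc` is captured by reference, so start_read() sees the
// caller's long-lived object.
inline pending_read read_with_reference_capture(const io_priority_class& pc) {
    auto start = [&pc] { return start_read(pc); };
    return start();
}

// Self-check tying the two patterns together.
inline void demo() {
    io_priority_class long_lived{1};
    assert(!still_refers_to(read_with_copy_capture(long_lived), long_lived));
    assert(still_refers_to(read_with_reference_capture(long_lived), long_lived));
}
```

In the copy-capture case the recorded address belongs to an object that is already gone by the time the caller gets the result back, which is the situation the sanitizer reports as a use-after-free once real code dereferences it. In the reference-capture case the address is that of the caller's own object, which stays valid for as long as the caller keeps it alive.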
I'm running one of the repair dtests with smp==2 and built in debug mode, with the following command
This crashes in the setup phase (when the test is checking some data it wrote), before repair even starts, so the problem is probably not repair-specific. The errors I see in the log differ between runs, but always occur at the same point (at the "checking data" phase, before starting any repair - I think during shutdown of a node):
Sometimes I get a lot of info:
Sometimes almost no info: