Coredump after enospc error #4877
We cannot decode this core due to #4673; please reproduce with a newer version. Also, please don't split core dumps, it's annoying.
Closing since the core is not debuggable, but please run the test again with a new version.
Using version
The coredump occurred several times after that for the next hour, and didn't occur again afterwards.
Not a regression, moving to 3.2
I also got a coredump during ENOSPC in longevity-mv-si.
From journalctl at the time of the coredump:
The commitlog issue looks like #4700
I reproduced this issue on Scylla version 666.development-0.20191018.d7c3e48e8c4
backtrace:
decoded:
This error was also seen during one of the ENOSPC events:
decoded:
And this error as well:
decoded:
@espindola please look into this issue.
@fgelcer some of the file names in the decoded backtraces look bogus (like |
Forgot to attach the link to download the coredump
Those were backtraces found in the logs, and I decoded them by using:
@fgelcer, thanks. Also, full logs would be appreciated.
Looks like allocate_segment_ex doesn't close the file it opens in the lambda function on exception.
The file destructor closes the fd. Since no objects should escape alive on exception here, it should be OK. Side note: any C++ object not adhering to RAII should be chastised and flogged. And its author too.
looks like on open (due to out of space). |
The relevant backtrace is this one:
Hmm, I'm guessing that maybe truncate hit the exception (due to the involvement of |
We absolutely close fds on destruction. Otherwise we would have file leaks en masse because, no matter how stringent we are, we'd miss places where continuations would otherwise make us lose a file object. As for the case above, sure, you could hang an "on_exception" handler on both blocks in the function, and it will deal with the truncation exception. But again, I am somewhat against it, because this is a very C-like restriction that has no place in a nice C++ universe.
We don't close in the destructor, as we would have to wait and there is no guarantee that we are on a thread. I am looking at the code Benny pointed out in #4877 (comment)
Again, if we have a potential wait condition (beyond ::close, not file::close()), it should be the responsibility of file to queue this up somewhere in the background (see TLS socket close, for example).
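To make the "queue the close in the background" idea above concrete, here is a minimal sketch in plain C++. It is not Seastar or Scylla code: `close_queue`, `queued_file`, and `allocate_segment` are hypothetical names invented for illustration, and the "background" work is simulated by an explicit `drain()` call. It only shows the shape of the argument: the destructor never waits, it just hands the descriptor off.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical queue of descriptors waiting for a deferred close.
struct close_queue {
    std::vector<int> pending;
    int closed = 0;
    // Stand-in for the background task that performs the (possibly slow) close.
    void drain() {
        closed += static_cast<int>(pending.size());
        pending.clear();
    }
};

// Hypothetical RAII handle: the destructor does not block, it only
// hands the descriptor to the queue.
class queued_file {
    close_queue* q_;
    int fd_;
public:
    queued_file(close_queue& q, int fd) : q_(&q), fd_(fd) { assert(fd >= 0); }
    queued_file(const queued_file&) = delete;
    queued_file& operator=(const queued_file&) = delete;
    queued_file(queued_file&& o) noexcept
        : q_(o.q_), fd_(std::exchange(o.fd_, -1)) {}
    ~queued_file() {
        // push_back may allocate; a real implementation would need a
        // hand-off that cannot throw.
        if (fd_ >= 0) q_->pending.push_back(fd_);
    }
};

// Simulated segment allocation that fails with ENOSPC after opening the file.
// The descriptor value 3 is arbitrary.
void allocate_segment(close_queue& q, bool disk_full) {
    queued_file f(q, 3);
    if (disk_full) throw std::runtime_error("No space left on device");
}
```

With this shape, an exception thrown after the open still leaves the descriptor in `pending`, so nothing is lost even though no code path calls close explicitly.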
If allocate or truncate throws, we have to close the file.
Fixes #4877
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191114174810.49004-1-espindola@scylladb.com>
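The rule stated in this commit message (close the file if allocate or truncate throws) can be illustrated with a small standalone sketch. This is not the actual commitlog patch: `fd_table`, `manual_file`, and both `allocate_segment_*` functions are hypothetical, and the ENOSPC failure is simulated with a flag. The handle deliberately does not close in its destructor, matching the constraint described earlier in the thread.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical bookkeeping: number of descriptors currently open.
struct fd_table {
    int open_count = 0;
};

// Hypothetical file handle whose destructor does NOT close the descriptor;
// close() must be called explicitly, exactly once.
class manual_file {
    fd_table* table_;
    bool open_ = true;
public:
    explicit manual_file(fd_table& t) : table_(&t) { ++table_->open_count; }
    void close() {
        assert(open_);  // guard against a double close
        open_ = false;
        --table_->open_count;
    }
    // Simulates truncate/allocate failing because the disk is full.
    void truncate(bool disk_full) {
        if (disk_full) throw std::runtime_error("No space left on device");
    }
};

// Leaky variant: an exception from truncate() skips close().
void allocate_segment_leaky(fd_table& t, bool disk_full) {
    manual_file f(t);
    f.truncate(disk_full);
    f.close();
}

// Fixed variant: close the file on the error path, then rethrow.
// The success path also closes here only to keep the sketch short.
void allocate_segment_fixed(fd_table& t, bool disk_full) {
    manual_file f(t);
    try {
        f.truncate(disk_full);
    } catch (...) {
        f.close();
        throw;
    }
    f.close();
}
```

Running the leaky variant with `disk_full = true` leaves `open_count` at 1, while the fixed variant returns it to 0 before the exception propagates.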
Already backported to 3.0+.
I can still reproduce the crash on the latest master.
Scylla version (or git commit hash): 666.development-0.20200304.325c3e13ebf
Test id: c309bf33-cd2f-45f7-9ea5-0bed6e71e008
/CC @roydahan @espindola
@espindola / @bhalevy ?
@amoskong the new instance just looks superficially the same as the original one in this issue
Why close this now? Shouldn't it wait for 5509 to be merged?
@espindola it was wrongly reopened. This issue was closed on a specific root cause in the commitlog that was fixed in 6160b90.
This is Scylla's bug tracker, to be used for reporting bugs only.
If you have a question about Scylla, and not a bug, please ask it in
our mailing-list at scylladb-dev@googlegroups.com or in our slack channel.
Installation details
Scylla version (or git commit hash): 3.1.0.rc3-0.20190816.d06bcef3b
Cluster size: 4
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-07f2007dc543eced5
During the nodetool_enospc nemesis, several backtraces occurred, which ended with a coredump: