Scylla generated coredump when fail to allocate_segment #2924
@frank8989 It looks like you are running with
If you didn't have that flag enabled, the exception would propagate up to the allocating-section code, which would unlock the region, increase the amount of reserves, reclaim, and retry the operation. It's likely that the operation would then succeed. The problem here is that
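The retry behavior described above (unlock, grow reserves, reclaim, retry) can be sketched roughly as follows. This is an illustrative model, not Scylla's actual `allocating_section`; the member names `reserve`, `reclaims`, and `refill_emergency_reserve` are assumptions for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical sketch of the allocating-section retry pattern: when the
// body throws std::bad_alloc, the section grows its reserves, reclaims
// memory, and retries. In the real code the region lock is held only
// while the body runs, so reclamation is possible between attempts.
struct allocating_section_sketch {
    std::size_t reserve = 1;
    int reclaims = 0;

    void refill_emergency_reserve() {
        reserve *= 2;   // grow reserves before retrying
        ++reclaims;     // stands in for compacting/evicting segments
    }

    template <typename Func>
    auto operator()(Func&& func) {
        for (;;) {
            try {
                return func();  // region lock held only inside func
            } catch (const std::bad_alloc&) {
                // Lock is released here; reclaim and try again.
                refill_emergency_reserve();
            }
        }
    }
};
```

Under this model, a body that fails twice and then succeeds completes normally, having triggered two reclamation rounds.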
@tgrabiec, from frame 2 and frame 7: it's the same region_impl. In allocating_section, Scylla catches std::bad_alloc and calls refill_emergency_reserve without the reclaim_lock to compact and evict some segments. So we should not call abort() in allocate_segment; just return nullptr. Thanks,
allocate_segment either has to succeed or throw an exception. It cannot return a nullptr. The allocating section should temporarily disable the abort logic around the nested operation, and abort only when the whole section gives up.
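The suggestion above could be implemented with an RAII guard that clears the abort-on-allocation-failure flag for the duration of the nested operation and restores it on scope exit. This is a minimal sketch; the flag name and the `allocate_or_fail` helper are hypothetical, not Scylla's actual API.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical process-wide flag: when set, allocation failure aborts
// instead of throwing.
bool abort_on_alloc_failure = true;

// RAII guard: nested operations throw std::bad_alloc instead of
// aborting; the previous flag value is restored even if an exception
// unwinds through this scope.
struct disable_abort_guard {
    bool saved;
    disable_abort_guard() : saved(abort_on_alloc_failure) {
        abort_on_alloc_failure = false;
    }
    ~disable_abort_guard() {
        abort_on_alloc_failure = saved;
    }
};

// Illustrative allocation hook: abort only when the flag is set,
// otherwise throw so an enclosing section can reclaim and retry.
void* allocate_or_fail(bool out_of_memory) {
    if (out_of_memory) {
        if (abort_on_alloc_failure) {
            std::abort();
        }
        throw std::bad_alloc{};
    }
    return std::malloc(1);
}
```

With the guard in scope, a failed allocation becomes a recoverable exception; once the whole section gives up, the restored flag makes the final failure abort as configured.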
Got it. Thanks.
@tgrabiec - is this fixed post 1.7.5?
No. But note that this is only a problem if you have that abort-on-alloc-failure flag enabled.
allocate_segment() can fail even though we're not out of memory, when it's invoked inside an allocating section with the cache region locked. That section may later succeed when retried after memory reclamation. We should ignore bad_alloc thrown inside the allocating section body and fail only when the whole section fails. Fixes #2924 Message-Id: <1550597493-22500-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit dafe22d)
Installation details
Scylla version (or git commit hash): 1.7.5
Cluster size: 3 nodes
OS (RHEL/CentOS/Ubuntu/AWS AMI): CentOS Linux release 7.2.1511 (Core)
Platform (physical/VM/cloud instance type/docker):
Hardware: sockets= cores=56 hyperthreading= memory=256G
Disks: (SSD/HDD, count) 3T SSD
I suspect that compaction_lock set region_impl::_reclaiming_enabled to false and didn't reset it to true, so compact_and_evict couldn't reclaim or compact that region.
The core dump file is on the intranet, it's not easy to upload here.
Thanks,
Frank
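The suspicion about compaction_lock can be illustrated with a small sketch: a lock of this kind should be an RAII type that restores the previous value of `_reclaiming_enabled` on destruction, so the flag cannot be left stuck at false even if an exception unwinds through the compaction. The class below is illustrative, not Scylla's actual implementation.

```cpp
#include <cassert>

// Minimal stand-in for the region: only the flag from the comment above.
struct region_impl_sketch {
    bool _reclaiming_enabled = true;
};

// RAII compaction lock: disables reclamation for its scope and always
// restores the previous value, including on exception unwind.
struct compaction_lock_sketch {
    region_impl_sketch& _region;
    bool _prev;

    explicit compaction_lock_sketch(region_impl_sketch& r)
        : _region(r), _prev(r._reclaiming_enabled) {
        _region._reclaiming_enabled = false;  // block reclamation while compacting
    }
    ~compaction_lock_sketch() {
        _region._reclaiming_enabled = _prev;  // restored on every exit path
    }
};
```

If the lock were released manually instead, any early return or exception between "disable" and "re-enable" would leave the region permanently unreclaimable, which is exactly the failure mode suspected here.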