Scylla generates a core dump when allocate_segment fails #2924

Closed
frank8989 opened this issue Oct 26, 2017 · 7 comments

Comments

@frank8989
commented Oct 26, 2017

Installation details
Scylla version (or git commit hash): 1.7.5
Cluster size: 3 nodes
OS (RHEL/CentOS/Ubuntu/AWS AMI): CentOS Linux release 7.2.1511 (Core)

Platform (physical/VM/cloud instance type/docker):
Hardware: sockets= cores=56 hyperthreading= memory= 256G
Disks: (SSD/HDD, count) 3T SSD

backtrace
(gdb) bt
#0  0x00007f70364715f7 in raise () from /lib64/libc.so.6
#1  0x00007f7036472e28 in abort () from /lib64/libc.so.6
#2  0x0000000000a602d6 in logalloc::segment_pool::allocate_segment (this=this@entry=0x7f702b3f9520) at utils/logalloc.cc:693
#3  0x0000000000a603a8 in logalloc::segment_pool::allocate_or_fallback_to_reserve (this=0x7f702b3f9520) at utils/logalloc.cc:752
#4  0x0000000000a60431 in logalloc::segment_pool::new_segment (this=this@entry=0x7f702b3f9520, r=r@entry=0x613000037430) at utils/logalloc.cc:773
#5  0x0000000000a66576 in logalloc::region_impl::new_segment (this=0x613000037430) at utils/logalloc.cc:1215
#6  logalloc::region_impl::close_and_open (this=0x613000037430) at utils/logalloc.cc:1235
#7  logalloc::region_impl::alloc_small (this=0x613000037430, migrator=0x274d220 <standard_migrator<rows_entry>::object>, size=184, alignment=8) at utils/logalloc.cc:1151
#8  0x0000000000a66c65 in logalloc::region_impl::alloc (this=0x613000037430, migrator=<optimized out>, size=<optimized out>, alignment=<optimized out>) at utils/logalloc.cc:1351
#9  0x0000000000a7173f in allocation_strategy::construct<rows_entry, clustering_key_prefix const&> (this=0x613000037430) at utils/allocation_strategy.hh:114
#10 apply_reversibly_intrusive_set (s=..., dst=..., src=...) at mutation_partition.cc:216
#11 0x0000000000a723b9 in mutation_partition::apply(schema const&, mutation_partition&&) (this=0x61304bbdb410, s=..., p=<unknown type in /usr/lib/debug/usr/bin/scylla.debug, CU 0x4e1d2a2, DIE 0x50107e9>) at mutation_partition.cc:350
#12 0x0000000000a7259d in mutation_partition::apply(schema const&, mutation_partition&&, schema const&) (this=<optimized out>, s=..., p=p@entry=<unknown type in /usr/lib/debug/usr/bin/scylla.debug, CU 0x4e1d2a2, DIE 0x5010a98>,
    p_schema=...) at mutation_partition.cc:338
#13 0x0000000000a0f156 in partition_entry::apply (this=0x61309c819f78, s=..., pv=0x61304f0e5680, pv_schema=...) at partition_version.cc:145
#14 0x0000000000a0fea1 in partition_entry::apply(schema const&, partition_entry&&, schema const&) (this=this@entry=0x61309c819f78, s=..., pe=pe@entry=<unknown type in /usr/lib/debug/usr/bin/scylla.debug, CU 0x405a103, DIE 0x4147d19>,
    mp_schema=...) at partition_version.cc:216
#15 0x0000000000a1907a in row_cache::<lambda()>::<lambda()>::<lambda()>::<lambda()>::operator() (__closure=<optimized out>) at row_cache.cc:876
#16 with_linearized_managed_bytes<row_cache::update(memtable&, partition_presence_checker)::<lambda()>::<lambda()>::<lambda()>::<lambda()> > (func=<optimized out>) at utils/managed_bytes.hh:397
#17 row_cache::<lambda()>::<lambda()>::<lambda()>::operator() (__closure=<optimized out>) at row_cache.cc:893
#18 logalloc::allocating_section::operator()<row_cache::update(memtable&, partition_presence_checker)::<lambda()>::<lambda()>::<lambda()> > (func=<optimized out>, r=..., this=0x613000319c30) at utils/logalloc.hh:654
#19 row_cache::<lambda()>::<lambda()>::operator() (__closure=<optimized out>) at row_cache.cc:897
#20 with_allocator<row_cache::update(memtable&, partition_presence_checker)::<lambda()>::<lambda()> > (func=<optimized out>, alloc=...) at utils/allocation_strategy.hh:235
#21 row_cache::<lambda()>::operator()(void) const (__closure=<optimized out>) at row_cache.cc:902
#22 0x0000000000470d5e in std::function<void ()>::operator()() const (this=0x61310e90d610) at /opt/scylladb/include/c++/5.3.1/functional:2271
#23 seastar::thread_context::main (this=0x61310e90d600) at core/thread.cc:292
#24 0x00000000005b6ca2 in seastar::thread_context::s_main (lo=<optimized out>, hi=<optimized out>) at core/thread.cc:282
#25 0x00007f7036483110 in ?? () from /lib64/libc.so.6
#26 0x0000000000000000 in ?? ()

(gdb) scylla memory
Used memory:    4528455680
Free memory:      32849920
Total memory:   4561305600

Small pools:
objsz spansz    usedobj       memory  wst%
    1   4096          0            0   0.0
    1   4096          0            0   0.0
    1   4096          0            0   0.0
    1   4096          0            0   0.0
    2   4096          0            0   0.0
    2   4096          0            0   0.0
    3   4096          0            0   0.0
    3   4096          0            0   0.0
    4   4096          0            0   0.0
    5   4096          0            0   0.0
    6   4096          0            0   0.0
    7   4096          0            0   0.0
    8   4096       2608        28672  27.2
   10   4096          0         8192  99.9
   12   4096        171         8192  74.9
   14   4096          0         8192  99.8
   16   4096       2135        40960  16.6
   20   4096       2499        57344  12.5
   24   4096      16724       405504   0.6
   28   4096       3285        98304   6.2
   32   4096       4914       163840   4.0
   40   4096     793818     31883264   0.0
   48   4096       1578        81920   7.1
   56   4096      10285       581632   0.8
   64   4096       5612       364544   1.5
   80   4096       2672       217088   1.1
   96   4096       9064       888832   0.5
  112   4096       1029       122880   4.6
  128   4096        375        57344  16.3
  160   8192       8054      1302528   0.7
  192   8192        635       139264  10.9
  224   8192        786       196608   8.9
  256   8192        488       147456  15.3
  320  16384        485       180224  13.5
  384  16384         92        65536  44.5
  448  16384        415       196608   3.9
  512  16384     670486    343293952   0.0
  640  32768        283       229376  20.6
  768  32768        638       557056  10.5
  896  32768        173       229376  30.9
 1024  32768        467       557056  14.2
 1280  65536        538       786432  12.0
 1536  65536        861      1441792   6.7
 1792  65536         96       327680  45.9
 2048  65536        113       393216  41.1
 2560 131072         23       262144  77.1
 3072 131072         94       524288  43.4
 3584 131072        111       655360  37.7
 4096 131072         14       393216  85.4
 5120 262144        140      1048576  31.2
 6144 262144         36       524288  56.2
 7168 262144         75      1048576  47.2
 8192 262144        128      1572864  33.3
10240 524288         91      1572864  40.4
12288 524288          5      1048576  92.6
14336 524288         40      1572864  62.0
16384 524288         53      2097152  58.6
Page spans:
index      size [B] free [B]
    0          4096 163840
    1          8192 2772992
    2         16384 634880
    3         32768 1343488
    4         65536 98304
    5        131072 1130496
    6        262144 3481600
    7        524288 868352
    8       1048576 0
    9       2097152 3080192
   10       4194304 0
   11       8388608 0
   12      16777216 19275776
   13      33554432 0
   14      67108864 0
   15     134217728 0
   16     268435456 0
   17     536870912 0
   18    1073741824 0
   19    2147483648 0
   20    4294967296 0
   21    8589934592 0
   22   17179869184 0
   23   34359738368 0
   24   68719476736 0
   25  137438953472 0
   26  274877906944 0
   27  549755813888 0
   28 1099511627776 0
   29 2199023255552 0
   30 4398046511104 0
   31 8796093022208 0
(gdb) scylla lsa
Log Structured Allocator

LSA memory in use:       4059299840
Non-LSA memory in use:            0
Total memory in use:     4059299840

Emergency reserve goal:           1
Emergency reserve max:           30
Emergency reserve current:        1

LSA regions:
    Region #2448
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #7970
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #976
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2566
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #1201
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #865
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #632
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #1651
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2677
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #1863
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #523
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #3788
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #222
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2599
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #10773
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2009
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #1330
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #10778
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2676
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #415
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2446
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #10474
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2576
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #872
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2094
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #1331
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #1543
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #1438
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2745
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #864
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2709
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2626
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2598
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #304
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2386
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #10777
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #977
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #1090
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #743
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #305
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #56
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #221
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #1091
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2464
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #873
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2482
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2466
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #5413
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #522
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2567
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #1650
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2629
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #1439
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #414
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #8546
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #5419
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #633
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #742
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #10401
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory: 83623936
      - unused memory:      9693899
    Region #1200
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2008
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2616
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #58
      - reclaimable:              0
      - evictable:                1
      - non-LSA memory:           0
      - closed LSA memory: 3969908736
      - unused memory:    320757853
    Region #2617
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #1542
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2483
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2577
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #1317
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #57
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0


    Region #10772
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:  4980736
      - unused memory:       591311
    Region #2708
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2744
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #10775
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0
    Region #2095
      - reclaimable:              1
      - evictable:                0
      - non-LSA memory:           0
      - closed LSA memory:        0
      - unused memory:            0


size_t tracker::impl::
_reclaiming_enabled = true;

I suspect that compaction_lock set region_impl::_reclaiming_enabled to false and never reset it to true, so compact_and_evict could not reclaim or compact that region.
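
The suspicion above turns on whether a scoped lock restores the flag on exit. Here is a minimal sketch, assuming a guard that saves the previous value and puts it back in its destructor. The names toy_region_flags and scoped_reclaim_disable are hypothetical and are not Scylla's actual types. It shows that the flag reads false for any observer inside the guarded scope (a core dump taken there would see the same), and that it is restored on exit even when an exception unwinds the scope.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the per-region flag; not logalloc::region_impl.
struct toy_region_flags {
    bool reclaiming_enabled = true;
};

// Hypothetical RAII guard: disables reclaiming for the lifetime of the object
// and restores whatever value was set before, so nested guards compose.
class scoped_reclaim_disable {
    toy_region_flags& _r;
    bool _prev;
public:
    explicit scoped_reclaim_disable(toy_region_flags& r)
        : _r(r), _prev(r.reclaiming_enabled) {
        _r.reclaiming_enabled = false;
    }
    ~scoped_reclaim_disable() { _r.reclaiming_enabled = _prev; }
    scoped_reclaim_disable(const scoped_reclaim_disable&) = delete;
    scoped_reclaim_disable& operator=(const scoped_reclaim_disable&) = delete;
};

// Reads the flag while the guard is still alive (the return value is
// evaluated before the guard's destructor runs).
bool flag_seen_inside_scope(toy_region_flags& r) {
    scoped_reclaim_disable guard(r);
    return r.reclaiming_enabled;
}

// Throws inside the guarded scope, then reads the flag after unwinding.
bool flag_after_throwing_scope(toy_region_flags& r) {
    try {
        scoped_reclaim_disable guard(r);
        throw std::runtime_error("simulated allocation failure");
    } catch (const std::runtime_error&) {
        // The guard's destructor has already run at this point.
    }
    return r.reclaiming_enabled;
}
```

Under this assumption, a false value observed in a frame below the guard does not by itself prove that the flag was leaked.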

The core dump file is on our intranet, so it is not easy to upload here.

Thanks,
Frank

@tzach added the bug label Oct 26, 2017

@tgrabiec

Contributor

commented Oct 26, 2017

@frank8989 It looks like you are running with --abort-on-lsa-bad-alloc enabled, which makes Scylla abort when LSA cannot allocate a segment. In your case there is a lot of memory in the cache region, but the region is locked:

Region #58
- reclaimable: 0
- evictable: 1
- non-LSA memory: 0
- closed LSA memory: 3969908736
- unused memory: 320757853

If you didn't have that flag enabled, the exception would propagate up to the allocating section code, which would unlock the region, increase the reserves, reclaim, and retry the operation. The operation would then likely succeed. The problem here is that --abort-on-lsa-bad-alloc treats a non-fatal bad_alloc as fatal.
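
The recovery path described above can be sketched as a toy model. Everything in it (toy_region, toy_reclaim_lock, toy_allocating_section, and the doubling of the reserve) is a hypothetical simplification for illustration and is not the actual logalloc code. It only shows the shape of the loop: reserve while unlocked, run the operation under the lock, and on bad_alloc unlock, grow the reserve, and retry.

```cpp
#include <cstddef>
#include <new>

// Toy stand-in for an LSA region; NOT the real logalloc::region_impl.
struct toy_region {
    std::size_t free_segments = 0;     // segments available right now
    std::size_t reclaimable_segments;  // segments that reclaim could free
    bool reclaiming_enabled = true;

    explicit toy_region(std::size_t reclaimable)
        : reclaimable_segments(reclaimable) {}

    // Frees up to n segments, but only while the region is unlocked.
    void reclaim(std::size_t n) {
        if (!reclaiming_enabled) {
            return;
        }
        std::size_t got = n < reclaimable_segments ? n : reclaimable_segments;
        reclaimable_segments -= got;
        free_segments += got;
    }

    // Takes one segment, or throws like a non-aborting allocate_segment.
    void allocate_segment() {
        if (free_segments == 0) {
            reclaim(1);  // no effect while the region is locked
        }
        if (free_segments == 0) {
            throw std::bad_alloc();
        }
        --free_segments;
    }
};

// Hypothetical RAII lock: disables reclaiming, restores the old value on exit.
struct toy_reclaim_lock {
    toy_region& r;
    bool prev;
    explicit toy_reclaim_lock(toy_region& reg)
        : r(reg), prev(reg.reclaiming_enabled) {
        r.reclaiming_enabled = false;
    }
    ~toy_reclaim_lock() { r.reclaiming_enabled = prev; }
};

// Hypothetical retry loop. Doubling the reserve is an assumption; the thread
// only says the reserve is increased. Segments consumed by a failed attempt
// are not returned in this toy.
struct toy_allocating_section {
    std::size_t reserve = 1;
    int retries = 0;

    template <typename Func>
    void operator()(toy_region& r, Func&& func) {
        for (;;) {
            r.reclaim(reserve);  // build up the reserve while unlocked
            try {
                toy_reclaim_lock lock(r);
                func();
                return;
            } catch (const std::bad_alloc&) {
                // The lock is already released here, so reclaim works again.
                if (r.reclaimable_segments == 0) {
                    throw;  // nothing left to reclaim: genuinely out of memory
                }
                reserve *= 2;
                ++retries;
            }
        }
    }
};
```

In this toy, a region created with 10 reclaimable segments and an operation that needs 3 segments under the lock fails twice and succeeds on the third attempt, after the reserve has grown from 1 to 2 to 4. Aborting inside allocate_segment would cut that loop short at the first failure.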

@frank8989

Author

commented Oct 27, 2017

@tgrabiec
You're right, I had the --abort-on-lsa-bad-alloc flag enabled. From the backtrace:
(gdb) frame 18
#18 logalloc::allocating_section::operator()<row_cache::update(memtable&, partition_presence_checker)::<lambda()>::<lambda()>::<lambda()> > (func=<optimized out>, r=..., this=0x613000319c30) at utils/logalloc.hh:654
654 return func();
(gdb) p (region_impl*)0x613000037430
$1 = (logalloc::region_impl *) 0x613000037430
(gdb) p *(region_impl*)0x613000037430
$2 = {<allocation_strategy> = {_vptr.allocation_strategy = 0x1be0fc8 <vtable for logalloc::region_impl+16>, _preferred_max_contiguous_allocation = 26214}, _region = 0x7f702b3f91d8, _group = 0x0, _active = 0x613003800000,
_active_offset = 262009,
_segments = {<boost::heap::detail::make_binomial_heap_base<logalloc::segment
, boost::parameter::aux::arg_list<boost::heap::constant_time_size, boost::parameter::aux::arg_list<boost::heap::allocator<logalloc::prepared_buffers_allocatorlogalloc::segment* >, boost::parameter::aux::arg_list<boost::heap::comparelogalloc::segment_occupancy_descending_less_compare, boost::parameter::aux::empty_arg_list> > > >::type> = {<boost::heap::detail::heap_base<logalloc::segment*, logalloc::segment_occupancy_descending_less_compare, false, unsigned long, false>> = {logalloc::segment_occupancy_descending_less_compare = {}, <boost::heap::detail::size_holder<false, unsigned long>> = {
static constant_time_size = }, static is_stable = }, <logalloc::prepared_buffers_allocator<boost::heap::detail::parent_pointing_heap_nodelogalloc::segment* >> = {
static prepared_buffer = 0x0}, }, static constant_time_size = , static has_ordered_iterators = , static is_mergable = , static is_stable = ,
static has_reserve = , top_element = 0x613000533188,
trees = {<boost::intrusive::list_impl<boost::intrusive::bhtraits<boost::heap::detail::heap_node_base, boost::intrusive::list_node_traits<void*>, (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 1u>, unsigned long, true, void>> = {static constant_time_size = true, static stateful_value_traits = , static has_container_from_iterator = , static safemode_or_autounlink = ,
data
= {<boost::intrusive::bhtraits<boost::heap::detail::heap_node_base, boost::intrusive::list_node_traits<void*>, (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 1u>> = {<boost::intrusive::bhtraits_base<boost::heap::detail::heap_node_base, boost::intrusive::list_node<void*>, boost::intrusive::dft_tag, 1u>> = {}, static link_mode = },
root_plus_size_ = {<boost::intrusive::detail::size_holder<true, unsigned long, void>> = {static constant_time_size = , size_ = 8}, m_header = {<boost::intrusive::list_node<void
>> = {next_ = 0x61300043cdd0,
prev_ = 0x61300058d5a0}, }}}}, }}, _closed_occupancy = {_free_space = 320757853, _total_space = 3969908736}, _non_lsa_occupancy = {_free_space = 0, _total_space = 0},
_evictable_space = 0, _reclaiming_enabled = false, _evictable = true, _id = 58, _reclaim_counter = 6322853,
_eviction_fn = {<std::_Maybe_unary_or_binary_function<memory::reclaiming_result>> = {<No data fields>}, <std::_Function_base> = {static _M_max_size = 16, static _M_max_align = 8, _M_functor = {_M_unused = {_M_object = 0x7f702b3f9168,
_M_const_object = 0x7f702b3f9168, _M_function_pointer = 0x7f702b3f9168, _M_member_pointer = (void (std::_Undefined_class::*)(std::_Undefined_class * const)) 0x7f702b3f9168, this adjustment 12547326},
_M_pod_data = "h\221?+p\177\000\000\376t\277\000\000\000\000"},
_M_manager = 0xa10900 <std::_Function_base::_Base_manager<cache_tracker::cache_tracker()::<lambda()> >::_M_manager(std::_Any_data &, const std::_Any_data &, std::_Manager_operation)>},
_M_invoker = 0xa15060 <std::_Function_handler<memory::reclaiming_result(), cache_tracker::cache_tracker()::<lambda()> >::_M_invoke(const std::_Any_data &)>}, heap_handle = {node = 0x0}}

From frame 2:
(gdb) frame 2
#2 0x0000000000a602d6 in logalloc::segment_pool::allocate_segment (this=this@entry=0x7f702b3f9520) at utils/logalloc.cc:693
693 abort();
(gdb) p tracker_instance
$3 = {_impl = std::unique_ptr<logalloc::tracker::impl> containing 0x6130000325c0, _reclaimer = {_reclaim = {<std::_Maybe_unary_or_binary_function<memory::reclaiming_result>> = {<No data fields>}, <std::_Function_base> = {
static _M_max_size = 16, static _M_max_align = 8, _M_functor = {_M_unused = {_M_object = 0x7f702b3f95e0, _M_const_object = 0x7f702b3f95e0, _M_function_pointer = 0x7f702b3f95e0,
_M_member_pointer = (void (std::_Undefined_class::*)(std::_Undefined_class * const)) 0x7f702b3f95e0}, _M_pod_data = "\340\225?+p\177\000\000\000\000\000\000\000\000\000"},
_M_manager = 0xa51e70 <std::_Function_base::_Base_manager<logalloc::tracker::tracker()::<lambda()> >::_M_manager(std::_Any_data &, const std::_Any_data &, std::_Manager_operation)>},
_M_invoker = 0xa607b0 <std::_Function_handler<memory::reclaiming_result(), logalloc::tracker::tracker()::<lambda()> >::_M_invoke(const std::_Any_data &)>}, _scope = memory::reclaimer_scope::sync}}
(gdb) p tracker_instance._impl
$4 = std::unique_ptr<logalloc::tracker::impl> containing 0x6130000325c0
(gdb) p tracker_instance._impl->_regions
$5 = std::vector of length 74, capacity 128 = {0x613000039c10, 0x61310cdbe890, 0x613000035cf0, 0x613000c77270, 0x6130000356d0, 0x6130000361d0, 0x6130000367f0, 0x6130000349b0, 0x613000039190, 0x613000034710, 0x6130000369b0,
0x613000db0350, 0x613000037190, 0x613000038fd0, 0x61310eb4e8d0, 0x613000034470, 0x613000035270, 0x61310c418eb0, 0x613000038e10, 0x613000036c50, 0x6130000396d0, 0x61310cdbd710, 0x613000c770b0, 0x613000036010, 0x6130000340f0,
0x613000035190, 0x613000034c50, 0x613000034fd0, 0x613000039a50, 0x6130000362b0, 0x613000c767f0, 0x613000c77cf0, 0x613000038d30, 0x613000036fd0, 0x613000c77eb0, 0x613000e62ef0, 0x613000035c10, 0x613000035a50, 0x613000036470,
0x613000036ef0, 0x6130000376d0, 0x613000037270, 0x613000035970, 0x613000039eb0, 0x613000035eb0, 0x613000039b30, 0x613000039dd0, 0x61301a804dd0, 0x613000036a90, 0x613000c78390, 0x613000034a90, 0x613000c77350, 0x613000034ef0,
0x613000036d30, 0x61301a8046d0, 0x613000e62a90, 0x613000036710, 0x613000036550, 0x61310dac2270, 0x6130000357b0, 0x613000034550, 0x613000039970, 0x613000037430, 0x613000c76fd0, 0x613000034d30, 0x6130000341d0, 0x6130000395f0,
0x613000035430, 0x613000037510, 0x613000dbeb30, 0x613000c77dd0, 0x613000c760f0, 0x61310d703cf0, 0x6130000342b0}

From frame 7:
(gdb) frame 7
#7 logalloc::region_impl::alloc_small (this=0x613000037430, migrator=0x274d220 <standard_migrator<rows_entry>::object>, size=184, alignment=8) at utils/logalloc.cc:1151
1151 close_and_open();
(gdb) p this
$6 = (logalloc::region_impl * const) 0x613000037430
(gdb) p *this
$7 = {<allocation_strategy> = {_vptr.allocation_strategy = 0x1be0fc8 <vtable for logalloc::region_impl+16>, _preferred_max_contiguous_allocation = 26214}, _region = 0x7f702b3f91d8, _group = 0x0, _active = 0x613003800000,
_active_offset = 262009,
_segments = {<boost::heap::detail::make_binomial_heap_base<logalloc::segment
, boost::parameter::aux::arg_list<boost::heap::constant_time_size, boost::parameter::aux::arg_list<boost::heap::allocator<logalloc::prepared_buffers_allocatorlogalloc::segment* >, boost::parameter::aux::arg_list<boost::heap::comparelogalloc::segment_occupancy_descending_less_compare, boost::parameter::aux::empty_arg_list> > > >::type> = {<boost::heap::detail::heap_base<logalloc::segment*, logalloc::segment_occupancy_descending_less_compare, false, unsigned long, false>> = {logalloc::segment_occupancy_descending_less_compare = {}, <boost::heap::detail::size_holder<false, unsigned long>> = {
static constant_time_size = }, static is_stable = }, <logalloc::prepared_buffers_allocator<boost::heap::detail::parent_pointing_heap_nodelogalloc::segment* >> = {
static prepared_buffer = 0x0}, }, static constant_time_size = , static has_ordered_iterators = , static is_mergable = , static is_stable = ,
static has_reserve = , top_element = 0x613000533188,
trees = {<boost::intrusive::list_impl<boost::intrusive::bhtraits<boost::heap::detail::heap_node_base, boost::intrusive::list_node_traits<void*>, (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 1u>, unsigned long, true, void>> = {static constant_time_size = true, static stateful_value_traits = , static has_container_from_iterator = , static safemode_or_autounlink = ,
data
= {<boost::intrusive::bhtraits<boost::heap::detail::heap_node_base, boost::intrusive::list_node_traits<void*>, (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 1u>> = {<boost::intrusive::bhtraits_base<boost::heap::detail::heap_node_base, boost::intrusive::list_node<void*>, boost::intrusive::dft_tag, 1u>> = {}, static link_mode = },
root_plus_size_ = {<boost::intrusive::detail::size_holder<true, unsigned long, void>> = {static constant_time_size = , size_ = 8}, m_header = {<boost::intrusive::list_node<void
>> = {next_ = 0x61300043cdd0,
prev_ = 0x61300058d5a0}, }}}}, }}, _closed_occupancy = {_free_space = 320757853, _total_space = 3969908736}, _non_lsa_occupancy = {_free_space = 0, _total_space = 0},
_evictable_space = 0, _reclaiming_enabled = false, _evictable = true, _id = 58, _reclaim_counter = 6322853,
_eviction_fn = {<std::_Maybe_unary_or_binary_functionmemory::reclaiming_result> = {}, std::_Function_base = {static _M_max_size = 16, static _M_max_align = 8, _M_functor = {_M_unused = {_M_object = 0x7f702b3f9168,
_M_const_object = 0x7f702b3f9168, _M_function_pointer = 0x7f702b3f9168, _M_member_pointer = (void (std::_Undefined_class::*)(std::_Undefined_class * const)) 0x7f702b3f9168, this adjustment 12547326},
_M_pod_data = "h\221?+p\177\000\000\376t\277\000\000\000\000"},
_M_manager = 0xa10900 <std::_Function_base::_Base_manager<cache_tracker::cache_tracker()::<lambda()> >::_M_manager(std::_Any_data &, const std::_Any_data &, std::_Manager_operation)>},
_M_invoker = 0xa15060 <std::_Function_handler<memory::reclaiming_result(), cache_tracker::cache_tracker()::<lambda()> >::_M_invoke(const std::_Any_data &)>}, heap_handle = {node = 0x0}}

It's the same region_impl. In allocating_section, Scylla catches std::bad_alloc and calls refill_emergency_reserve without holding the reclaim_lock, so that it can compact and evict some segments.

So we should not call abort() in allocate_segment; it should just return nullptr.

Thanks,
Frank

@tgrabiec

Contributor

commented Oct 27, 2017

allocate_segment either has to succeed or throw an exception. It cannot return a nullptr.

The allocating section should temporarily disable the abort logic around the nested operation, and abort when the whole section gives up.
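
A minimal sketch of that idea, not Scylla's actual code. All names here (`segment_pool_sketch`, `abort_suppressor`, `run_allocating_section`, `max_attempts`) are hypothetical stand-ins, and the pool is reduced to a counter. The nested allocation throws instead of aborting while the section body runs, and the abort decision is made only once the whole section has run out of retries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the segment pool: a counter plus the abort flag.
struct segment_pool_sketch {
    std::size_t free_segments = 0;
    bool abort_on_bad_alloc = true;  // models the abort-on-bad-alloc option

    // Either succeeds or throws; it never reports failure by return value.
    void allocate_segment() {
        if (free_segments == 0) {
            if (abort_on_bad_alloc) {
                std::abort();
            }
            throw std::bad_alloc();
        }
        --free_segments;
    }

    // Models compaction/eviction freeing up segments between retries.
    void reclaim(std::size_t n) { free_segments += n; }
};

// RAII guard: disables the abort logic for its lifetime, then restores it.
class abort_suppressor {
    segment_pool_sketch& _pool;
    bool _saved;
public:
    explicit abort_suppressor(segment_pool_sketch& pool)
        : _pool(pool), _saved(pool.abort_on_bad_alloc) {
        _pool.abort_on_bad_alloc = false;
    }
    ~abort_suppressor() { _pool.abort_on_bad_alloc = _saved; }
    abort_suppressor(const abort_suppressor&) = delete;
    abort_suppressor& operator=(const abort_suppressor&) = delete;
};

// Runs body with aborts suppressed, reclaiming and retrying on bad_alloc.
// Returns the number of attempts used. Only when every attempt has failed
// does the abort flag apply, now to the section as a whole.
template <typename Func>
int run_allocating_section(segment_pool_sketch& pool, Func&& body,
                           int max_attempts = 3) {
    assert(max_attempts >= 1);
    for (int attempt = 1;; ++attempt) {
        try {
            abort_suppressor guard(pool);  // nested failures only throw
            body();
            return attempt;
        } catch (const std::bad_alloc&) {
            // The guard is already destroyed here, so the flag is restored.
            if (attempt >= max_attempts) {
                if (pool.abort_on_bad_alloc) {
                    std::abort();  // the whole section gave up
                }
                throw;
            }
            pool.reclaim(1);  // free memory, then retry the body
        }
    }
}
```

With an empty pool and the flag enabled, calling `run_allocating_section(pool, [&] { pool.allocate_segment(); })` fails once inside the body, reclaims one segment, and succeeds on the second attempt without aborting.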

@frank8989

Author

commented Oct 31, 2017

Got it. Thanks.

Thanks,
Frank

@slivne

Contributor

commented Nov 12, 2017

@tgrabiec - is this fixed in releases after 1.7.5?

@slivne slivne added this to the 2.1 milestone Nov 12, 2017

@tgrabiec

Contributor

commented Nov 13, 2017

No. But note that this is only a problem if you have the abort-on-alloc-failure flag enabled.

@slivne

Contributor

commented Dec 21, 2017

It's likely that the operation would then succeed. The problem here is that --abort-on-lsa-bad-alloc is considering a non-fatal bad_alloc as fatal.

@slivne slivne modified the milestones: 2.1, 2.x Dec 21, 2017

@tgrabiec tgrabiec self-assigned this Feb 19, 2019

tgrabiec added a commit that referenced this issue Apr 17, 2019

lsa: Fix spurios abort with --enable-abort-on-lsa-bad-alloc
allocate_segment() can fail even though we're not out of memory, when
it's invoked inside an allocating section with the cache region
locked. That section may later succeed after retried after memory
reclamation.

We should ignore bad_alloc thrown inside allocating section body and
fail only when the whole section fails.

Fixes #2924

Message-Id: <1550597493-22500-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit dafe22d)

amoskong pushed a commit to amoskong/scylla that referenced this issue May 8, 2019

lsa: Fix spurios abort with --enable-abort-on-lsa-bad-alloc
(cherry picked from commit dafe22d)

tgrabiec added a commit that referenced this issue Aug 8, 2019

lsa: Fix spurios abort with --enable-abort-on-lsa-bad-alloc
(cherry picked from commit dafe22d)