
Coredump/Crash during repair #4363

Closed
r4fek opened this issue Mar 25, 2019 · 37 comments

@r4fek commented Mar 25, 2019

Installation details
Scylla version (or git commit hash): 3.0.4-0.20190313.5e3a52024-1~jessie
Cluster size: 20 nodes, 2 DC
OS (RHEL/CentOS/Ubuntu/AWS AMI): Debian

Backtrace:

(gdb) bt
#0  0x00007f6ca15e4067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f6ca15e5523 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f6ca15dd266 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f6ca15dd312 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000000003a07003 in CryptoPP::SecBlock<unsigned int, CryptoPP::FixedSizeAllocatorWithCleanup<unsigned int, 16ul, CryptoPP::NullAllocator<unsigned int>, true> >::~SecBlock (this=0x3705, 
    __in_chrg=<optimized out>) at /usr/include/cryptopp/secblock.h:261
#5  CryptoPP::FixedSizeSecBlock<unsigned int, 16u, CryptoPP::FixedSizeAllocatorWithCleanup<unsigned int, 16ul, CryptoPP::NullAllocator<unsigned int>, true> >::~FixedSizeSecBlock (this=0x3705, 
    __in_chrg=<optimized out>) at /usr/include/cryptopp/secblock.h:422
#6  CryptoPP::FixedSizeAlignedSecBlock<unsigned int, 16u, true>::~FixedSizeAlignedSecBlock (
    this=0x3705, __in_chrg=<optimized out>) at /usr/include/cryptopp/secblock.h:429
#7  CryptoPP::IteratedHashWithStaticTransform<unsigned int, CryptoPP::EnumToType<CryptoPP::ByteOrder, 1>, 64u, 32u, CryptoPP::SHA256, 32u, true>::IteratedHashWithStaticTransform (this=0x7f6c9c9d8bc8)
    at /usr/include/cryptopp/iterhash.h:88
#8  CryptoPP::SHA256::SHA256 (this=0x7f6c9c9d8bc8) at /usr/include/cryptopp/sha.h:20
#9  sha256_hasher::sha256_hasher (this=0x7f6c9c9d8bc8) at repair/repair.cc:454
#10 partition_checksum::<lambda(auto:145&, auto:146&)>::<lambda()>::<lambda(auto:147)>::operator()<seastar::optimized_optional<mutation> > (mopt=..., __closure=<optimized out>) at repair/repair.cc:604
#11 seastar::apply_helper<partition_checksum::compute_legacy(flat_mutation_reader)::<lambda(auto:145&, auto:146&)>::<lambda()> [with auto:145 = flat_mutation_reader; auto:146 = partition_checksum]::<lambda(auto:147)>, std::tuple<seastar::optimized_optional<mutation> >&&, std::integer_sequence<long unsigned int, 0> >::apply (args=<optimized out>, func=<optimized out>) at ./seastar/core/apply.hh:35
#12 seastar::apply<partition_checksum::compute_legacy(flat_mutation_reader)::<lambda(auto:145&, auto:146&)>::<lambda()> [with auto:145 = flat_mutation_reader; auto:146 = partition_checksum]::<lambda(auto:147)>, seastar::optimized_optional<mutation> > (args=<optimized out>, func=<optimized out>)
    at ./seastar/core/apply.hh:43
#13 seastar::futurize<seastar::bool_class<seastar::stop_iteration_tag> >::apply<partition_checksum::compute_legacy(flat_mutation_reader)::<lambda(auto:145&, auto:146&)>::<lambda()> [with auto:145 = flat_mutation_reader; auto:146 = partition_checksum]::<lambda(auto:147)>, seastar::optimized_optional<mutation> >
    (args=<optimized out>, func=<optimized out>) at ./seastar/core/future.hh:1309
#14 seastar::future<seastar::optimized_optional<mutation> >::then<partition_checksum::compute_legacy(flat_mutation_reader)::<lambda(auto:145&, auto:146&)>::<lambda()> [with auto:145 = flat_mutation_reader; auto:146 = partition_checksum]::<lambda(auto:147)> > (func=<optimized out>, this=0x7f6c9c9d8ae8)
    at ./seastar/core/future.hh:952
#15 partition_checksum::<lambda(auto:145&, auto:146&)>::<lambda()>::operator()(void) const (
    __closure=<optimized out>) at repair/repair.cc:610
#16 0x0000000000000000 in ?? ()

Logs:

mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 28] repair - repair 574 on shard 28 completed successfully
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 15] repair - repair 574 on shard 15 completed successfully
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 21] repair - repair 574 on shard 21 completed successfully
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 1] repair - repair 574 on shard 1 completed successfully
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 39] repair - repair 574 on shard 39 completed successfully
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 26] repair - repair 574 on shard 26 completed successfully
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 43] repair - repair 574 on shard 43 completed successfully
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 18] repair - repair 574 on shard 18 completed successfully
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 27] repair - Start streaming for repair id=574, shard=27, index=0, ranges_in=1, ranges_out=2
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 27] stream_session - [Stream #8a371740-4eee-11e9-bdb8-00000000001c] Executing streaming plan for repair-in-id-574-shard-27-index-0 with peers={10.210.3.84}, master
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 11] repair - repair 574 on shard 11 completed successfully
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 3] repair - repair 574 on shard 3 completed successfully
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 7] repair - repair 574 on shard 7 completed successfully
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 5] repair - repair 574 on shard 5 completed successfully
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 22] repair - repair 574 on shard 22 completed successfully
mar 25 11:10:03 ams-dbs10 scylla[14085]: [shard 16] repair - repair 574 on shard 16 completed successfully
mar 25 11:10:04 ams-dbs10 scylla[14085]: [shard 31] repair - repair 574 on shard 31 completed successfully
mar 25 11:10:04 ams-dbs10 scylla[14085]: [shard 13] repair - repair 574 on shard 13 completed successfully
mar 25 11:10:04 ams-dbs10 scylla[14085]: [shard 12] repair - Start streaming for repair id=574, shard=12, index=0, ranges_in=1, ranges_out=2
mar 25 11:10:04 ams-dbs10 scylla[14085]: [shard 12] stream_session - [Stream #8a4288f0-4eee-11e9-8b62-00000000000b] Executing streaming plan for repair-in-id-574-shard-12-index-0 with peers={10.210.3.84}, master
mar 25 11:10:04 ams-dbs10 scylla[14085]: [shard 2] repair - repair 574 on shard 2 completed successfully
mar 25 11:10:04 ams-dbs10 scylla[14085]: [shard 9] repair - Checksum of range [-8281238575676813133, -8281214099591881772] on 10.210.20.84 failed: std::out_of_range (Buffer underflow)
mar 25 11:10:04 ams-dbs10 scylla[14085]: [shard 9] repair - Checksum of range [-8281238575676813133, -8281214099591881772] on 10.210.20.122 failed: std::runtime_error (Buffer underflow)
mar 25 11:10:04 ams-dbs10 scylla[14085]: scylla: ./seastar/core/future.hh:903: void seastar::future<T>::do_wait() [with T = {partition_checksum}]: Assertion `thread' failed.
mar 25 11:10:04 ams-dbs10 scylla[14085]: Aborting on shard 9.
mar 25 11:10:04 ams-dbs10 scylla[14085]: Backtrace:
mar 25 11:10:04 ams-dbs10 scylla[14085]: 0x00000000006ab1e2
mar 25 11:10:04 ams-dbs10 scylla[14085]: 0x00000000005c3d6c
mar 25 11:10:04 ams-dbs10 scylla[14085]: 0x00000000005c4015
mar 25 11:10:04 ams-dbs10 scylla[14085]: 0x00000000005c40c3
mar 25 11:10:04 ams-dbs10 scylla[14085]: /lib/x86_64-linux-gnu/libpthread.so.0+0x000000000000f88f
mar 25 11:10:04 ams-dbs10 scylla[14085]: /lib/x86_64-linux-gnu/libc.so.6+0x0000000000035066
mar 25 11:10:04 ams-dbs10 scylla[14085]: /lib/x86_64-linux-gnu/libc.so.6+0x0000000000036447
mar 25 11:10:04 ams-dbs10 scylla[14085]: /lib/x86_64-linux-gnu/libc.so.6+0x000000000002e265
mar 25 11:10:04 ams-dbs10 scylla[14085]: /lib/x86_64-linux-gnu/libc.so.6+0x000000000002e311
mar 25 11:10:04 ams-dbs10 scylla[14085]: 0x0000000003a07002
mar 25 11:10:04 ams-dbs10 scylla[14085]: 0x0000000003a26ef7
mar 25 11:10:04 ams-dbs10 scylla[14085]: 0x00000000005a511e
mar 25 11:10:04 ams-dbs10 scylla[14085]: 0x00000000005a52e1
mar 25 11:10:04 ams-dbs10 scylla[14085]: 0x00000000006732b7
mar 25 11:10:04 ams-dbs10 scylla[14085]: 0x000000000069619a
mar 25 11:10:04 ams-dbs10 scylla[14085]: 0x0000000000759fad
mar 25 11:10:04 ams-dbs10 scylla[14085]: /lib/x86_64-linux-gnu/libpthread.so.0+0x0000000000008063
mar 25 11:10:04 ams-dbs10 scylla[14085]: /lib/x86_64-linux-gnu/libc.so.6+0x00000000000e862c
mar 25 11:16:20 ams-dbs10 systemd[1]: scylla-server.service: main process exited, code=killed, status=6/ABRT
mar 25 11:16:22 ams-dbs10 systemd[1]: Unit scylla-server.service entered failed state.
mar 25 11:16:22 ams-dbs10 systemd[1]: Starting Scylla Server...
mar 25 11:16:23 ams-dbs10 systemd[1]: scylla-server.service holdoff time over, scheduling restart.

It happens while performing repair.

@r4fek changed the title from "Coredump during repair" to "Coredump/Crash during repair" Mar 25, 2019

@glommer (Contributor) commented Mar 25, 2019 (comment minimized)

@asias (Contributor) commented Mar 26, 2019

@nyh do you have any ideas about the CryptoPP assert, alignment issue?

@syuu1228 can you provide a copy of /usr/include/cryptopp/secblock.h for the build: 3.0.4-0.20190313.5e3a52024-1~jessie. My local /usr/include/cryptopp/secblock.h seems different.

@nyh (Contributor) commented Mar 26, 2019

@asias, could the backtrace be bogus? The log says

scylla: ./seastar/core/future.hh:903: void seastar::future<T>::do_wait() [with T = {partition_checksum}]: Assertion `thread' failed.
mar 25 11:10:04 ams-dbs10 scylla[14085]: Aborting on shard 9.

So I would expect that that's the assert that failed: some code tried to get() a future<partition_checksum> without being in thread context. I don't see how an actual assertion in CryptoPP code, as suggested by this backtrace, could lead to the above log message...
But I'll look again at the code to see if I can figure out something else.
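
For reference, the rule behind that assertion can be shown with a minimal, self-contained sketch (assuming current Seastar APIs; this is not Scylla's repair code): future::get() on a not-yet-ready future is only legal inside a seastar::thread context, for example one created by seastar::async().

// Minimal sketch (not Scylla code): get() on an unready future must run
// inside a seastar::thread, otherwise do_wait() trips the `thread' assert.
// A future that is already ready can be get()'d from any context.
#include <seastar/core/future.hh>
#include <seastar/core/thread.hh>
#include <seastar/core/sleep.hh>
#include <chrono>

seastar::future<> wait_in_thread_context() {
    return seastar::async([] {
        auto f = seastar::sleep(std::chrono::milliseconds(1));
        f.get();   // fine here: we are inside a seastar::thread, so get() may block
    });
}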

@nyh (Contributor) commented Mar 26, 2019

If I continue to ignore the backtrace and use only the log message, and assume the problem is get() on an unready partition_checksum future, I see in repair.cc only two places where checksums[i].get0() is called. But it is called only inside a continuation of when_all(checksums.begin(), checksums.end()).

Could we have a bug in when_all()?

We know we do have a bug in when_all: when running out of memory it may not wait for all the futures (see scylladb/seastar#135). Do you know if this node had any OOM issues?
Or could we have another bug in when_all which, for example, forgets to wait for some of the futures when others are exceptional (as happened here, according to the log)?
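
For context, the pattern in question looks roughly like this. This is a minimal sketch assuming Seastar's iterator-based when_all() and future::get0(); compute_checksum() is a hypothetical stand-in for the real per-node checksum request, not the actual repair.cc code.

// Sketch of the discussed pattern: collect per-node checksum futures,
// wait for all of them with when_all(), then get0() each resolved future
// inside the continuation, where none of them should ever need to block.
#include <seastar/core/future.hh>
#include <seastar/core/future-util.hh>
#include <vector>

seastar::future<long> compute_checksum(int node);   // hypothetical placeholder

seastar::future<> checksum_all(std::vector<int> nodes) {
    std::vector<seastar::future<long>> checksums;
    checksums.reserve(nodes.size());
    for (int n : nodes) {
        checksums.push_back(compute_checksum(n));
    }
    return seastar::when_all(checksums.begin(), checksums.end()).then(
            [] (std::vector<seastar::future<long>> done) {
        for (auto& f : done) {
            if (f.failed()) {
                f.ignore_ready_future();   // exceptional futures are resolved too
            } else {
                long checksum = f.get0();  // expected to be ready: must not reach do_wait()
                (void)checksum;
            }
        }
    });
}

If any future handed to the continuation were somehow not resolved, get0() would fall through to do_wait() and trip exactly the assertion seen in the log.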

@r4fek (Author) commented Mar 26, 2019

No OOM issues on this node. The issue is fully reproducible. Every time I run nodetool repair sync user_quota -pr -local it crashes.

@nyh (Contributor) commented Mar 26, 2019

Are the logs similar every time, reporting some weird "Buffer underflow" when retrieving checksums from other nodes, and then an assertion failure in do_wait()?

@r4fek (Author) commented Mar 26, 2019

Logs are similar. From today's run:

mar 26 08:55:44 ams-dbs10 scylla[20936]: [shard 7] repair - Repair 10 out of 471 ranges, id=1, shard=7, keyspace=sync, table={user_quota}, range=(-8956230552571669594, -8950787210158098972]
mar 26 08:55:44 ams-dbs10 scylla[20936]: [shard 27] repair - Repair 23 out of 471 ranges, id=1, shard=27, keyspace=sync, table={user_quota}, range=(-8267994088275915936, -8266775682418817826]
mar 26 08:55:44 ams-dbs10 scylla[20936]: [shard 27] repair - Repair 24 out of 471 ranges, id=1, shard=27, keyspace=sync, table={user_quota}, range=(-8129179959635555520, -8128898703648709757]
mar 26 08:55:44 ams-dbs10 scylla[20936]: [shard 27] repair - Repair 25 out of 471 ranges, id=1, shard=27, keyspace=sync, table={user_quota}, range=(-8083138542065621398, -8081701613004927367]
mar 26 08:55:44 ams-dbs10 scylla[20936]: [shard 27] repair - Repair 26 out of 471 ranges, id=1, shard=27, keyspace=sync, table={user_quota}, range=(-8017918586081434107, -8017271889942484528]
mar 26 08:55:44 ams-dbs10 scylla[20936]: [shard 27] repair - Repair 27 out of 471 ranges, id=1, shard=27, keyspace=sync, table={user_quota}, range=(-7982944990706867679, -7972657383343631192]
mar 26 08:55:44 ams-dbs10 scylla[20936]: [shard 15] repair - Repair 24 out of 471 ranges, id=1, shard=15, keyspace=sync, table={user_quota}, range=(-8129179959635555520, -8128898703648709757]
mar 26 08:55:44 ams-dbs10 scylla[20936]: [shard 15] repair - Repair 25 out of 471 ranges, id=1, shard=15, keyspace=sync, table={user_quota}, range=(-8083138542065621398, -8081701613004927367]
mar 26 08:55:44 ams-dbs10 scylla[20936]: [shard 4] repair - Repair 22 out of 471 ranges, id=1, shard=4, keyspace=sync, table={user_quota}, range=(-8279032008674959252, -8267994088275915936]
mar 26 08:55:44 ams-dbs10 scylla[20936]: [shard 12] repair - Repair 20 out of 471 ranges, id=1, shard=12, keyspace=sync, table={user_quota}, range=(-8289925662141070481, -8282805927186275898]
mar 26 08:55:45 ams-dbs10 scylla[20936]: [shard 9] repair - Checksum of range [-8281238575676813133, -8281214099591881772] on 10.210.20.84 failed: std::out_of_range (Buffer underflow)
mar 26 08:55:45 ams-dbs10 scylla[20936]: [shard 9] repair - Checksum of range [-8281238575676813133, -8281214099591881772] on 10.210.20.122 failed: std::runtime_error (Buffer underflow)
mar 26 08:58:19 ams-dbs10 systemd[1]: scylla-server.service: main process exited, code=killed, status=6/ABRT
mar 26 08:58:19 ams-dbs10 systemd[1]: Unit scylla-server.service entered failed state.
mar 26 08:58:19 ams-dbs10 systemd[1]: Starting Scylla Server...
mar 26 08:58:20 ams-dbs10 systemd[1]: scylla-server.service holdoff time over, scheduling restart.
mar 26 08:58:20 ams-dbs10 systemd[1]: Stopping Scylla Server...
mar 26 08:58:20 ams-dbs10 systemd[1]: Starting Scylla Server...

@r4fek (Author) commented Mar 26, 2019

Backtrace:

(gdb) bt
#0  0x00007f4b17abd067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f4b17abe523 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f4b17ab6266 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f4b17ab6312 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6

#4  0x0000000003a18773 in seastar::future<partition_checksum>::do_wait (this=<optimized out>) at ./seastar/core/future.hh:903
#5  seastar::future<partition_checksum>::get (this=<optimized out>) at ./seastar/core/future.hh:852
#6  seastar::future<partition_checksum>::get0 (this=<optimized out>) at ./seastar/core/future.hh:873
#7  <lambda(uint64_t)>::<lambda(auto:157&, auto:158&, const auto:159&, auto:160&)>::<lambda()>::<lambda(auto:161)>::<lambda(std::vector<seastar::future<partition_checksum>, std::allocator<seastar::future<partition_checksum> > >)>::operator()(std::vector<seastar::future<partition_checksum>, std::allocator<seastar::future<partition_checksum> > >) const (__closure=0x0, checksums=...) at repair/repair.cc:831
#8  0x0000000000000000 in ?? ()

@nyh (Contributor) commented Mar 26, 2019

Thanks! This latest backtrace makes more sense. As I suspected, it's get0() on a partition_checksum, according to this backtrace it is specifically the following line in repair.cc:

                            live_neighbors_checksum.push_back(checksums[i].get0());

But this is called inside a continuation of

               when_all(checksums.begin(), checksums.end()).then(

So checksums[i] should have been a ready future, and its get0() should never have reached do_wait() at all.
I wonder if we have a bug in when_all, perhaps when some of the futures are exceptional.

A separate question is why we get those "underflow" checksum errors on the other nodes. This shouldn't happen either. But even if it does, it certainly shouldn't lead to an assertion failure.

@syuu1228 (Contributor) commented Mar 26, 2019 (comment minimized)

@slivne added the repair label Mar 28, 2019

@slivne added this to the 3.1 milestone Mar 28, 2019

@slivne assigned asias and nyh Mar 28, 2019

@espindola (Contributor) commented Mar 28, 2019

Quoting @nyh: "So checksums[i] should have been a ready future, and its get0() should never have reached do_wait() at all. I wonder if we have a bug in when_all, perhaps when some of the futures are exceptional."

Is this a lifetime issue? The original checksums vector doesn't seem to be saved anywhere.

At least in the unit tests we seem to make sure the external vector is kept alive:

auto p = make_shared(std::move(futures));
return when_all(p->begin(), p->end()).then([p] (std::vector<future<size_t>> ret) {

We could add a when_all variant that takes a vector by value.
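
A by-value variant along those lines could look roughly like this. It is only a sketch; when_all_vector is a hypothetical name, not an existing Seastar API.

// Hypothetical helper: a when_all() variant that takes ownership of the
// vector, so the futures' storage cannot disappear while when_all() is
// still working through the range.
#include <seastar/core/future.hh>
#include <seastar/core/future-util.hh>
#include <seastar/core/shared_ptr.hh>
#include <vector>

template <typename T>
seastar::future<std::vector<seastar::future<T>>>
when_all_vector(std::vector<seastar::future<T>> futures) {
    auto p = seastar::make_shared<std::vector<seastar::future<T>>>(std::move(futures));
    return seastar::when_all(p->begin(), p->end()).finally([p] {});  // keep the vector alive
}

Whether such a keep-alive is actually needed depends on when_all()'s semantics, which is exactly what the following comments try to pin down.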

@asias (Contributor) commented Mar 29, 2019

Indeed, the std::vector<future<partition_checksum>> checksums looks suspicious and dangerous. @nyh, any objections to making it an lw_shared_ptr?

@nyh (Contributor) commented Mar 31, 2019

@asias, @espindola the documentation of the iterator-based when_all() API unfortunately doesn't explain what happens with the iterator. When I wrote this code, I was assuming that when_all() begins by iterating through the whole range, moving the iterated futures into a new internal vector stored by when_all(), and only then deferring. If that were true, there would be no reason why the original vector needs to live for long, or be in a shared_ptr.

(BTW, it's a shame we use an iterator here at all, and not an rvalue range, which would more accurately reflect that the caller is not supposed to use this vector ever again.)

It's quite possible that my assumption is no longer (or never was) true. But I still can't figure out why... It still seems to me from a quick inspection of the code that do_when_all() really does move all the futures away from the (soon to be dead) vector into a new vector. So lw_shared_ptr or similar tricks should not be needed?

@nyh (Contributor) commented Mar 31, 2019

(BTW I opened scylladb/seastar#622 for a clearer API for Seastar)

@r4fek (Author) commented Apr 12, 2019

@asias @nyh any news on this one?

@nyh (Contributor) commented Apr 14, 2019

According to @r4fek, "Issue is fully reproducible. Every time I ran nodetool repair sync user_quota -pr -local it crashes". The proper way forward in such cases is to create a dtest reproducing it, and then debug. I'll now try to prepare a dtest for master, hoping I'll be able to reproduce it there too.

@nyh (Contributor) commented Apr 14, 2019

I am not able to reproduce this issue... @r4fek, you mentioned nodetool repair sync user_quota -pr -local, but this requires the "sync" keyspace to exist with a "user_quota" table, and I don't know what to put in it. We do have a dtest, repair_additional_test.py, with a test called _repair_option_pr_dc_host_test, which creates a fairly elaborate setup and runs "nodetool repair -pr -local" on this table, and it still seems to be working fine, repairing exactly what this command needs to repair - no more and no less.

Maybe this bug needs a specific data set or a specific cluster setup (how many nodes? data centers? rf? etc.). Can you please help me try to reproduce this bug so I can debug it? Thanks.

@r4fek (Author) commented Apr 15, 2019

@nyh

CREATE KEYSPACE sync WITH replication = {'class': 'NetworkTopologyStrategy', 'Amsterdam': '3'}  AND durable_writes = true;
CREATE TABLE sync.user_quota (
    user_id text PRIMARY KEY,
    entities counter
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'LeveledCompactionStrategy', 'tombstone_compaction_interval': '86400', 'tombstone_threshold': '0.2'}
    AND compression = {}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 432000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

It might not be related, but for a really small subset of my users I always get this error while trying to fetch an entry from this table:

Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for sync.user_quota - received 0 responses and 2 failures from 2 CL=LOCAL_QUORUM." info={'consistency': 'LOCAL_QUORUM', 'required_responses': 2, 'received_responses': 0, 'failures': 2}

@slivne (Contributor) commented Apr 18, 2019

@r4fek - we just noticed you do not have a clustering key but are using counters ... this may help, I hope.

Let's try to narrow this down to the specific users you have issues with; let's find their sstables first.

You can use the REST API to find the files:

select * from sync.user_quota ;

 user_id | entities
---------+----------
    1234 |        1
curl -X GET "http://localhost:10000/column_family/sstables/by_key/sync%3Auser_quota?key=1234"

This returns an array with the sstable files:

["/var/lib/scylla/data/sync/user_quota-3fd19b6061bf11e986c1000000000001/mc-5-big-Data.db"]

Now we can move those sstables to a 2-node cluster / single node and check whether selects work / nodetool repair works.

If it doesn't, it may be an issue with the files - it would help to get the files if possible, and if not, an sstabledump of those files.

@r4fek (Author) commented Apr 23, 2019

Also, when I try to run a create query for the problematic user:

UserQuota.objects.create(user_id='453791802')

I get in response:

ServerError: <Error from server: code=0000 [Server error] message="Buffer underflow">

Why do you think adding a clustering key will help in my case, @slivne?

@r4fek (Author) commented May 17, 2019

Running curl -X GET "http://localhost:10000/column_family/sstables/by_key/sync%3Auser_quota?key=1234" crashed 2 of my nodes...

@r4fek (Author) commented May 17, 2019

Report UUID is 31659858-7774-4972-bb57-812aeda8006d. I uploaded there the sstables that could be problematic (the node wasn't able to compact them).

maj 17 10:46:27 ams-dbs3 scylla[17851]:  [shard 9] compaction - Compacting [/var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-257011-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-257149-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-257287-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-256643-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-257241-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-257103-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-257057-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-256689-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-256919-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-256735-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-257195-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-256827-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-256965-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-257333-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-256873-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-251675-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-252181-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-252227-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-97069-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-101071-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-115423-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-126555-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-100749-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-198269-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-211057-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-167265-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-119011-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-211977-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-108523-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-101485-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-167311-big-Data.db:level=0, /var/lib/scylla/data/sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-202179-big-Data.db:level=0, ]
maj 17 10:46:27 ams-dbs3 scylla[17851]:  [shard 9] compaction_manager - compaction failed: std::out_of_range (Buffer underflow)
maj 17 10:46:27 ams-dbs3 scylla[17851]:  [shard 9] compaction_manager - compaction task handler sleeping for 5 seconds

@slivne (Contributor) commented May 19, 2019

@bhalevy - these are mc-format sstables with counters that are not being compacted - not sure if they were originally created with mc or have been converted to mc. @haaawk, it may make sense for you to check them first.

@r4fek (Author) commented May 20, 2019

@slivne they have been converted to mc AFAIK.

@haaawk (Member) commented May 20, 2019

sstabledump reads each of those sstables with no problems, but I was able to reproduce the problem by starting Scylla with those sstables:

INFO  2019-05-20 12:24:32,152 [shard 1] compaction - Compacted 2 sstables to [tmp/system/peers-37f71aca7dc2383ba70672528af04d4f/mc-28-big-Data.db:level=0, ]. 10586 bytes to 5756 (~54% of original) in 59ms = 0.09MB/s. ~256 total partitions merged to 1.
INFO  2019-05-20 12:24:35,796 [shard 0] compaction - Compacting [tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-257011-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-257287-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-257149-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-256643-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-257241-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-257103-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-257057-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-256689-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-256919-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-256735-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-257195-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-256827-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-256965-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-257333-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-256873-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-251675-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-252181-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-252227-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-97069-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-101071-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-115423-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-100749-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-198269-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-211057-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-167265-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-119011-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-211977-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-101485-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-108523-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-167311-big-Data.db:level=0, tmp/sync/user_quota-d7ffe7d07ae611e989d9000000000000/mc-202179-big-Data.db:level=0, ]
ERROR 2019-05-20 12:24:35,800 [shard 0] compaction_manager - compaction failed: std::out_of_range (Buffer underflow)

@slivne (Contributor) commented May 20, 2019

@r4fek - thanks - just validating: have the files been moved into this Scylla cluster from a Cassandra cluster, or have they been written entirely (using CQL) into the Scylla cluster?

@r4fek (Author) commented May 20, 2019

They have been written fully to Scylla (2.3 and then to 3.0.5).

@haaawk (Member) commented May 20, 2019

Here's a backtrace:

#0  0x00007ffff7e02581 in __cxa_throw () from /lib64/libstdc++.so.6
#1  0x0000000000546892 in data_input::ensure (s=2, this=<synthetic pointer>) at /usr/include/c++/8/ext/new_allocator.h:86
#2  data_input::peek_primitive<short> (this=<synthetic pointer>) at ./utils/data_input.hh:95
#3  data_input::peek<short> (this=<synthetic pointer>) at ./utils/data_input.hh:136
#4  data_input::read<short> (this=<synthetic pointer>) at ./utils/data_input.hh:139
#5  sstables::make_counter_cell (timestamp=1553245276493556, value=...) at sstables/mp_row_consumer.cc:49
#6  0x0000000001be54cf in sstables::mp_row_consumer_m::consume_counter_column (this=0x600001d28a58, column_info=..., value="", timestamp=<optimized out>) at sstables/mp_row_consumer.hh:1211
#7  0x0000000001be199b in sstables::data_consume_rows_context_m::do_process_state (this=this@entry=0x600001d36080, data=...) at sstables/row.hh:731
#8  0x0000000001bfb920 in sstables::data_consume_rows_context_m::process_state (data=..., this=0x600001d36080) at sstables/row.hh:790
#9  data_consumer::continuous_data_consumer<sstables::data_consume_rows_context_m>::process (this=this@entry=0x600001d36080, data=...) at sstables/consumer.hh:396
#10 0x0000000001bff503 in data_consumer::continuous_data_consumer<sstables::data_consume_rows_context_m>::operator() (data=..., this=0x600001d36080) at sstables/consumer.hh:406
#11 std::__invoke_impl<seastar::future<seastar::consumption_result<char> >, sstables::data_consume_rows_context_m&, seastar::temporary_buffer<char> > (__f=...) at /usr/include/c++/8/bits/invoke.h:60
#12 std::__invoke<sstables::data_consume_rows_context_m&, seastar::temporary_buffer<char> > (__fn=...) at /usr/include/c++/8/bits/invoke.h:96
#13 std::reference_wrapper<sstables::data_consume_rows_context_m>::operator()<seastar::temporary_buffer<char> > (this=0x7fffffffc660) at /usr/include/c++/8/bits/refwrap.h:319
#14 seastar::input_stream<char>::consume<std::reference_wrapper<sstables::data_consume_rows_context_m> >(std::reference_wrapper<sstables::data_consume_rows_context_m>&&)::{lambda()#1}::operator()() (this=0x600001d36080) at ./seastar/core/iostream-impl.hh:227
#15 0x0000000001c0055f in seastar::repeat<seastar::input_stream<char>::consume<std::reference_wrapper<sstables::data_consume_rows_context_m> >(std::reference_wrapper<sstables::data_consume_rows_context_m>&&)::{lambda()#1}>(seastar::input_stream<char>::consume<std::reference_wrapper<sstables::data_consume_rows_context_m> >(std::reference_wrapper<sstables::data_consume_rows_context_m>&&)::{lambda()#1}) (action=...) at ./seastar/core/preempt.hh:33
#16 0x0000000001c368fb in seastar::input_stream<char>::consume<std::reference_wrapper<sstables::data_consume_rows_context_m> > (consumer=..., this=<optimized out>) at ./seastar/core/iostream-impl.hh:199
#17 seastar::input_stream<char>::consume<sstables::data_consume_rows_context_m> (consumer=..., this=<optimized out>) at ./seastar/core/iostream-impl.hh:236
#18 data_consumer::continuous_data_consumer<sstables::data_consume_rows_context_m>::consume_input (this=<optimized out>) at sstables/consumer.hh:369
#19 sstables::data_consume_context<sstables::data_consume_rows_context_m>::read (this=0x600001d28d88) at sstables/data_consume_context.hh:104
#20 sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const (this=<optimized out>) at sstables/partition.cc:474
#21 seastar::apply_helper<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}, std::tuple<>&&, std::integer_sequence<unsigned long> >::apply({lambda()#2}&&, sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}) (args=..., func=...) at ./seastar/core/apply.hh:35
#22 seastar::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) (args=..., func=...) at ./seastar/core/apply.hh:43
#23 seastar::futurize<seastar::future<> >::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) (args=..., func=...) at ./seastar/core/future.hh:1389
#24 seastar::future<>::then<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}, seastar::future<> >(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}&&) (func=..., this=0x7fffffffc720) at ./seastar/core/future.hh:952
#25 sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}::operator()() const (
    this=0x600001d28a00) at sstables/partition.cc:475
#26 0x0000000001c528ce in seastar::do_void_futurize_helper<seastar::future<> >::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}&>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}&) (func=...) at ./seastar/core/future.hh:1381
#27 seastar::futurize<void>::apply<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}&>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}&) (func=...)
    at ./seastar/core/future.hh:1382
#28 seastar::do_until<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}, sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#1}>(sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#1}, sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const::{lambda()#2}) (action=..., stop_cond=...) at ./seastar/core/future-util.hh:502
#29 sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}::operator()() const (this=<optimized out>)
    at sstables/partition.cc:476
#30 seastar::internal::do_until_state<sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#2}, sstables::sstable_mutation_reader<sstables::data_consume_rows_context_m, sstables::mp_row_consumer_m>::fill_buffer(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >)::{lambda()#3}>::run_and_dispose() (this=0x600001aa6eb0) at ./seastar/core/future-util.hh:462
#31 0x000000000098fda8 in seastar::reactor::run_tasks (this=this@entry=0x600000020000, tq=...) at core/reactor.cc:2695
#32 0x000000000098ff8f in seastar::reactor::run_some_tasks (this=this@entry=0x600000020000) at core/reactor.cc:3118
#33 0x0000000000a47686 in seastar::reactor::run_some_tasks (this=0x600000020000) at core/reactor.cc:3265
#34 seastar::reactor::run (this=0x600000020000) at core/reactor.cc:3265
#35 0x0000000000b57ba8 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) (this=this@entry=0x7fffffffd290, ac=ac@entry=15, av=av@entry=0x7fffffffd5c8, func=...) at /home/haaawk/haaawk/scylla/seastar/core/reactor.hh:1192
#36 0x00000000009233fa in main (ac=15, av=0x7fffffffd5c8) at /usr/include/c++/8/bits/std_function.h:87

@haaawk (Member) commented May 20, 2019

The exception is thrown while reading this sstable: sync/user_quota-23a15d7072f211e8a47f00000000000c/mc-97069-big-Data.db

@haaawk (Member) commented May 20, 2019

I believe this is the problem:

{
    "partition" : {
      "key" : [ "429647821" ],
      "position" : 454
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 486,
        "cells" : [
          { "name" : "entities", "value" : "", "tstamp" : "2019-03-22T09:01:16.493556Z" }
        ]
      }
    ]
  },

The value of the counter column is empty and our code does not expect this. @pdziepak, is it correct for a counter cell to have an empty value?

@haaawk (Member) commented May 20, 2019

It seems that a counter value is expected to have at least a 2-byte header.

@haaawk (Member) commented May 20, 2019

It seems that it is possible to write a counter cell without a value. See sstables/mc/writer.cc write_cell for more details. I will make the read path handle such a case without crashing and see if it helps here.
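
The guard being described can be illustrated with a self-contained sketch. It uses plain C++ stand-in types, not Scylla's actual make_counter_cell or its internal classes, and only shows the shape of the check.

// Sketch of the read-path guard: before reading the 2-byte shard-count
// header of a serialized counter value, check that the buffer is
// non-empty; an empty buffer is treated as an empty counter (zero
// shards) instead of raising "Buffer underflow".
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string_view>
#include <vector>

struct counter_shard { int64_t id; int64_t value; int64_t clock; };

std::vector<counter_shard> parse_counter_value(std::string_view buf) {
    std::vector<counter_shard> shards;
    if (buf.empty()) {
        return shards;                                 // headerless empty counter: accept it
    }
    if (buf.size() < sizeof(uint16_t)) {
        throw std::runtime_error("Buffer underflow");  // genuinely truncated header
    }
    uint16_t n;
    std::memcpy(&n, buf.data(), sizeof(n));            // shard count
    shards.reserve(n);
    // ... reading of the n shards omitted ...
    return shards;
}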

@pdziepak (Member) commented May 20, 2019

It is most likely a bug in sstables/mc/writer.cc that it ever writes a counter value to an sstable without the header. However, this happens only if there is a live counter cell without any shards, and I can't think of any way in which this could legitimately happen.

@haaawk (Member) commented May 20, 2019

Should I fix the writer as well? We have to make the read path handle this anyway, since there may be existing sstables with counters written without headers.

@pdziepak (Member) commented May 20, 2019

All right, counter_cell_view::difference can produce a counter cell without any shards. This is good, since it (most likely) means that we don't have a more complicated bug lurking there.

In this case the right course of action seems to be:

  • fix the MC writer to always include the header (a sketch of the idea follows below)
  • since we have already written counter cells without the header, we need to accept them, so the read path needs adjustment as well
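
Purely as an illustration of the first bullet (again with plain stand-in types rather than the real sstables/mc/writer.cc code), the write side would emit the 2-byte shard-count header unconditionally:

// Sketch of the writer-side fix: serialize the 2-byte shard-count header
// unconditionally, so an empty counter is written as a header saying
// "0 shards" rather than as a zero-length value.
#include <cstdint>
#include <string>
#include <vector>

struct counter_shard { int64_t id; int64_t value; int64_t clock; };

std::string serialize_counter_value(const std::vector<counter_shard>& shards) {
    std::string out;
    const uint16_t n = static_cast<uint16_t>(shards.size());
    out.append(reinterpret_cast<const char*>(&n), sizeof(n));   // header first, even if n == 0
    for (const auto& s : shards) {
        out.append(reinterpret_cast<const char*>(&s.id), sizeof(s.id));
        out.append(reinterpret_cast<const char*>(&s.value), sizeof(s.value));
        out.append(reinterpret_cast<const char*>(&s.clock), sizeof(s.clock));
    }
    return out;
}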

@haaawk (Member) commented May 20, 2019

I fixed the read path and it helped with the sstables we're talking about here. I will fix the writer and send a patchset. Thanks for your help @pdziepak!

avikivity added a commit that referenced this issue May 23, 2019

Merge "Fix empty counters handling in MC" from Piotr
"
Before this patchset empty counters were incorrectly persisted for
MC format. No value was written to disk for them. The correct way
is to still write a header that informs the counter is empty.

We also need to make sure that reading wrongly persisted empty
counters works because customers may have sstables with wrongly
persisted empty counters.

Fixes #4363
"

* 'haaawk/4363/v3' of github.com:scylladb/seastar-dev:
  sstables: add test for empty counters
  docs: add CorrectEmptyCounters to sstable-scylla-format
  sstables: Add a feature for empty counters in Scylla.db.
  sstables: Write header for empty counters
  sstables: Remove unused variables in make_counter_cell
  sstables: Handle empty counter value in read path

(cherry picked from commit 899ebe4)

@haaawk (Member) commented May 23, 2019

It would be great to get this backported to branch-3.0

avikivity added a commit that referenced this issue May 23, 2019: the same "Fix empty counters handling in MC" merge referenced again (cherry picked from commit 899ebe4).

avikivity added a commit that referenced this issue May 24, 2019: the same "Fix empty counters handling in MC" merge referenced again (cherry picked from commit 899ebe4).