Node crashlooping in continuous_data_consumer #6486
Just tested: Wiping out the data in the node results in the crashloop ceasing. Going to try and binary search to find the culprit.
Narrowed the culprit CF down via binary search. Very curiously, it's a CF with only one small (1MB) sstable. That sstable dumps just fine. Going to copy that single sstable to a one-node ring and see if the breakage follows.
Curiously, Scylla logged that it dumped its core, yet no core file can be found.
Can you please share the coredump?
Verified.
Let's try to force the setup to match Scylla's preferred configuration, and then run the checks again.
So, I copied the CF that was causing the trouble to a one-node ring, and wasn't able to repro there. Going to try and repro again elsewhere. I think the reason for the missing coredump is insufficient space to store it on the root volume.
@alienth the info on how to set things up for Scylla is what I listed above (#6486 (comment)).
I'm able to repro again, but still no coredump (even after running the coredump setup that symlinks). Going to just force the kernel to write coredumps somewhere other than systemd-coredump.
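(For anyone following along: bypassing systemd-coredump generally means pointing the `kernel.core_pattern` sysctl at a plain file path, e.g. `sysctl -w kernel.core_pattern=/var/tmp/core.%e.%p`; the exact path here is just an example.)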
So it's clear: once it's set up correctly you should see something like this (the last part validates that cores are created correctly).
Yea, everything validates, yet I still can't get any cores generated. Other things I tried were also to no avail. Something is hosed with coredumps on this box, I s'pose? Continuing to dig.
Ah, the default system ulimit on core dumps was hampering me :P. Getting the coredump now.
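(The usual culprit: the core file size limit defaults to 0, so nothing gets written. `ulimit -c unlimited` in the launching shell, or `LimitCORE=infinity` in the systemd unit, lifts it.)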
@alienth great (one small step...). If you can, please provide the schema from that specific node, and also the specific sstables if you can share them. To upload the info, please follow https://docs.scylladb.com/troubleshooting/report_scylla_problem/#send-files-to-scylladb-support
A new node started crashing today on the same CF, so seems like this is definitely some data corruption bug that is somehow unique to that CF. It's a thrift-built CF, so no special schema. Unfortunately I can't upload the sstables without legal clearance, so going to try and troubleshoot myself. I've resolved the backtrace, which can be found here: https://gist.github.com/alienth/2f3d0d486255f432ae2778a2cd510028
@alienth please check if you have any files named *Index.db that are 4GB in size or larger.
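(Presumably to rule out a 32-bit file-offset overflow: 4 GiB is exactly where a 32-bit offset wraps.)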
@tgrabiec The node crashes as soon as it starts, and I think before it's accepting read queries (haven't confirmed that, though). The queries that hit this CF are just standard thrift get_slice calls.
A few more details: The CF contains a single partition with about 80k columns, all of which have TTLs. As such, the row is constantly rotating out old columns and having new ones written. The single partition is read a couple hundred times a second. Row cache is enabled. The CF was working fine for a few days before the crashing began. Removing the sstables on the crashing node and restarting it fixed it. A day later another node started doing the same. When that occurred I just failed back to C*, which has the same CF in place.
@alienth I understood that the crash is during a read from one of your app's tables. Scylla doesn't read from user tables on its own during boot so it must be some external query triggering this. What's the schema of the CF involved in the crash? To get to the root cause we may need the core dump. |
Hi @tgrabiec, it's just a thrift-built table, so the schema is:

```cql
CREATE TABLE keyspace."ActiveForFrontPage" (
    key ascii,
    column1 text,
    value blob,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'LeveledCompactionStrategy'}
    AND compression = {'chunk_length_in_kb': '64', 'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 10800
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';
```

> To get to the root cause we may need the core dump.

Yeah, kinda what I suspected. I'm trying to pry open the core dump myself but hitting a separate tooling issue. Hoping once I can get that resolved I can pick the core dump apart to find the trouble:

```
(gdb) source /opt/scylladb/scripts/scylla-gdb.py
Traceback (most recent call last):
  File "/opt/scylladb/scripts/scylla-gdb.py", line 43, in <module>
    class intrusive_list:
  File "/opt/scylladb/scripts/scylla-gdb.py", line 44, in intrusive_list
    size_t = gdb.lookup_type('size_t')
gdb.error: No type named size_t.
```

Unfortunately I can't share the core dump without getting legal clearance.

This usually means that gdb didn't load debug info properly. Maybe you don't have the debug info package installed?
Yeah, it's definitely installed. Likely something mucked up in my environment elsewhere. I'll update once I figure it out.
@alienth Note that relocatable binaries may need special treatment to debug: https://github.com/scylladb/scylla/blob/master/docs/debugging.md#relocatable-binaries
I think it might be related to the binutils mess on Ubuntu. Nearly have it figured out, I think.
This could be https://github.com/scylladb/scylla/blob/master/docs/debugging.md#namespace-issues.
@tgrabiec any idea? This is the first (and only) partition, so it can't be related to mis-clearing when moving from one partition to the next.
@avikivity the promoted index length is a varint, so it has to be decoded as such.
Ah, I thought varint was little endian. |
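Since endianness was the stumbling block here, a minimal sketch of decoding a Cassandra/mc-style unsigned vint, assuming the usual encoding (the count of leading 1-bits in the first byte gives the number of extra bytes, and the value bits follow in network order, i.e. big-endian, not little-endian); this is illustrative, not Scylla's implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Sketch: decode an unsigned vint where the number of leading 1-bits in the
// first byte equals the number of extra bytes, and the remaining value bits
// are laid out in network (big-endian) order.
uint64_t decode_unsigned_vint(const uint8_t* p, size_t& consumed) {
    uint8_t first = p[0];
    unsigned extra = 0;                          // number of additional bytes
    while (extra < 8 && (first & (0x80u >> extra))) {
        ++extra;
    }
    uint64_t value = first & (0xFFu >> extra);   // value bits left in byte 0
    for (unsigned i = 1; i <= extra; ++i) {
        value = (value << 8) | p[i];             // big-endian continuation
    }
    consumed = extra + 1;
    return value;
}

int main() {
    // 0x81 0x00 encodes 256: the "10" prefix means 1 extra byte, the low
    // bits of the first byte contribute 1, then (1 << 8) | 0x00 = 256.
    const uint8_t buf[] = {0x81, 0x00};
    size_t n = 0;
    std::printf("%llu (consumed %zu bytes)\n",
                (unsigned long long)decode_unsigned_vint(buf, n), n);
}
```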
Mm, so the index is looking fine, then? If that's the case, is there any recommended path I can take on the GDB side of things with the coredump?
You can instrument the code to print end and _stream_position, and we can work backwards from there. |
That is, if you're comfortable modifying and building the code. With gdb+core you can walk up the frames and look for variables the optimizer forgot to destroy.
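For reference, a sketch of what that instrumentation could look like. This is not a real patch: the signature and the `_stream_position.position` field name are taken from the assertion message quoted later in this thread (sstables/consumer.hh), and the surrounding code is assumed:

```cpp
// Inside data_consumer::continuous_data_consumer<StateProcessor>::
// fast_forward_to() in sstables/consumer.hh. Requires <cstdio> and
// <cassert>; everything else here is assumed context, not Scylla's code.
future<> fast_forward_to(size_t begin, size_t end) {
    // Log both positions so a violation (end < _stream_position.position)
    // can be tied back to the query that produced it.
    std::fprintf(stderr, "fast_forward_to: begin=%zu end=%zu stream_position=%zu\n",
                 begin, end, static_cast<size_t>(_stream_position.position));
    assert(end >= _stream_position.position);
    // ... original fast-forward logic continues here ...
}
```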
I was able to repro this again after completely truncating all data and rebuilding it. Going to modify the code to try and diagnose the issue.
Got some printing.
Tracing logs from the shard which crashed:
Interestingly, on a subsequent restart the sstable got compacted down, and it crashed again on the newly compacted sstable. The clustering position is different, too:
I tcpdump'd and flipped on reads to this ring from a single source, and was able to repro the crash. However, the query which triggered the crash I was able to repeat after restarting without a problem... so it may be some bug which only arises after some number of reads? The read itself is uninteresting: a plain get_slice.
Based on the trace I'm thinking the promoted index is pointing to a location in the file that doesn't exist? Going to try parsing the index and data file more carefully to see if that's the case.
@alienth - if the sstable doesn't have a lot of data, can you try to overwrite the data with fake data, check that the compacted sstable still crashes, and share the sstable with the faked data with us?
Attached. I'm still hacking away at the files to see if I can find the issue. Are these sstables binary-compatible with the same version of Cassandra? Edit: Also, I can confirm that Scylla does crash with that sstable.
Yes, those sstables are fully compatible. Scylla adds a -Scylla.db component, but Cassandra ignores it.
/cc @tgrabiec please take a look
The assert fails because the position in the data file for the upper bound of the read is smaller than the position for the lower bound of the read. This is because the read uses a lower bound which is larger than the upper bound. In one of the traces the column name range is ["t4d0m", "2veze"]. Are you maybe passing the column names to get_slice unsorted? It doesn't explain why you don't get the same crash for the same query after restarting the server, though.
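To make the failure mode concrete, here is a toy model (mine, not Scylla's actual code) of a forward-only reader: fast-forwarding asserts the stream position never moves backwards, which is the invariant that out-of-order ranges violate:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for continuous_data_consumer: the stream position may only
// move forward, mirroring the `end >= _stream_position.position` assertion.
struct toy_consumer {
    size_t pos = 0;
    void fast_forward_to(size_t begin, size_t end) {
        (void)begin;
        assert(end >= pos);  // fires when ranges arrive out of order
        pos = end;
    }
};

int main() {
    // Byte windows in the data file for two requested columns. Because the
    // client sent names out of order (e.g. ["t4d0m", "2veze"]), the second
    // window starts before the current position and the assert fires.
    std::vector<std::pair<size_t, size_t>> windows = {{900, 950}, {100, 150}};
    toy_consumer c;
    for (auto [b, e] : windows) {
        c.fast_forward_to(b, e);
    }
}
```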
Ah, yeah, we don't sort columns on the call, and the library we're using doesn't either. Knowing this, I'll try to intentionally request unsorted columns via thrift to see if I can get it to crash. Something interesting about this is that when blowing away the data and restarting, things work fine for a while before the crashes resume. Unsure why that'd be 🤔.
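A minimal client-side workaround sketch, assuming the generated C++ Thrift bindings (where SlicePredicate.column_names is a std::vector<std::string>); byte-wise sorting matches comparator order for the ascii/UTF-8 columns in this CF:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: normalize a SlicePredicate's column_names before
// calling get_slice(), so the server sees monotonically increasing names.
void normalize_column_names(std::vector<std::string>& column_names) {
    // Sort byte-wise and drop duplicates.
    std::sort(column_names.begin(), column_names.end());
    column_names.erase(std::unique(column_names.begin(), column_names.end()),
                       column_names.end());
}
```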
It took several tries for whatever reason, but I was able to repro the crash by issuing a get_slice with unsorted column names. Edit: I'm doing a CL_ONE call on a ring with an RF of 3 - looks like it only crashes when the query happens to go to one specific node. That seems rather odd - I'm unclear on why the other nodes are able to handle this query successfully.
@alienth Can you paste the log with trace level enabled on the sstable logger for a query which passes? One hypothesis I have is that the nodes which can handle the read have the data fully in cache, which is not as prone to out-of-order ranges.
The column names in SlicePredicate can be passed in arbitrary order. We converted them to clustering ranges in read_command preserving the original order. As a result, the clustering ranges in the read command may appear out of order. This violates the storage engine's assumptions and leads to undefined behavior. It was seen manifesting as a SIGSEGV or an abort in the sstable reader when executing a get_slice() thrift verb:

```
scylla: sstables/consumer.hh:476: seastar::future<> data_consumer::continuous_data_consumer<StateProcessor>::fast_forward_to(size_t, size_t) [with StateProcessor = sstables::data_consume_rows_context_m; size_t = long unsigned int]: Assertion `end >= _stream_position.position' failed.
```

Fixes #6486.

Tests:
- added a new dtest to thrift_tests.py which reproduces the problem

Message-Id: <1596725657-15802-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit bfd129c)
Backported to 4.0, 4.1, and 4.2.
Installation details
Scylla version (or git commit hash): 3.3.1, 6f939ff
Cluster size: 9
OS (RHEL/CentOS/Ubuntu/AWS AMI): Ubuntu
Encountered the following crash on Scylla. Not sure if it matters, but this ring is primarily serving Thrift requests.
No core dump was produced.
When restarting the node, it crashes again, so guessing there is some sstable triggering this.
Seems like this might have similar symptoms to #4315?