ALTER TABLE with "chunk_length_kb" (compression) of 1MB caused a core dump on all nodes #9933
Comments
Where is the coredump? The backtrace is not clear. Please extract the stacktrace and backtrace in a clean format, not from Grafana; it's hard to parse (maybe you have a trick to do that).
OK
The issue is memory allocation. I think we may have an issue with the ALTER that was done.
If chunk_length_kb of 1M is actually translated to 1*10^6 KB, then it's clear we will have memory pressure. Is this a new nemesis? Please change this value and have it as
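A quick back-of-the-envelope sketch of the arithmetic behind this hypothesis (the interpretation of '1M' as a count of kilobytes is an assumption here, not confirmed behavior):

```python
# If the '1M' passed for chunk_length_kb is read as ~10^6 (or 2^20)
# kilobytes, each compression chunk buffer is on the order of a gigabyte.
chunk_length_kb = 2 ** 20             # '1M' read as 1,048,576 KB (assumption)
chunk_bytes = chunk_length_kb * 1024  # bytes per compression chunk
print(f"{chunk_bytes / 2**30:.1f} GiB per chunk")  # -> 1.0 GiB
```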
@roydahan / @yarongilor ^^
@slivne, it is not new code (> 3 years old).
I am not sure we should test this config combination. Not sure where we picked the 1M chunk size; is there a reference to something?
It was randomly picked by the test.
Decodes (using build-id):
1 MiB chunk sizes are stressing the allocator too much. We have two choices:
Pushing out; edge case.
Installation details
Kernel version: 5.15.0-1017-aws
Scylla nodes used in this run:
OS / Image:
Test:
Issue description
Shortly afterwards, both nodes 2 and 6 aborted with coredumps:
Translated:
Coredump:
Translated:
Coredump:
Logs:
Reproduced it without the
Installation details
Kernel version: 5.15.0-1017-aws
Scylla nodes used in this run:
OS / Image:
Test:
Issue description
We alter a table:
This leads to the changes being processed and reactor stalls being logged:
Then to an abort with coredump on two DB nodes (logs from db-node-1 attached below):
Coredump:
Logs:
Then it's something else (#2577). This issue is about
This patch adds some minimal tests for the "with compression = {..}" table configuration. These tests reproduce three known bugs:

Refs scylladb#6442: Always print all schema parameters (including default values). Scylla doesn't return the default chunk_length_in_kb, but Cassandra does.

Refs scylladb#8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression" for compression settings by default. Cassandra switched, long ago, the "sstable_compression" attribute's name to "class". This can break Cassandra applications that create tables (where we won't understand the "class" parameter) and applications that inquire about the configuration of existing tables. This patch adds tests for both problems.

Refs scylladb#9933: ALTER TABLE with "chunk_length_kb" (compression) of 1MB caused a core dump on all nodes. Our test for this issue hangs Scylla (or crashes, depending on the test environment configuration) when a huge allocation is attempted during memtable flush. So this test is marked "skip" instead of xfail.

The tests included here also uncovered a new minor/insignificant bug, where Scylla allows floating point numbers as chunk_length_in_kb - this number is truncated to an integer, and allowed, unlike Cassandra or common sense.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
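A minimal sketch of what the floating-point check could look like, in the style of the cql-pytest suite. The fixtures (cql, test_keyspace) follow that suite's conventions, but this exact test body is illustrative, not the patch itself, and the expected exception type is an assumption:

```python
import pytest
# cql-pytest imports driver error types from cassandra.protocol
from cassandra.protocol import ConfigurationException, InvalidRequest

def test_fractional_chunk_length_rejected(cql, test_keyspace):
    # Cassandra rejects a fractional chunk_length_in_kb; Scylla (buggily)
    # truncates it to an integer and accepts it, so on Scylla this would
    # currently fail (i.e., it documents the bug, xfail-style).
    with pytest.raises((ConfigurationException, InvalidRequest)):
        cql.execute(f"CREATE TABLE {test_keyspace}.tbl (p int PRIMARY KEY) "
                    "WITH compression = {'sstable_compression': 'LZ4Compressor', "
                    "'chunk_length_in_kb': 4.5}")
```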
Added a cql-pytest to reproduce this bug,
and this repeats indefinitely, with a growing (up to 10 s) retry delay. So this issue allows a DoS attack by authenticated users. Interestingly, this test shows that Cassandra also doesn't protect against a huge chunk_length_in_kb: it doesn't seem to crash with a 1 GB chunk size, but it's super-slow.
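For reference, a minimal standalone reproduction along the lines described in this thread, using the Python driver against a single local node. The host, keyspace, and table names are assumptions; against an unpatched Scylla the flush step can hang or crash the node:

```python
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()
session.execute("CREATE KEYSPACE IF NOT EXISTS ks WITH replication = "
                "{'class': 'SimpleStrategy', 'replication_factor': 1}")
session.execute("CREATE TABLE IF NOT EXISTS ks.t (p int PRIMARY KEY, v text)")
# The pathological setting from the nemesis: '1M' KB chunks (~1 GiB each).
session.execute("ALTER TABLE ks.t WITH compression = "
                "{'sstable_compression': 'ZstdCompressor', 'chunk_length_kb': '1M'}")
session.execute("INSERT INTO ks.t (p, v) VALUES (1, 'x')")
# The huge chunk buffer is allocated when the memtable is flushed to an
# sstable (e.g. via `nodetool flush ks t`); that is where the hang/crash occurs.
cluster.shutdown()
```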
Let's reject anything above 128k.
The chunk size used in sstable compression can be set when creating a table, using the "chunk_length_in_kb" parameter. It can be any power-of-two multiple of 1 KB.

Very large compression chunks are not useful - they offer diminishing returns on compression ratio, and require very large memory buffers and reading a very large amount of disk data just to read a small row. In fact, small chunks are recommended - Scylla defaults to 4 KB chunks, and Cassandra lowered their default from 64 KB (in Cassandra 3) to 16 KB (in Cassandra 4).

Therefore, allowing arbitrarily large chunk sizes is just asking for trouble. Today, a user can ask for a 1 GB chunk size, and crash or hang Scylla when it runs out of memory. So in this patch we add a hard limit of 128 KB for the chunk size - anything larger is refused.

Fixes scylladb#9933

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
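Restated as Python pseudocode, the rule this patch enforces (the actual check lives in Scylla's C++ compression-parameter validation; the names here are hypothetical):

```python
MAX_CHUNK_LENGTH_IN_KB = 128  # hard upper limit introduced by the patch

def validate_chunk_length_in_kb(kb: int) -> None:
    # chunk_length_in_kb must be a power of two, i.e. a power-of-two
    # multiple of 1 KB.
    if kb <= 0 or kb & (kb - 1):
        raise ValueError("chunk_length_in_kb must be a positive power of two")
    # Anything larger than 128 KB is now refused outright.
    if kb > MAX_CHUNK_LENGTH_IN_KB:
        raise ValueError(f"chunk_length_in_kb must be at most {MAX_CHUNK_LENGTH_IN_KB}")
```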
Closes #14261
Not a regression or bugfix; not backporting.
Installation details
Kernel version:
5.4.0-1035-aws
Scylla version (or git commit hash):
4.6.rc2-0.20220102.e8a1cfb6f with build-id 5d7b96e39c909424e8224207a162fc2c82b67214
Cluster size: 4 nodes (i3en.3xlarge)
Scylla shard count (live nodes):
longevity-large-partitions-4d-4-6-db-node-3bce3643-1 (13.51.176.199 | 10.0.1.153): 12 shards
longevity-large-partitions-4d-4-6-db-node-3bce3643-2 (13.53.118.234 | 10.0.2.180): 12 shards
longevity-large-partitions-4d-4-6-db-node-3bce3643-3 (13.49.69.157 | 10.0.2.208): 12 shards
longevity-large-partitions-4d-4-6-db-node-3bce3643-4 (13.49.80.203 | 10.0.2.19): 12 shards
OS (RHEL/CentOS/Ubuntu/AWS AMI):
ami-04034a8a446f92efc
(aws: eu-north-1)
Test:
longevity-large-partition-4days-test
Test name:
longevity_large_partition_test.LargePartitionLongevityTest.test_large_partition_longevity
Test config file(s):
Issue description
====================================
< t:2022-01-16 19:53:24,347 f:nemesis.py l:1352 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.SisyphusMonkey: _modify_table_property: ALTER TABLE scylla_bench.test WITH compression = {'sstable_compression': 'ZstdCompressor', 'chunk_length_kb': '1M', 'crc_check_chance': 0.042447059711723134};
Core dump details:
====================================
Restore Monitor Stack command:
$ hydra investigate show-monitor 3bce3643-481e-40e1-9c3c-c4cf39e4bae8
Restore monitor on AWS instance using Jenkins job
Show all stored logs command:
$ hydra investigate show-logs 3bce3643-481e-40e1-9c3c-c4cf39e4bae8
Test id:
3bce3643-481e-40e1-9c3c-c4cf39e4bae8
Logs:
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/3bce3643-481e-40e1-9c3c-c4cf39e4bae8/20220116_200857/grafana-screenshot-longevity-large-partition-4days-test-scylla-per-server-metrics-nemesis-20220116_201122-longevity-large-partitions-4d-4-6-monitor-node-3bce3643-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/3bce3643-481e-40e1-9c3c-c4cf39e4bae8/20220116_200857/grafana-screenshot-overview-20220116_200857-longevity-large-partitions-4d-4-6-monitor-node-3bce3643-1.png
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/3bce3643-481e-40e1-9c3c-c4cf39e4bae8/20220116_201642/db-cluster-3bce3643.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/3bce3643-481e-40e1-9c3c-c4cf39e4bae8/20220116_201642/loader-set-3bce3643.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/3bce3643-481e-40e1-9c3c-c4cf39e4bae8/20220116_201642/monitor-set-3bce3643.tar.gz
sct - https://cloudius-jenkins-test.s3.amazonaws.com/3bce3643-481e-40e1-9c3c-c4cf39e4bae8/20220116_201642/sct-runner-3bce3643.tar.gz
Jenkins job URL