Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cassandra 3.11.10 uses "class" instead of "sstable_compression" for compression settings by default #8948

Open
1 task done
agalue opened this issue Jun 29, 2021 · 8 comments

Comments

@agalue
Copy link

agalue commented Jun 29, 2021

This is Scylla's bug tracker, to be used for reporting bugs only.
If you have a question about Scylla, and not a bug, please ask it in
our mailing-list at scylladb-dev@googlegroups.com or in our slack channel.

  • I have read the disclaimer above, and I am reporting a suspected malfunction in Scylla.

Installation details
Scylla version (or git commit hash): 4.4.3-0.20210609.bfafb8456)
Cluster size: 1 (test environment)
OS (RHEL/CentOS/Ubuntu/AWS AMI): CentOS

Hardware details (for performance issues) Delete if unneeded
Platform (physical/VM/cloud instance type/docker):
Hardware: sockets= cores= hyperthreading= memory=
Disks: (SSD/HDD, count)

I was trying to migrate to Scylla on a test environment by letting it use the same data directory as Cassandra left with all the data (i.e., /var/lib/cassandra) to see if that works.

After doing that, Scylla refused to start because of the following error:

Jun 28 15:19:59 scylla-apex1 scylla[33784]:  [shard 0] init - Startup failed: exceptions::configuration_exception (Unknown compression option 'class'.)

I found that Cassandra deprecated sstable_compression (and other sibling fields), and it now uses class, as described here. Interestingly, Cassandra allows both, but complains when sstable_compression is used with a warning on the logs.

I'm wondering if there is a way to fix this in Scylla, to allow it to use the same Cassandra Data directory without the need of using the sstableloader tool to migrate the data and reduce the downtime.

@agalue agalue changed the title Cassandra uses "class" instead of Cassandra 3.11.10 uses "class" instead of "sstable_compression" for compression settings by default Jun 29, 2021
avikivity added a commit to avikivity/scylladb that referenced this issue Jun 29, 2021
…sstable_compression'

Cassandra 3.0 deprecated the 'sstable_compression' attribute and added
'class' as a replacement. Follow by supporting both.

The SSTABLE_COMPRESSION variable is renamed to SSTABLE_COMPRESSION_DEPRECATED
to detect all uses and prevent future misuse.

Existing unit tests are modified to use the new name, and a test
is added to ensure the old name is still supported.

Fixes scylladb#8948.
@slivne slivne added this to the 4.7 milestone Jul 6, 2021
@slivne
Copy link
Contributor

slivne commented Jul 6, 2021

@avikivity if you want me to reassign we can

avikivity added a commit to avikivity/scylladb that referenced this issue Jul 7, 2021
…sstable_compression'

Cassandra 3.0 deprecated the 'sstable_compression' attribute and added
'class' as a replacement. Follow by supporting both.

The SSTABLE_COMPRESSION variable is renamed to SSTABLE_COMPRESSION_DEPRECATED
to detect all uses and prevent future misuse.

To prevent old-version nodes from seeing the new name, the
compression_parameters class preserves the key name when it is
constructed from an options map, and emits the same key name when
asked to generate an options map.

Existing unit tests are modified to use the new name, and a test
is added to ensure the old name is still supported.

Fixes scylladb#8948.
avikivity added a commit that referenced this issue Jul 28, 2021
…recate 'sstable_compression'"

This reverts commit 5571ef0. It causes
rolling upgrade failures.

Fixes #9055.

Reopens #8948.
@avikivity avikivity reopened this Jul 28, 2021
@avikivity
Copy link
Member

Reopened by 331eb57

lauranovich pushed a commit to lauranovich/scylla that referenced this issue Jul 29, 2021
…sstable_compression'

Cassandra 3.0 deprecated the 'sstable_compression' attribute and added
'class' as a replacement. Follow by supporting both.

The SSTABLE_COMPRESSION variable is renamed to SSTABLE_COMPRESSION_DEPRECATED
to detect all uses and prevent future misuse.

To prevent old-version nodes from seeing the new name, the
compression_parameters class preserves the key name when it is
constructed from an options map, and emits the same key name when
asked to generate an options map.

Existing unit tests are modified to use the new name, and a test
is added to ensure the old name is still supported.

Fixes scylladb#8948.

Closes scylladb#8949
@nyh
Copy link
Contributor

nyh commented Feb 6, 2022

This difference is reproduced by the translated Cassandra unit test cassandra_tests/validation/operations/alter_test.py::testAlterTableWithCompression:

Whereas the default setting for Cassandra is {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}, in Scylla it is {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} so the first check in that test fails.

@tzach
Copy link
Contributor

tzach commented Feb 6, 2022

See more schema incompatibles here #9859

denesb pushed a commit that referenced this issue Feb 7, 2022
This is a translation of Cassandra's CQL unit test source file
validation/operations/AlterTest.java into our our cql-pytest framework.

This test file includes 24 tests for various types of ALTER operations
(of keyspaces, tables and types). Two additional tests which required
multiple data centers to test were dropped with a comment explaining why.

All 24 tests pass on Cassandra, with 8 failing on Scylla reproducing
one already known Scylla issue and 5 previously-unknown ones:

  Refs #8948:  Cassandra 3.11.10 uses "class" instead of
               "sstable_compression" for compression settings by default
  Refs #9929:  Cassandra added "USING TIMESTAMP" to "ALTER TABLE",
               we didn't.
  Refs #9930:  Forbid re-adding static columns as regular and vice versa
  Refs #9935:  Scylla stores un-expanded compaction class name in system
               tables.
  Refs #10036: Reject empty options while altering a keyspace
  Refs #10037: If there are multiple values for a key, CQL silently
               chooses last value

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220206163820.1875410-2-nyh@scylladb.com>
@slivne slivne modified the milestones: 5.0, 5.2 Apr 14, 2022
@jmaixl
Copy link

jmaixl commented May 16, 2022

@agalue if you migrate just the sstables for keyspaces, does that work?
I'm able to migrate specific keyspaces' sstables (so everything but system, system_auth, system_distributed, system_traces, hints).

@agalue
Copy link
Author

agalue commented May 25, 2022

I apologize for the delay in my response.

That could work, but what I wanted is precisely avoid the need to migrate SSTables manually.

I wanted to see if it is possible to stop and uninstall Cassandra, install ScyllaDB, point it to the Cassandra folder where the SSTables live, start ScyllaDB, and have it working, which is what triggered the problem I reported. That would minimize the downtime without the need for a migration, which is difficult considering the constraints we have.

If that's not possible, and the only way to have it working is to migrate the SSTables before starting using Scylla (as dual-writes are not an option for us), I'll let my team know.

@nyh
Copy link
Contributor

nyh commented Jun 15, 2023

Note that this issue is not just about different printing of the schema - it's also about a valid CREATE TABLE command from Cassandra (e.g., create table ... with compression = { 'class': 'LZ4Compressor' }) not working in Scylla. This can break applications who try to create such tables.

I Added two cql-pytest tests to reproduce these two aspects of this issue:

  • test_sstable_compression.py::test_compression_class is about a Cassandra application attempting to use the "class" parameter in a CREATE TABLE .. WITH COMPRESSION and failing.
  • test_sstable_compression.py::test_read_compression_class is about an application reading the schema of an existing table, hoping to find a "class" there (as is the case in modern Cassandra).

We have an old PR fixing this from @avikivity - #8949 - but as explained above it was reverted because of upgrade issues. Can we fix those and re-apply a new version of this fix?

nyh added a commit to nyh/scylla that referenced this issue Jun 15, 2023
This patch adds some minimal tests for the "with compression = {..}" table
configuration. These tests reproduce three known bugs:

Refs scylladb#6442: Always print all schema parameters (including default values)

  Scylla doesn't return the default chunk_length_in_kb, but Cassandra
  does.

Refs scylladb#8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
            for compression settings by default

  Cassandra switched, long ago, the "sstable_compression" attribute's
  name to "class". This can break Cassandra applications that create
  tables (where we won't understand the "class" parameter) and applications
  that inquire about the configuration of existing tables. This patch adds
  tests for both problems.

Refs scylladb#9933: ALTER TABLE with "chunk_length_kb" (compression) of 1MB caused a
            core dump on all nodes

  Our test for this issue hangs Scylla (or crashes, depending on the test
  environment configuration), when a huge allocation is attempted during
  memtable flush. So this test is marked "skip" instead of xfail.

The tests included here also uncovered a new minor/insignificant bug,
where Scylla allows floating point numbers as chunk_length_in_kb - this
number is truncated to an integer, and allowed, unlike Cassandra or
common sense.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
nyh added a commit to nyh/scylla that referenced this issue Jun 19, 2023
This patch adds some minimal tests for the "with compression = {..}" table
configuration. These tests reproduce three known bugs:

Refs scylladb#6442: Always print all schema parameters (including default values)

  Scylla doesn't return the default chunk_length_in_kb, but Cassandra
  does.

Refs scylladb#8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
            for compression settings by default

  Cassandra switched, long ago, the "sstable_compression" attribute's
  name to "class". This can break Cassandra applications that create
  tables (where we won't understand the "class" parameter) and applications
  that inquire about the configuration of existing tables. This patch adds
  tests for both problems.

Refs scylladb#9933: ALTER TABLE with "chunk_length_kb" (compression) of 1MB caused a
            core dump on all nodes

  Our test for this issue hangs Scylla (or crashes, depending on the test
  environment configuration), when a huge allocation is attempted during
  memtable flush. So this test is marked "skip" instead of xfail.

The tests included here also uncovered a new minor/insignificant bug,
where Scylla allows floating point numbers as chunk_length_in_kb - this
number is truncated to an integer, and allowed, unlike Cassandra or
common sense.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
denesb pushed a commit that referenced this issue Jun 20, 2023
This patch adds some minimal tests for the "with compression = {..}" table
configuration. These tests reproduce three known bugs:

Refs #6442: Always print all schema parameters (including default values)

  Scylla doesn't return the default chunk_length_in_kb, but Cassandra
  does.

Refs #8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
            for compression settings by default

  Cassandra switched, long ago, the "sstable_compression" attribute's
  name to "class". This can break Cassandra applications that create
  tables (where we won't understand the "class" parameter) and applications
  that inquire about the configuration of existing tables. This patch adds
  tests for both problems.

Refs #9933: ALTER TABLE with "chunk_length_kb" (compression) of 1MB caused a
            core dump on all nodes

  Our test for this issue hangs Scylla (or crashes, depending on the test
  environment configuration), when a huge allocation is attempted during
  memtable flush. So this test is marked "skip" instead of xfail.

The tests included here also uncovered a new minor/insignificant bug,
where Scylla allows floating point numbers as chunk_length_in_kb - this
number is truncated to an integer, and allowed, unlike Cassandra or
common sense.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14261
@nyh
Copy link
Contributor

nyh commented Oct 18, 2023

This issue is a duplicate of #2061. One of them should be closed. The other issue is older, but this one has better explanations and also refers to a reverted patch.

nyh added a commit to nyh/scylla that referenced this issue Oct 22, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CreateTest.java into our cql-pytest framework.

The 15 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for several known issues:

Refs scylladb#6442: Always print all schema parameters (including default values)
Refs scylladb#8001: Documented unit "µs" not supported for assigning a duration"
            type.
Refs scylladb#8892: Add an option for default RF for new keyspaces.
Refs scylladb#8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
            for compression settings by default

Unfortunately, I also had to comment out - and not translate - several
tests which weren't real "CQL tests" (tests that use only the CQL driver),
and instead relied on Cassandra's Java implementation details:

1. Tests for CREATE TRIGGER were commented out because testing them
   in Cassandra requires adding a Java class for the test. We're also
   not likely to ever add this feature to Scylla (Refs scylladb#2205).

2. Similarly, tests for CEP-11 (Pluggable memtable implementations)
   used internal Java APIs instead of CQL, and it also unlikely
   we'll ever implement it in a way compatible with Cassandra because
   of its Java reliance.

3. One test for data center names used internal Cassandra Java APIs, not
   CQL to create mock data centers and snitches.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
nyh added a commit to nyh/scylla that referenced this issue Oct 23, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CreateTest.java into our cql-pytest framework.

The 15 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for several known issues:

Refs scylladb#6442: Always print all schema parameters (including default values)
Refs scylladb#8001: Documented unit "µs" not supported for assigning a duration"
            type.
Refs scylladb#8892: Add an option for default RF for new keyspaces.
Refs scylladb#8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
            for compression settings by default

Unfortunately, I also had to comment out - and not translate - several
tests which weren't real "CQL tests" (tests that use only the CQL driver),
and instead relied on Cassandra's Java implementation details:

1. Tests for CREATE TRIGGER were commented out because testing them
   in Cassandra requires adding a Java class for the test. We're also
   not likely to ever add this feature to Scylla (Refs scylladb#2205).

2. Similarly, tests for CEP-11 (Pluggable memtable implementations)
   used internal Java APIs instead of CQL, and it also unlikely
   we'll ever implement it in a way compatible with Cassandra because
   of its Java reliance.

3. One test for data center names used internal Cassandra Java APIs, not
   CQL to create mock data centers and snitches.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
nyh added a commit to nyh/scylla that referenced this issue Nov 6, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CreateTest.java into our cql-pytest framework.

The 15 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for several known issues:

Refs scylladb#6442: Always print all schema parameters (including default values)
Refs scylladb#8001: Documented unit "µs" not supported for assigning a duration"
            type.
Refs scylladb#8892: Add an option for default RF for new keyspaces.
Refs scylladb#8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
            for compression settings by default

Unfortunately, I also had to comment out - and not translate - several
tests which weren't real "CQL tests" (tests that use only the CQL driver),
and instead relied on Cassandra's Java implementation details:

1. Tests for CREATE TRIGGER were commented out because testing them
   in Cassandra requires adding a Java class for the test. We're also
   not likely to ever add this feature to Scylla (Refs scylladb#2205).

2. Similarly, tests for CEP-11 (Pluggable memtable implementations)
   used internal Java APIs instead of CQL, and it also unlikely
   we'll ever implement it in a way compatible with Cassandra because
   of its Java reliance.

3. One test for data center names used internal Cassandra Java APIs, not
   CQL to create mock data centers and snitches.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
nyh added a commit to nyh/scylla that referenced this issue Nov 7, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CreateTest.java into our cql-pytest framework.

The 15 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for several known issues:

Refs scylladb#6442: Always print all schema parameters (including default values)
Refs scylladb#8001: Documented unit "µs" not supported for assigning a duration"
            type.
Refs scylladb#8892: Add an option for default RF for new keyspaces.
Refs scylladb#8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
            for compression settings by default

Unfortunately, I also had to comment out - and not translate - several
tests which weren't real "CQL tests" (tests that use only the CQL driver),
and instead relied on Cassandra's Java implementation details:

1. Tests for CREATE TRIGGER were commented out because testing them
   in Cassandra requires adding a Java class for the test. We're also
   not likely to ever add this feature to Scylla (Refs scylladb#2205).

2. Similarly, tests for CEP-11 (Pluggable memtable implementations)
   used internal Java APIs instead of CQL, and it also unlikely
   we'll ever implement it in a way compatible with Cassandra because
   of its Java reliance.

3. One test for data center names used internal Cassandra Java APIs, not
   CQL to create mock data centers and snitches.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
denesb pushed a commit that referenced this issue Nov 8, 2023
This is a translation of Cassandra's CQL unit test source file
validation/operations/CreateTest.java into our cql-pytest framework.

The 15 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for several known issues:

Refs #6442: Always print all schema parameters (including default values)
Refs #8001: Documented unit "µs" not supported for assigning a duration"
            type.
Refs #8892: Add an option for default RF for new keyspaces.
Refs #8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
            for compression settings by default

Unfortunately, I also had to comment out - and not translate - several
tests which weren't real "CQL tests" (tests that use only the CQL driver),
and instead relied on Cassandra's Java implementation details:

1. Tests for CREATE TRIGGER were commented out because testing them
   in Cassandra requires adding a Java class for the test. We're also
   not likely to ever add this feature to Scylla (Refs #2205).

2. Similarly, tests for CEP-11 (Pluggable memtable implementations)
   used internal Java APIs instead of CQL, and it also unlikely
   we'll ever implement it in a way compatible with Cassandra because
   of its Java reliance.

3. One test for data center names used internal Cassandra Java APIs, not
   CQL to create mock data centers and snitches.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15791
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment