4.0 features that K8ssandra should support #565
Replies: 3 comments 11 replies
-
Related to
This brings up the question: even if those images existed, how would people use them, given the version mapping that happens between k8ssandra, cass-operator, and the Management API? If k8ssandra is intended to be opinionated, should we perhaps evaluate the available options and choose what we believe to be the best overall performing image?
-
Related to
Is this the 4.0 version of the issue we already have documented for 3.11? #490
-
Several of the things discussed here that involve new configuration settings will likely require changes to cass-config-builder and cass-config-definitions. |
-
Enhanced token allocation algorithm
16 vnodes by default.
New allocate_tokens_for_replication_factor yaml setting, replacing the much harder to use allocate_tokens_for_keyspace setting.
Incremental repair fix
Check my blog post for more information on the improvements.
Add Reaper auto-scheduling based on % repaired, using incremental repair.
This could allow improving tombstone purges safely by:
We should also allow enabling aggressive purging of fully expired sstables in TWCS (this requires a specific JVM flag, and is enabled through a compaction subproperty on individual tables).
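As a rough sketch of what the TWCS option above looks like per table: the option names below are quoted from memory (the feature landed in CASSANDRA-13418) and should be verified against the TWCS documentation before use; the keyspace and table names are made up.

```sql
-- Hedged sketch: aggressive purging of fully expired sstables on a TWCS table.
-- The node must also be started with the matching JVM flag, e.g.:
--   -Dcassandra.allow_unsafe_aggressive_sstable_expiration=true
ALTER TABLE my_ks.events
  WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1',
    'unsafe_aggressive_sstable_expiration': 'true'
  };
```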
Compaction improvements
Higher compaction throughput by default. We may not need to do anything about this unless we fail to pick up the new defaults seamlessly.
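For reference, the throughput default in question is a single cassandra.yaml knob; the 4.0 default shown below (64 MiB/s, up from 16 in 3.11) is from memory and should be checked against the yaml shipped with the release.

```yaml
# Hedged sketch: compaction throughput cap in cassandra.yaml (4.0 default).
compaction_throughput_mb_per_sec: 64
```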
JDK11 support with Shenandoah & ZGC
Benchmarks comparing 3.11 vs 4.0-alpha4 showed tremendous improvements in throughput and especially latencies (-88% at p99 when using Shenandoah).
Shenandoah is available in JDK11 using specific builds:
I assume this would require us to create custom Apache Cassandra docker images to use with alternate JDKs.
Performance wise this is something that should be seriously considered though.
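To make this concrete, here is a sketch of the flags involved, assuming a JDK 11 build that ships Shenandoah (e.g. Red Hat or AdoptOpenJDK builds) and Cassandra 4.0's conf/jvm11-server.options file; both collectors are still experimental in JDK 11, hence the unlock flag.

```
# Hedged sketch: conf/jvm11-server.options additions to try Shenandoah.
-XX:+UnlockExperimentalVMOptions
-XX:+UseShenandoahGC
## or, for ZGC:
# -XX:+UseZGC
```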
Client Backpressure
To avoid nodes crashing due to OOM, client backpressure was implemented in 4.0.
This blog post describes the issue and how 4.0 solves it.
On the server side, this feature exposes two new thresholds in cassandra.yaml:
Not sure for now how these should be configured, but exposing them would be a good thing.
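For illustration, the two thresholds are the global and per-client caps on in-flight native transport request bytes; the names below are the 4.0 cassandra.yaml settings, while the values are purely illustrative (by default Cassandra derives them from heap size), not recommendations.

```yaml
# Hedged sketch: 4.0 client backpressure thresholds in cassandra.yaml.
native_transport_max_concurrent_requests_in_bytes: 104857600        # global cap
native_transport_max_concurrent_requests_in_bytes_per_ip: 10485760  # per-client cap
```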
Audit + Full Query Logging
Audit logging is a nice feature for complying with regulations such as SOX, logging every interaction between clients and clusters (connection attempts, failed/successful queries, etc.).
Full query logging allows dumping all queries going through a node to a binary file, which can then be used for replay or analysis.
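As a sketch of how the two features are switched on in 4.0: audit logging is configured in cassandra.yaml, while full query logging is toggled at runtime via nodetool. The fragment below shows a minimal subset; the audit options support filters (included/excluded keyspaces, categories, users) not shown here, and the log path is an example.

```yaml
# Hedged sketch: enabling audit logging in cassandra.yaml (4.0+).
audit_logging_options:
  enabled: true
  logger:
    - class_name: BinAuditLogger
# Full query logging, by contrast, is enabled at runtime, e.g.:
#   nodetool enablefullquerylog --path /var/lib/cassandra/fql
```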