a hybrid feature/subsystem/resource grouping scheme for cassandra.yaml
maedhroz committed Feb 15, 2022
commit 450b920 (1 parent: 598d608)
Showing 1 changed file with 386 additions and 0 deletions: conf/cassandra-grouped.yaml

@@ -0,0 +1,386 @@
cluster:
  name: 'Test Cluster'
  partitioner: org.apache.cassandra.dht.Murmur3Partitioner
  tokens:
    count: 16
    initial_token: <token>
    allocate_for_keyspace: <keyspace>
    allocate_for_local_replication_factor: 3
  snitch:
    type: SimpleSnitch
    dynamic:
      update_interval: 100ms
      reset_interval: 600000ms
      badness_threshold: 1.0
  seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
        - seeds: "127.0.0.1:7000"

security:
  cache_warming_enabled: false
  credentials:
    authenticator: AllowAllAuthenticator
    validity_period: 2000ms
    update_interval: 2000ms
  permissions:
    authorizer: AllowAllAuthorizer
    validity_period: 2000ms
    update_interval: 2000ms
    traverse_from_root: false
    active_update_cache: false
  read_consistency_level: LOCAL_QUORUM
  write_consistency_level: EACH_QUORUM
  roles:
    manager: CassandraRoleManager
    validity_period: 2000ms
    update_interval: 2000ms
    active_update_cache: false

storage:
  sstables:

@maedhroz (author), Apr 12, 2023: could just be sstable as well

    directories:
      - /var/lib/cassandra/data
    local_system_data_file_directory: <directory>
    flush_compression: fast
    preemptive_open_interval: 50MiB
    max_value_size: 256MiB
    automatic_sstable_upgrade: false
    max_concurrent_automatic_sstable_upgrades: 1
    chunk_cache:
      enabled: false
      size: 512MiB
    index_summary:
      capacity: 50MiB
      resize_interval: 60m
    column_index:
      size: 64KiB
      cache_size: 2KiB
  disk_failure_policy: stop
  disk_optimization_strategy: ssd
  trickle_fsync:
    enabled: false
    interval: 10240KiB
  caching:

@maedhroz (author), Apr 12, 2023: could just be cache

    saved_caches_directory: /var/lib/cassandra/saved_caches
    load_timeout: 30s
    keys:
      size: 100MiB
      save_period: 4h
      limit: 100
    rows:
      size: 0MiB
      class: org.apache.cassandra.cache.OHCProvider
      save_period: 0s
      limit: 100
    counters:
      size: 50MiB
      save_period: 7200s
      limit: 100
  memtable:
    heap_space: 2048MiB
    offheap_space: 2048MiB
    cleanup_threshold: 0.11
    allocation_type: heap_buffers
    flush_writers: 2
  transparent_data_encryption:
    enabled: false
    chunk_length_kb: 64
    cipher: AES/CBC/PKCS5Padding
    key_alias: testing:1
    key_provider:
      - class_name: org.apache.cassandra.security.JKSKeyProvider
        parameters:
          - keystore: conf/.keystore
            keystore_password: cassandra
            store_type: JCEKS
            key_password: cassandra

commitlog:

@maedhroz (author), Feb 15, 2022: could nest under storage

  directory: /var/lib/cassandra/commitlog
  commit_failure_policy: stop
  sync: periodic
  sync_period: 10000ms
  periodic_sync_lag_block: 10ms
  sync_group_window: 1000ms
  segment_size: 32MiB
  total_space: 8192MiB
  compression:
    - class_name: LZ4Compressor
      parameters:
        - <param>

cdc:
  enabled: false
  raw_directory: /var/lib/cassandra/cdc_raw
  total_space: 4096MiB
  free_space_check_interval: 250ms

backup:
  incremental: false
  snapshot_before_compaction: false
  auto_snapshot: true
  snapshot_links_per_second: 0

compaction:

@maedhroz (author), Feb 15, 2022: Could nest in storage
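
For illustration, the nesting floated here and in the commitlog comment above might look roughly like this (a sketch only, with values copied from the surrounding blocks; it is not part of the committed file):

storage:
  commitlog:
    directory: /var/lib/cassandra/commitlog
    sync: periodic
    segment_size: 32MiB
  compaction:
    concurrent_compactors: 1
    limits:
      throughput: 64MiB/s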

  concurrent_compactors: 1
  limits:
    throughput: 64MiB/s
  thresholds:
    large_partition_warning: 100MiB
    tombstone_warning: 100000

network:
  buffer_pool_size: 128MiB
  internode:
    address: localhost
    interface: eth0
    port: 7000
    prefer_ipv6: false
    broadcast_address: 1.2.3.4
    listen_on_broadcast_address: false
    authorizer: AllowAllNetworkAuthorizer
    authenticator: org.apache.cassandra.auth.AllowAllInternodeAuthenticator
    phi_convict_threshold: 8
    compression: dc
    inter_dc_tcp_nodelay: false
    limits:
      socket_send_buffer_size: 128KiB
      socket_receive_buffer_size: 128KiB
      application_send_queue_capacity: 4MiB
      application_send_queue_reserve_endpoint_capacity: 128MiB
      application_send_queue_reserve_global_capacity: 512MiB
      application_receive_queue_capacity: 4MiB
      application_receive_queue_reserve_endpoint_capacity: 128MiB
      application_receive_queue_reserve_global_capacity: 512MiB
    internode_timeout_enabled: true
    timeouts:
      tcp_connect: 2000ms
      tcp_user: 30000ms
    encryption:
      type: none
      legacy_ssl_storage_port_enabled: false
      keystore: conf/.keystore
      keystore_password: cassandra
      require_client_auth: false
      truststore: conf/.truststore
      truststore_password: cassandra
      require_endpoint_verification: false
  client:
    enabled: true
    port: 9042
    ssl_port: 9142
    address: localhost
    broadcast_address: 1.2.3.4
    interface: eth1
    prefer_ipv6: false
    keepalive: true
    allow_older_protocols: true
    flush_in_batches_legacy: false
    limits:
      frame_size: 16MiB
      connections: -1
      connections_per_ip: -1
      request_handling_threads: 128
    timeouts:
      idle: 60000ms
    encryption:
      enabled: false
      keystore: conf/.keystore
      keystore_password: cassandra
      require_client_auth: false
    error_reporting_exclusions:
      subnets:
        - 127.0.0.1
        - 127.0.0.0/31

hinted_handoff:
  enabled: true
  disabled_datacenters:
    - DC1
    - DC2
  max_hint_window: 3h
  hint_window_persistent_enabled: true
  handoff_throttle: 1024KiB
  delivery_threads: 2
  hint_directory: /var/lib/cassandra/hints
  flush_period: 10000ms
  max_hints_file_size: 128MiB
  max_hints_size_per_host: 0MiB

@maedhroz (author), Feb 22, 2022: These two maximums could probably nest in a limits: element.
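
For illustration, the nesting suggested in that comment might look roughly like this (a sketch only, reusing the values above; not part of the committed file):

hinted_handoff:
  limits:
    max_hints_file_size: 128MiB
    max_hints_size_per_host: 0MiB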

  auto_cleanup_enabled: false
  compression:
    - class_name: LZ4Compressor
      parameters:
        - <param>

repair:
  session_space: 2GiB
  concurrent_validations: 0
  tracking:
    track_for_range_reads_enabled: false
    track_for_partition_reads_enabled: false
    report_unconfirmed_mismatches: false

streaming:

@maedhroz (author), Feb 15, 2022: Not nested w/ repair, b/c bootstrap, but could conceivably nest in network -> internode.
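
As a sketch of the alternative mentioned there (illustrative only, not proposed in this commit):

network:
  internode:
    streaming:
      limits:
        connections_per_host: 1
        throughput:
          outbound: 24MiB/s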

  limits:
    connections_per_host: 1
    throughput:
      outbound: 24MiB/s
      inter_dc_outbound: 24MiB/s
  timeouts:
    tcp_user: 300000ms

@belliottsmith, Feb 17, 2022: convert to minutes?

@maedhroz (author), Feb 17, 2022: Absolutely. I'm sure there are some other nonsensical unit choices in this draft. (ex. requests.truncate.timeout)
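
For reference, the two values called out here could be expressed in coarser units without changing their meaning (illustrative only):

streaming:
  timeouts:
    tcp_user: 5m    # equivalent to 300000ms
requests:
  truncate:
    timeout: 1m     # equivalent to 60000ms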

  keep_alive: 300s
  entire_sstables:
    enabled: true
    limits:
      throughput:
        outbound: 24MiB/s
        inter_dc_outbound: 24MiB/s

denylist:
  partitions_enabled: false
  writes_enabled: true
  reads_enabled: true
  range_reads_enabled: true
  refresh_interval: 600s
  initial_load_retry: 5s
  consistency_level: QUORUM
  limits:
    keys_per_table: 1000
    keys_total: 10000

requests:

@maedhroz (author), Feb 22, 2022: @pauloricardomg @dcapwell I forgot native-transport request rate-limiting here, but this is probably where I would have put it (probably in a limits element).
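
A rough sketch of where that could land (the key names and values below are hypothetical, not part of this commit):

requests:
  limits:
    native_transport:
      rate_limiting_enabled: false
      max_requests_per_second: 1000000    # hypothetical value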

  default_timeout: 10000ms
  slow_query_log_timeout: 500ms
  read:
    threads: 32
    timeouts:
      partition: 5000ms
      range: 10000ms
    thresholds:
      tombstones:
        warn: 1000
        fail: 100000
      coordinator_read_size:
        warn: 0KiB
        fail: 0KiB
      local_read_size:
        warn: 0KiB
        fail: 0KiB
      row_index_size:
        warn: 0KiB
        fail: 0KiB
      fetch_size:
        warn: 10000
        fail: 100000
  write:
    ideal_consistency_level: EACH_QUORUM
    threads: 32
    timeout: 2000ms
    cas_contention_timeout: 1000ms
  counter_write:
    threads: 32
    timeout: 5000ms
  truncate:
    timeout: 60000ms

cql:
  prepared_statements_cache_size: 10MiB
  user_timestamps_enabled: true
  read_before_write_list_operations_enabled: true

tracing:
  query_ttl: 1d
  repair_ttl: 7d

replica_filtering_protection:

@maedhroz (author), Feb 15, 2022: This could be an issue for both range reads and index queries (which use the range read mechanism). Not sure where a good final resting place would be.

  cached_rows_warn_threshold: 2000
  cached_rows_fail_threshold: 32000

batches:
  batchlog_replay_throttle: 1024KiB
  thresholds:
    batch_size:
      warn: 5KiB
      fail: 50KiB
    unlogged_batch_across_partitions:
      warn: 10

logging:
  gc:
    info: 200ms
    warn: 1000ms
  audit:
    enabled: false
    logger:
      - class_name: BinAuditLogger
    audit_logs_dir: <directory>
    included_keyspaces: <list>
    excluded_keyspaces: system, system_schema, system_virtual_schema
    included_categories: <list>
    excluded_categories: <list>
    included_users: <list>
    excluded_users: <list>
    roll_cycle: HOURLY
    block: true
    max_queue_weight: 256MiB
    max_log_size: 16GiB
  fql:
    log_dir:
    roll_cycle: HOURLY
    block: true
    max_queue_weight: 256MiB
    max_log_size: 16GiB

schema:
  default_replication_factor: 2
  minimum_replication_factor: 1
  table_properties:
    ignored: []
    disallowed: []
  thresholds:

@belliottsmith, Feb 16, 2022: ideally need to pick limits or thresholds throughout

@maedhroz (author), Feb 16, 2022:

I went back and forth on that a lot. My argument for having both is that they are different enough concepts.

limit = the maximum amount of some resource you can use (ex. the number of keys you can put in a cache... nothing happens if you try to use more... entries are just evicted)

threshold = a point at which we'll take some action (ex. warn and then fail at different numbers of tombstones read)

If having both does more harm than good for readability, though, I'm glad to change it.
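
To make the distinction concrete, here is how the two concepts already appear side by side in this draft (values copied from the file; the annotations are illustrative):

denylist:
  limits:
    keys_per_table: 1000    # a limit: a cap on resource usage; nothing fails when it is reached
requests:
  read:
    thresholds:
      tombstones:
        warn: 1000          # a threshold: warn at this point...
        fail: 100000        # ...and fail at this one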

@belliottsmith, Feb 16, 2022: I'm not sure if it's clearer this way, but if there's a clear pattern (which there seems to be) then I don't have a major complaint.

    tables:
      warn: 150
      fail: 300
    keyspaces:
      warn: 20
      fail: 40
    columns_per_table:
      warn: 50
      fail: 200
    secondary_indexes_per_table:
      warn: 1
      fail: 3
    materialized_views_per_table:
      warn: -1
      fail: -1

@maedhroz (author), Feb 24, 2022:

Alternate version of the schema element I've been toying around with:

schema:
  keyspace:
    limits:
      replication_factor:
        min: 1
        max: 4
    thresholds:
      keyspace_count:
        warn: 20
        fail: 40
  table:
    thresholds:
      table_count:
        warn: 150
        fail: 300
      column_count:
        warn: 50
        fail: 200
    properties:
      ignored: []
      disallowed: []

@maedhroz (author), Feb 24, 2022: Organizes around keyspace and table, and leaves out MVs and indexes.


user_defined_functions:

@maedhroz (author), Feb 15, 2022: could nest under schema

@adelapena, Feb 17, 2022:

Agree, there could be sections under schema for user_defined_functions, materialized_views, user_defined_types and secondary_indexes, with the latter maybe organized per implementation. That way we could place there not only enable/disable flags but thresholds for number of fields per UDT, number of indexed columns, etc.
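
For illustration, that arrangement might look something like the following (the key names and values here are hypothetical, not part of the commit):

schema:
  user_defined_functions:
    enabled: false
  user_defined_types:
    thresholds:
      fields_per_type:
        warn: 30
        fail: 60
  materialized_views:
    enabled: false
  secondary_indexes:
    sasi:
      enabled: false
    thresholds:
      indexed_columns_per_table:
        warn: 1
        fail: 3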

@belliottsmith, Feb 17, 2022:

I'm less convinced by this. Schema is too loose a concept, and while enabling/disabling these features might be a schema or cql topic, the limits etc are not.

Perhaps cql_features?

@maedhroz (author), Feb 17, 2022: Hmmm, on second thought, cql might be a better choice.

@adelapena, Feb 17, 2022:

I'm not sure 2i or MVs belong to CQL. They would probably survive if CQL is replaced by something else, and I think 2i were there before CQL. Independently of where we nest them, it seems more intuitive to group their enable/disable flags and safety thresholds.

@maedhroz (author), Feb 17, 2022:

> Independently of where we nest them, it seems more intuitive to group their enable/disable flags and safety thresholds.

I'm fine w/ that, especially for MVs.

@maedhroz (author), Feb 17, 2022:

I'm not entirely sure what we want to do w/ the secondary indexing bits, to be honest. SASI and legacy 2i eventually need to die, and even SAI probably won't have a ton of knobs to worry about in this file. I've argued w/ myself about having something like an indexing element at the top level, and then nesting things there (ex. indexing -> sai -> limits -> indexes_per_table), but we can perhaps delay concrete planning for that until we know what those parameters are going to be.
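
Sketching the idea in that last sentence (purely illustrative; these keys do not exist in the commit):

indexing:
  sai:
    enabled: true
    limits:
      indexes_per_table: 10    # hypothetical value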

  enabled: false
  scripted_functions_enabled: false

materialized_views:
  enabled: false
  limits:
    builders: 1
    concurrent_writes: 32

@maedhroz (author), Feb 15, 2022: For these last few bits, I either really didn't know where they should go... or they dealt w/ features like SASI that are probably dead anyway.

@ekaterinadimitrova2, Feb 16, 2022: Quick question: were all comments removed just to make the structure clear while looking at it, or do you really plan to remove any kind of explanation/comments from the yaml?

@maedhroz (author), Feb 16, 2022:

Ah, sorry about that. I just removed the comments to make it easier to focus on the new structure. You can start to see, however, where the inline documentation will benefit. In some places in the existing YAML, we have multiple options relating to a feature, but then try to summarize the entire feature loosely on top of its first option. This will be mostly unnecessary in the new structure, since we can attach that summary to the feature's/subsystem's root element.
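
As an illustration of attaching a feature summary to its root element rather than to its first option (the comment text below is a placeholder, not proposed documentation):

# Hinted handoff stores mutations destined for unreachable replicas and replays
# them when those replicas return. All hint-related options live under this element.
hinted_handoff:
  enabled: true
  max_hint_window: 3h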

# validate tombstones on reads and compaction
# can be either "disabled", "warn" or "exception"
corrupted_tombstone_strategy: disabled

# If enabled, diagnostic events can be helpful for troubleshooting operational issues. Emitted events contain details
# on internal state and temporal relationships across events, accessible by clients via JMX.
diagnostic_events_enabled: false

@pauloricardomg, Feb 21, 2022:

I was wondering if it would make sense to group (experimental) feature flags together. Something along those lines:

features:
    # Diagnostic events can be helpful for troubleshooting operational issues. Emitted events contain details
    # on internal state and temporal relationships across events, accessible by clients via JMX.
    diagnostics_events:
       enabled: false
    # The features listed below are considered experimental and are not recommended for production use.
    transient_replication:
        # Enables creation of transiently replicated keyspaces on this node.
        enabled: false
        transient_prop123: "foobar"
    sasi:
        # Enables SASI index creation on this node.
        enabled: false
    materialized_views:
        # Enables creation of MVs on this node.
        enabled: false
        mv_prop1: 100ms
    drop_compact_storage:
        # Enables the use of 'ALTER ... DROP COMPACT STORAGE' statements on this node.
        enabled: false

@maedhroz (author), Feb 21, 2022:

Especially when it comes to currently experimental features (SASI), features with little configuration other than an "enabled" flag (most of these), and items that will likely be removed in future releases (drop CS), I would be perfectly fine with leaving them out of the larger project of grouping/nesting. One slightly less invasive alternative is just retaining a commented section at the end of the config w/ options relating to experimental features.


# Enables SASI index creation on this node.
# SASI indexes are considered experimental and are not recommended for production use.
sasi_indexes_enabled: false

# Enables creation of transiently replicated keyspaces on this node.
# Transient replication is experimental and is not recommended for production use.
transient_replication_enabled: false

# Enables the use of 'ALTER ... DROP COMPACT STORAGE' statements on this node.
# 'ALTER ... DROP COMPACT STORAGE' is considered experimental and is not recommended for production use.
drop_compact_storage_enabled: false
