feature(test-cases): Add test cases for low and asymmetric loads #6517
Some of the jobs failed because the Manager Restore nemesis failed.
Pretty sure that's a known issue and unrelated to this PR.
5ec36bc to c45f6d9
@@ -0,0 +1,30 @@
test_duration: 330

prepare_write_cmd: "cassandra-stress write cl=QUORUM n=2097152 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=1) compaction(strategy=SizeTieredCompactionStrategy)' -mode cql3 native -rate threads=40 -pop seq=1..2097152 -col 'n=FIXED(10) size=FIXED(512)' -log interval=5"
why rf=1?
Roy requested a longevity that behaves like this (RF=1, CL=1); it's meant to cover extremely low loads during repairs.
You don't need to duplicate the entire yaml file in order to add one or two parameters. In this case you don't even need to add a new configuration, because under configurations you already have one called "db_nodes_shards_selection.yaml".
So the only thing you need is a new pipeline that adds this configuration.
backend: 'aws',
region: 'eu-west-1',
test_name: 'longevity_test.LongevityTest.test_custom_time',
test_config: 'test-cases/longevity/longevity-200GB-48h-verifier-LimitedMonkey-tls-asymmetric.yaml'
As mentioned in the comment below, here you just need to have the original test_config + "configurations/db_nodes_shards_selection.yaml".
Done
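The layering suggested in this thread (original test_config plus a small extra configuration file) can be pictured roughly as below. This is only a sketch with plain dictionaries and a shallow merge: the merge_configs helper and the example_override_param key are assumptions made for illustration, not SCT's actual loading code or the real contents of db_nodes_shards_selection.yaml.

```python
from typing import Any


def merge_configs(*layers: dict[str, Any]) -> dict[str, Any]:
    """Shallow-merge config layers; keys in later layers override earlier ones."""
    merged: dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Stand-in for the original test-case yaml (only one key shown).
base = {"test_duration": 330}

# Stand-in for a small override file; this key and value are hypothetical.
override = {"example_override_param": True}

# The effective configuration keeps everything from the base file and
# only adds (or replaces) what the override file mentions.
effective = merge_configs(base, override)
print(effective)
```

The point of the design is that the baseline file stays the single source of truth, and each variant only states what differs from it.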
prepare_write_cmd: "cassandra-stress write cl=ALL n=200200300 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=1000 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq=1..200200300 -log interval=15"
prepare_write_cmd:
- "cassandra-stress write cl=ALL n=200200300 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=1000 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq=1..200200300 -log interval=15"
- "cassandra-stress write cl=ALL n=500 -schema 'keyspace=lowload1 replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=50 -col 'size=FIXED(1024) n=FIXED(4)' -pop seq=1..500 -log interval=15"
- "cassandra-stress write cl=ALL n=500 -schema 'keyspace=lowload1 replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=50 -col 'size=FIXED(1024) n=FIXED(4)' -pop seq=1..500 -log interval=15"
- "cassandra-stress write cl=ALL n=500 -schema 'keyspace=lowload1 replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=1 -col 'size=FIXED(50) n=FIXED(1)' -pop seq=1..500 -log interval=15"
The "-rate threads" value is what makes it "low load".
I also reduced the column size so that it will be a tiny table.
Done
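To get a feel for how much smaller the suggested table is, the raw payload written by the two lowload1 commands can be estimated from their n, "-col n" and "-col size" values. This is a back-of-the-envelope sketch only: raw_payload_bytes is a hypothetical helper, and the figure ignores keys, per-row overhead, replication and compression.

```python
def raw_payload_bytes(rows: int, columns: int, column_size: int) -> int:
    """Approximate payload: rows x columns per row x bytes per column."""
    return rows * columns * column_size


# Original command: n=500, -col 'size=FIXED(1024) n=FIXED(4)'
original = raw_payload_bytes(rows=500, columns=4, column_size=1024)

# Suggested command: n=500, -col 'size=FIXED(50) n=FIXED(1)'
suggested = raw_payload_bytes(rows=500, columns=1, column_size=50)

print(original, suggested, original / suggested)
```

Under these assumptions the suggested command writes about 25 KB of column data instead of roughly 2 MB, and does so from a single thread.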
test-cases/longevity/longevity-200GB-48h-verifier-LimitedMonkey-tls.yaml
# prepare_verify_cmd: "cassandra-stress read cl=ALL n=200200300 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=2000 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq=1..200200300 -log interval=15"

stress_cmd: ["cassandra-stress write cl=QUORUM duration=2860m -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=250 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq=400200300..600200300 -log interval=15"]
stress_read_cmd: ["cassandra-stress read cl=ONE duration=2860m -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=250 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq=1..200200300 -log interval=5"]
stress_read_cmd:
If we're changing this anyway, it's a good opportunity to remove "stress_read_cmd" and move its commands under stress_cmd.
IIRC it just starts a few minutes later, but it doesn't work with round_robin.
Done
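As a rough picture of why a single stress_cmd list is convenient here: with one list, every command can be handed out to loaders in turn. The sketch below only illustrates what round-robin assignment means in general; assign_round_robin, the loader names and the short command labels are hypothetical, and this is not how SCT implements round_robin.

```python
from itertools import cycle


def assign_round_robin(commands: list[str], loaders: list[str]) -> dict[str, list[str]]:
    """Hand out commands to loaders one at a time, wrapping around."""
    assignment: dict[str, list[str]] = {loader: [] for loader in loaders}
    for command, loader in zip(commands, cycle(loaders)):
        assignment[loader].append(command)
    return assignment


# Short labels standing in for the three commands in the merged stress_cmd list.
commands = ["write", "read", "read-lowload1"]
loaders = ["loader-1", "loader-2"]  # hypothetical loader names

assignment = assign_round_robin(commands, loaders)
print(assignment)
```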
stress_read_cmd: ["cassandra-stress read cl=ONE duration=2860m -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=250 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq=1..200200300 -log interval=5"]
stress_read_cmd:
- "cassandra-stress read cl=ONE duration=2860m -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=250 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq=1..200200300 -log interval=5"
- "cassandra-stress read cl=ONE duration=2860m -schema 'keyspace=lowload1 replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=25 -col 'size=FIXED(1024) n=FIXED(4)' -pop seq=1..500 -log interval=5"
- "cassandra-stress read cl=ONE duration=2860m -schema 'keyspace=lowload1 replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=25 -col 'size=FIXED(1024) n=FIXED(4)' -pop seq=1..500 -log interval=5"
- "cassandra-stress read cl=ONE duration=2860m -schema 'keyspace=lowload1 replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=1 -col 'size=FIXED(50) n=FIXED(1)' -pop seq=1..500 -log interval=5"
Done
The "low load" in the name here is a mistake.
The purpose of this longevity is to test RF=1.
Done
backend: 'aws',
region: 'eu-west-1',
test_name: 'longevity_test.LongevityTest.test_custom_time',
test_config: 'test-cases/longevity/longevity-10gb-4h-low-load.yaml',
In any case, I'd prefer that you use the 200gb-48h test as the baseline (with test_duration we can run it for less than 48h).
Also here, instead of duplicating the entire yaml, you can add a configuration that overrides only the prepare command to use "replication_factor=1".
(If it's mentioned in the other commands, it's irrelevant anyway.)
The same configuration should also set the nemesis class to "NonDisruptive".
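One way to build such an override without retyping the whole command is to derive it from the baseline prepare command. The sketch below is an illustration under assumptions: with_replication_factor is a hypothetical helper, and the resulting dictionary is only a stand-in for the small override file (the nemesis setting is left out because its exact key is not given in this thread).

```python
import re


def with_replication_factor(command: str, rf: int) -> str:
    """Return the command with every replication_factor=N replaced by rf."""
    return re.sub(r"replication_factor=\d+", f"replication_factor={rf}", command)


# Baseline prepare command from the 200GB-48h test case.
baseline = (
    "cassandra-stress write cl=ALL n=200200300 -schema "
    "'replication(strategy=NetworkTopologyStrategy,replication_factor=3) "
    "compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native "
    "-rate threads=1000 -col 'size=FIXED(1024) n=FIXED(1)' "
    "-pop seq=1..200200300 -log interval=15"
)

# Stand-in for the override file: only the key that differs from the baseline.
rf1_override = {"prepare_write_cmd": with_replication_factor(baseline, 1)}
print(rf1_override["prepare_write_cmd"])
```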
This longevity requires extensive testing with random nemesis_seed values to confirm that it actually works as expected for all nemeses.
Will start running it today with different nemesis seeds
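A reproducible batch of seeds for such runs could be drawn as in the sketch below. The pick_nemesis_seeds helper, the 1..1000 range and the batch size are assumptions chosen for illustration; the thread does not say how the seeds were actually picked.

```python
import random


def pick_nemesis_seeds(count: int, rng_seed: int, upper: int = 1000) -> list[int]:
    """Draw `count` distinct seeds from 1..upper, reproducibly for a given rng_seed."""
    rng = random.Random(rng_seed)
    return sorted(rng.sample(range(1, upper + 1), count))


# Fixing rng_seed means the same batch can be regenerated when a run needs repeating.
seeds = pick_nemesis_seeds(count=5, rng_seed=6517)
for seed in seeds:
    print(f"nemesis_seed: {seed}")
```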
c45f6d9 to 51fa7a4
Jobs, again, this time with fixes.
51fa7a4 to b990bf6
prepare_write_cmd: "cassandra-stress write cl=ALL n=200200300 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=1000 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq=1..200200300 -log interval=15"
prepare_write_cmd:
- "cassandra-stress write cl=ALL n=200200300 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=1000 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq=1..200200300 -log interval=15"
- "cassandra-stress write cl=ALL n=500 -schema 'keyspace=lowload1 replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=1 -col 'size=FIXED(50) n=FIXED(1)' -pop seq=1..500 -log interval=15"
- "cassandra-stress write cl=ALL n=500 -schema 'keyspace=lowload1 replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=1 -col 'size=FIXED(50) n=FIXED(1)' -pop seq=1..500 -log interval=15"
- "cassandra-stress write cl=ALL n=500 -schema 'keyspace=lowload1 replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate threads=1 -col 'size=FIXED(50) n=FIXED(1)' -pop seq=1..500 -log interval=15"
Done
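Because these command lines are long and differ in only one clause, it can help to pull out the few options under discussion and compare them. The sketch below does that with regular expressions; summarize_stress_command is a hypothetical helper written for this illustration, not part of cassandra-stress or SCT.

```python
import re
from typing import Optional


def summarize_stress_command(command: str) -> dict[str, Optional[str]]:
    """Extract a few options from a cassandra-stress command line (None if absent)."""

    def find(pattern: str) -> Optional[str]:
        match = re.search(pattern, command)
        return match.group(1) if match else None

    return {
        "keyspace": find(r"keyspace=(\w+)"),
        "replication_factor": find(r"replication_factor=(\d+)"),
        "compaction": find(r"compaction\(strategy=(\w+)\)"),
        "threads": find(r"threads=(\d+)"),
    }


# The lowload1 write command before and after the suggestion above.
before = (
    "cassandra-stress write cl=ALL n=500 -schema 'keyspace=lowload1 "
    "replication(strategy=NetworkTopologyStrategy,replication_factor=3) "
    "compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native "
    "-rate threads=1 -col 'size=FIXED(50) n=FIXED(1)' -pop seq=1..500 -log interval=15"
)
after = (
    "cassandra-stress write cl=ALL n=500 -schema 'keyspace=lowload1 "
    "replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native "
    "-rate threads=1 -col 'size=FIXED(50) n=FIXED(1)' -pop seq=1..500 -log interval=15"
)

summary_before = summarize_stress_command(before)
summary_after = summarize_stress_command(after)
# Keys whose extracted value changed between the two commands.
changed = sorted(k for k in summary_before if summary_before[k] != summary_after[k])
print(changed)
```

For these two commands the only extracted difference is the compaction clause, which the suggestion drops.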
stress_cmd:
- "cassandra-stress write cl=QUORUM duration=2860m -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=250 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq=400200300..600200300 -log interval=15"
- "cassandra-stress read cl=ONE duration=2860m -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=250 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq=1..200200300 -log interval=5"
- "cassandra-stress read cl=ONE duration=2860m -schema 'keyspace=lowload1 replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=1 -col 'size=FIXED(50) n=FIXED(1)' -pop seq=1..500 -log interval=5"
- "cassandra-stress read cl=ONE duration=2860m -schema 'keyspace=lowload1 replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=1 -col 'size=FIXED(50) n=FIXED(1)' -pop seq=1..500 -log interval=5"
- "cassandra-stress read cl=ONE duration=2860m -schema 'keyspace=lowload1 replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate threads=1 -col 'size=FIXED(50) n=FIXED(1)' -pop seq=1..500 -log interval=5"
Done
configurations/rf1-low-load.yaml
Outdated
Please remove "low-load" from the name.
Done
configurations/rf1-low-load.yaml
Outdated
- "cassandra-stress write cl=ALL n=500 -schema 'keyspace=lowload1 replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=1 -col 'size=FIXED(50) n=FIXED(1)' -pop seq=1..500 -log interval=15"

stress_cmd:
- "cassandra-stress read cl=ONE duration=2860m -schema 'keyspace=lowload1 replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=LeveledCompactionStrategy)' -mode cql3 native -rate threads=1 -col 'size=FIXED(50) n=FIXED(1)' -pop seq=1..500 -log interval=5"
These aren't the correct c-s commands for RF=1.
Corrected
Is this job just for testing?
We could keep it afterwards, but yes, I'm testing with it.
test-cases/longevity/longevity-200GB-48h-verifier-LimitedMonkey-tls.yaml
b990bf6 to 77bb238
77bb238 to 2f84dbc
72d7eb5 to 857a6ae
This change adds new configurations for the 200GB-48h longevities and one low-load 4-hour longevity, intended to simulate a low load during repair processes and to cover potential overhead like that in scylladb/scylladb#14093. Task: scylladb/qa-tasks#1416
857a6ae to 818b2aa