ARM only: test_topology_smp.test_nodes_with_different_smp fails in debug mode #14093
Extremely slow bootstrap via repair when starting the second node; this log is for
ran this locally:
duration=17 seconds. This log is from the second, bootstrapping node. Its CPU usage is low, but the CPU usage of the first (source) node is huge (248.2%). Should probably run it under perf.
New failure for the same reason.
I'm not familiar enough with this area...
@asias ping
@gusev-p we need it urgently, it's failing master
the PR is approved, waiting for last CI/merge |
…from Gusev Petr

Consider a cluster with no data, e.g. in tests. When a new node is bootstrapped with repair, we iterate over all (shard, table, range) combinations, read data for the range from all the peer nodes, look for any discrepancies, and heal them. Even for a small `num_tokens` (16 in the tests), the number of affected ranges (those we need to consider) equals the total number of tokens in the cluster, which is 32 for the second node and 48 for the third. Multiplying this by the number of shards and the number of tables in each keyspace gives thousands of ranges. For each of them we follow the row-level repair protocol, which includes several RPC exchanges between the peer nodes and creating data structures on them. These exchanges are processed sequentially for each shard; there are `parallel_for_each` calls in the code, but they are throttled by the chosen memory constraints and in practice execute sequentially.

When the bootstrapping node (master) reaches a peer node and asks for data in a specific range for a specific master shard, two options exist. If the sharding parameters (primarily `--smp`) are the same on the master and on the peer, we can just read one local shard, which is fast. If, on the other hand, `--smp` differs, we need to do a multishard query. The given range from the master can contain data from different peer shards, so we split this range into a number of subranges such that each of them contains data only from the given master shard (`dht::selective_token_range_sharder`). The number of these subranges can be quite large (300 in the tests). For each of these subranges we do `fast_forward_to` on the `multishard_reader`, and this incurs a lot of overhead, mainly because of `smp::submit_to`.

In this series we optimize this case. Instead of splitting the master range and reading only what's needed, we read all the data in the range and then filter it by the master shard. We do this only if the estimated number of partitions is small (<= 100). A sketch of the filtering idea is given after the commit list below.

These are the logs of starting a second node with `--smp 4`; the first node was started with `--smp 3`:

```
with this patch
20:58:49.644 INFO> [debug/topology_custom.test_topology_smp.1] starting server at host 127.222.46.3 in scylla-2...
20:59:22.713 INFO> [debug/topology_custom.test_topology_smp.1] started server at host 127.222.46.3 in scylla-2, pid 1132859

without this patch
21:04:06.424 INFO> [debug/topology_custom.test_topology_smp.1] starting server at host 127.181.31.3 in scylla-2...
21:06:01.287 INFO> [debug/topology_custom.test_topology_smp.1] started server at host 127.181.31.3 in scylla-2, pid 1134140
```

Fixes: #14093
Closes #14178

* github.com:scylladb/scylladb:
  - repair_test: add test_reader_with_different_strategies
  - repair: extract repair_reader declaration into reader.hh
  - repair_meta: get_estimated_partitions fix
  - repair_meta: use multishard_filter reader if the number of partitions is small
  - repair_meta: delay _repair_reader creation
  - database.hh: make_multishard_streaming_reader with range parameter
  - database.cc: extract streaming_reader_lifecycle_policy
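The core of the change is easier to see in isolation. Below is a minimal, self-contained C++ sketch of the "read everything, then filter by master shard" idea; the `partition` struct and the `owning_shard` function are illustrative assumptions, not the actual Scylla API (the real code works with `dht::token` and the bootstrapping node's sharder, and wraps the filtering in a reader).

```cpp
// Minimal sketch (not the actual Scylla code): instead of fast-forwarding the
// multishard reader to each sub-range owned by the master shard, read the whole
// range once and drop partitions whose token maps to a different master shard.
#include <cstdint>
#include <vector>

struct partition {
    uint64_t token;   // simplified stand-in for dht::token
    // ... partition data ...
};

// Stand-in for the master node's sharding function (conceptually, shard_of(token)
// under the master's --smp); the modulo mapping here is only for illustration.
inline unsigned owning_shard(uint64_t token, unsigned master_smp) {
    return static_cast<unsigned>(token % master_smp);
}

// Read everything in the range, then keep only the partitions that belong to the
// requesting (master) shard. This trades some extra reading for far fewer
// cross-shard hops, which pays off when the estimated number of partitions in
// the range is small (the patch uses a threshold of <= 100).
std::vector<partition> filter_by_master_shard(std::vector<partition> all_in_range,
                                              unsigned master_shard,
                                              unsigned master_smp) {
    std::vector<partition> kept;
    for (auto& p : all_in_range) {
        if (owning_shard(p.token, master_smp) == master_shard) {
            kept.push_back(std::move(p));
        }
    }
    return kept;
}
```

In the actual series this filtering is done by a reader wrapper (the multishard_filter reader mentioned in the commit list), so the repair code keeps consuming a single mutation stream regardless of which strategy was chosen.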
This change adds new configurations for the 200gb-48 longevities and one low-load 4-hour longevity, intended to simulate low load during repair processes and to cover potential overhead like in scylladb/scylladb#14093. Task: scylladb/qa-tasks#1416
The fix is a performance optimization; not backporting.
https://jenkins.scylladb.com/view/nexts/job/scylla-master/job/next/lastCompletedBuild/consoleFull