test: perf: add end-to-end benchmark for alternator #13121
Conversation
Ideally an external process could have also gotten the same counters using metrics? What worries me a bit about having the client in the same process as the server is that it mixes the performance of the client and server, and also might make it harder to include some things (like client authentication). But I guess it's a good start.
Looks good (I just left a few comments/questions). I am guessing that you'll probably want to improve this code together with doing Alternator optimizations, and understanding better what you really want to benchmark / profile.
main.cc
Outdated
@@ -1719,6 +1721,9 @@ To start the scylla server proper, simply invoke as: scylla server (or just scyl
    });

    startlog.info("Scylla version {} initialization completed.", scylla_version());
    if(after_init_func) {
space after if
fixed
main.cc
Outdated
@@ -1768,6 +1773,7 @@ int main(int ac, char** av) {
    {"perf-row-cache-update", perf::scylla_row_cache_update_main, "run performance tests by updating row cache on this server"},
    {"perf-simple-query", perf::scylla_simple_query_main, "run performance tests by sending simple queries to this server"},
    {"perf-sstable", perf::scylla_sstable_main, "run performance tests by exercising sstable related operations on this server"},
    {"perf-alternator-workloads", perf::alternator_workloads(scylla_main, &after_init_func), "run performance tests on full alternator stack"}
A thought (feel free to discard): this after_init_func made the alternator_workloads function very weird - it needs to return a lambda, set another lambda, etc. Wouldn't it be easier for main to offer a promise to wait for initialization to complete (maybe we already have such a thing somehow?), and then the workload function would be just like the rest: it would start by running main and await initialization completing at the beginning?
Maybe it's doable to split it into two parts, but I wanted changes to scylla_main to be minimal as it's already very complex (or at least long).
It has something like supervisor::notify which could be extended, although this is "global complexity" vs "local complexity" to me. perf::alternator_workloads is more complex but it doesn't affect anything not related to it, while extending supervisor::notify (or something similar) would affect every usage.
The difference versus the other perf:: functions from above stems from the fact that it's the only one which needs scylla_main to be called; the others replace it.
req._headers["X-Amz-Target:"] = "DynamoDB_20120810." + operation;
req.write_body("application/x-amz-json-1.0", std::move(body));
co_await cli.make_request(std::move(req), [] (const http::reply& rep, input_stream<char>&& in_) -> future<> {
    auto in = std::move(in_);
what does this move achieve?
It's probably to avoid the lambda coroutine fiasco. Please see if coroutine::lambda() fixes it instead.
I thought about this, but wasn't this fiasco about captures, not parameters? (but maybe I'm misremembering).
Yes, it was added to keep it alive. Indeed the coroutine fiasco document mentions only captures. I am not sure why this is happening. It looks like in_ is freed after the first suspension point in the lambda.
Adding coroutine::lambda or the second solution with std::ref doesn't help here.
Perhaps there is something wrong with the caller code in seastar?
return do_with(std::move(rep), [&con, handle = std::move(handle)] (auto& rep) mutable {
return handle(rep, con.in(rep));
});
handle is my lambda from above. I've also tried putting it in do_with to keep it alive, without success:
return do_with(std::move(rep), std::move(handle), [&con] (auto& rep, auto& handle) mutable {
return handle(rep, con.in(rep));
});
So this trick with auto in = std::move(in_); is the only one which works for me (I saw it also here: https://github.com/scylladb/scylladb/blob/master/alternator/executor.cc#L91 and in one of Pavel's WIP patches).
The entire coroutine frame is lost. It applies equally to captures, parameters, and locals that are promoted to live in the coroutine frame.
So it's better to use coroutine::lambda(), it solves the problem rather than working around it and failing if someone adds a local variable.
Like just wrapping the lambda?
co_await cli.make_request(std::move(req), coroutine::lambda([] (const http::reply& rep, input_stream<char>&& in_) -> future<> {
    ...
}));
I tested this before and it doesn't solve it.
I think it can be solved but in seastar instead:
diff --git a/src/http/client.cc b/src/http/client.cc
index 7a66823c..46df99df 100644
--- a/src/http/client.cc
+++ b/src/http/client.cc
@@ -236,9 +236,9 @@ future<> client::make_request(request req, reply_handler handle, reply::status_t
return make_exception_future<>(std::runtime_error(format("request finished with {}", rep._status)));
}
- return do_with(std::move(rep), [&con, handle = std::move(handle)] (auto& rep) mutable {
- return handle(rep, con.in(rep));
- });
+ return do_with(std::move(rep), coroutine::lambda([&con, handle = std::move(handle)] (auto& rep) mutable -> future<> {
+ co_await handle(rep, con.in(rep));
+ }));
});
});
}
While this is documented:
/// lambda coroutine must complete (co_await) in the same statement.
I am not sure why it requires an immediate co_await to work.
fun_t fun = it->second;

auto results = time_parallel([&] {
    static thread_local auto sharded_cli = get_client(c.port); // for simplicity never closed as it lives for the whole process runtime
I don't understand what I'm seeing here. You have "concurrency", and yet just one "cli" object per shard. How does it work?
I think it was working because underneath connection::make_request doesn't yield, at least in this case.
Although I think it's more realistic for alternator to work on a higher number of connections, so I will add a pool, making the number of connections equal to the concurrency.
Force-pushed from 6180681 to ac1823d (v2).
Force-pushed from ac1823d to aeebf55 (v3).
Force-pushed from 9cadd30 to 7727f01.
Force-pushed from 7727f01 to 1af1e5f.
Force-pushed from 1af1e5f to f3dcc20 (v4).
…vice group

When a base write triggers an mv write and it needs to be sent to another shard, it used the same service group and we could end up with a deadlock. This fix also affects alternator's secondary indexes.

Testing was done using the (yet) uncommitted framework for easy alternator performance testing: scylladb#13121. I've changed the hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and then ran:

./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi --duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error --concurrency 2000

Without the patch, when scylla is overloaded (i.e. the number of scheduled futures is close to max_nonlocal_requests), after a couple of seconds scylla hangs, cpu usage drops to zero, and no progress is made. We can confirm we're hitting this issue by seeing under gdb:

p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0

With the patch I wasn't able to observe the problem, even with 2x concurrency. I was able to make the process hang with 10x concurrency, but I think it's hitting a different limit, as there wasn't any depleted smp service group semaphore and it was happening also on non-mv loads.
🔴 CI State: FAILURE (✅ Build; Failed Tests: 1/21011)
Test failure could be #15334 - looks unrelated to this PR?
Fixes #15844
Closes #15845
🔴 CI State: FAILURE (✅ Build; Failed Tests: 2/30566)
It will be reused later by a new tool.
The code is based on a similar idea as perf_simple_query. The main differences are:
- it starts a full scylla process
- it communicates with alternator via http (localhost)
- it uses a richer table schema with all DynamoDB types instead of only strings

Testing code runs in the same process as scylla so we can easily get various perf counters (tps, instr, allocation, etc).

Results on my machine (with 1 vCPU):

> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null
...
median 23402.59616090321
median absolute deviation: 598.77
maximum: 24014.41
minimum: 19990.34

> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null
...
median 16089.34211320635
median absolute deviation: 552.65
maximum: 16915.95
minimum: 14781.97

The above seem more realistic than results from perf_simple_query, which are 96k and 49k tps (per core).
@nyh I've synchronized this PR with the enterprise one. I think you can merge both.
How do you isolate client code from server code?
I don't. Anyway, what's interesting is the delta (i.e. before and after some patch). The absolute number has very little meaning.
Thanks! I'm waiting for the CI to finish.
🔴 CI State: FAILURE (✅ Build)
Seemingly unrelated test failed, filed https://github.com/scylladb/scylla-dtest/issues/4252, restarting.
@yarongilor clang-tidy is failing with
I don't see how this is related to the PR. Does it block the merge now?
🟢 CI State: SUCCESS (✅ Build)
Thanks @nuivall. I see CI passed, so I merged now.
Related: #12518