
test: perf: add end-to-end benchmark for alternator #13121

Merged
merged 8 commits into scylladb:master from alternator_perf_2
May 13, 2024

Conversation

nuivall
Member

@nuivall nuivall commented Mar 9, 2023

The code is based on a similar idea to perf_simple_query. The main differences are:

  • it starts a full scylla process
  • it communicates with alternator via HTTP (localhost)
  • it uses a richer table schema with all DynamoDB types instead of only strings

The testing code runs in the same process as scylla, so we can easily get various perf counters (tps, instructions, allocations, etc.).
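
For illustration, a minimal sketch of a single benchmark request over seastar's http client (simplified; the table name and item body here are made up):

    future<> put_item(http::experimental::client& cli) {
        auto req = http::request::make("POST", "localhost", "/");
        req._headers["X-Amz-Target"] = "DynamoDB_20120810.PutItem";
        req.write_body("application/x-amz-json-1.0",
                R"({"TableName": "workloads", "Item": {"p": {"S": "key1"}}})");
        co_await cli.make_request(std::move(req),
                [] (const http::reply& rep, input_stream<char>&& in_) -> future<> {
            auto in = std::move(in_);  // keep the response stream alive
            while (!(co_await in.read()).empty()) {}  // drain the body
            co_await in.close();
        });
    }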

Results on my machine (with 1 vCPU):

./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null
...
median 23402.59616090321
median absolute deviation: 598.77
maximum: 24014.41
minimum: 19990.34

./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null
...
median 16089.34211320635
median absolute deviation: 552.65
maximum: 16915.95
minimum: 14781.97

The above seems more realistic than the results from perf_simple_query, which are 96k and 49k tps (per core).

Related: #12518

@nuivall nuivall requested a review from nyh March 9, 2023 12:17

@nyh
Contributor

nyh commented Mar 12, 2023

The testing code runs in the same process as scylla, so we can easily get various perf counters (tps, instructions, allocations, etc.).

Ideally, an external process could also have gotten the same counters using metrics?

What worries me a bit about having the client in the same process as the server is that it mixes the performance of the client and the server, and it also might make it harder to include some things (like client authentication). But I guess it's a good start.
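
(For context: an external client could in principle scrape the same counters from scylla's Prometheus endpoint, which by default listens on port 9180, e.g. curl -s http://localhost:9180/metrics, at the cost of having to correlate samples with the benchmark window.)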

Contributor

@nyh nyh left a comment

Looks good (I just left a few comments/questions). I am guessing that you'll probably want to improve this code together with doing Alternator optimizations and with a better understanding of what you really want to benchmark / profile.

main.cc Outdated
@@ -1719,6 +1721,9 @@ To start the scylla server proper, simply invoke as: scylla server (or just scyl
});

startlog.info("Scylla version {} initialization completed.", scylla_version());
if(after_init_func) {
Contributor

space after if

Member Author

fixed

main.cc Outdated
@@ -1768,6 +1773,7 @@ int main(int ac, char** av) {
{"perf-row-cache-update", perf::scylla_row_cache_update_main, "run performance tests by updating row cache on this server"},
{"perf-simple-query", perf::scylla_simple_query_main, "run performance tests by sending simple queries to this server"},
{"perf-sstable", perf::scylla_sstable_main, "run performance tests by exercising sstable related operations on this server"},
{"perf-alternator-workloads", perf::alternator_workloads(scylla_main, &after_init_func), "run performance tests on full alternator stack"}
Contributor

A thought (feel free to discard): this after_init_func makes the alternator_workloads function very weird - it needs to return a lambda, set another lambda, etc. Wouldn't it be easier for main to offer a promise to wait for initialization to complete (maybe we already have such a thing somehow?), and then the workload function would be just like the rest: it would start by running main and await the initialization completing at the beginning?
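
Very roughly (illustrative names, and glossing over how the benchmark fiber gets started):

    // in main.cc: a promise fulfilled once initialization completes
    seastar::promise<> init_done;
    ...
    startlog.info("Scylla version {} initialization completed.", scylla_version());
    init_done.set_value();

    // in the workload code: instead of being handed an after_init_func,
    // just wait on the future before starting the benchmark
    future<> run_benchmark() {
        co_await init_done.get_future();  // initialization is done
        co_await run_workloads();         // illustrative name
    }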

Member Author

Maybe it's doable to split it into 2 parts, but I wanted the changes to scylla_main to be minimal, as it's already very complex (or at least long).

It has something like supervisor::notify which could be extended, although this is "global complexity" vs "local complexity" to me. perf::alternator_workloads is more complex, but it doesn't affect anything unrelated to it, while extending supervisor::notify (or something similar) would affect every usage.

The difference versus the other perf:: functions above stems from the fact that it's the only one which needs scylla_main to be called; the others replace it.

req._headers["X-Amz-Target:"] = "DynamoDB_20120810." + operation;
req.write_body("application/x-amz-json-1.0", std::move(body));
co_await cli.make_request(std::move(req), [] (const http::reply& rep, input_stream<char>&& in_) -> future<> {
auto in = std::move(in_);
Contributor

what does this move achieve?

Member

It's probably to avoid the lambda coroutine fiasco. Please see if coroutine::lambda() fixes it instead.

Contributor

I thought about this, but wasn't this fiasco about captures, not parameters? (but maybe I'm misremembering).

Member Author

Yes, it was added to keep it alive. Indeed, the coroutine fiasco document mentions only captures. I am not sure why this is happening; it looks like in_ is freed after the first suspension point in the lambda.

Adding coroutine::lambda, or the second solution with std::ref, doesn't help here.

Perhaps there is something wrong with the caller code in seastar?

            return do_with(std::move(rep), [&con, handle = std::move(handle)] (auto& rep) mutable {
                return handle(rep, con.in(rep));
            });

handle is my lambda from above. I've also tried putting it in do_with to keep it alive, without success:

            return do_with(std::move(rep), std::move(handle), [&con] (auto& rep, auto& handle) mutable {
                return handle(rep, con.in(rep));
            });

so this trick with
auto in = std::move(in_); is the only one which works for me (I saw it also here: https://github.com/scylladb/scylladb/blob/master/alternator/executor.cc#L91 and in one of Pavel's WIP patches)
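
(A possible explanation, though I haven't verified it: in_ is an rvalue reference bound to the temporary input_stream returned by con.in(rep), and that temporary is destroyed at the end of the caller's full expression, before the suspended coroutine resumes. Moving it into a local before the first co_await puts the stream into the coroutine frame, so it survives:)

    // sketch only: why the move rescues the parameter
    future<> handler(const http::reply& rep, input_stream<char>&& in_) {
        auto in = std::move(in_);  // move the caller's temporary into the
                                   // coroutine frame before any suspension
        while (!(co_await in.read()).empty()) {}  // safe: 'in' lives in the
                                                  // frame; 'in_' may dangle now
        co_await in.close();
    }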

Member

The entire coroutine frame is lost. It applies equally to captures, parameters, and locals that are promoted to live in the coroutine frame.

Member

So it's better to use coroutine::lambda(): it solves the problem rather than working around it (and the workaround would fail if someone adds a local variable).

Member Author

Like just wrapping the lambda?

co_await cli.make_request(std::move(req), coroutine::lambda([] (const http::reply& rep, input_stream<char>&& in_) -> future<> {
...
})

I tested this before and it doesn't solve it.

Member Author

I think it can be solved, but in seastar instead:

diff --git a/src/http/client.cc b/src/http/client.cc
index 7a66823c..46df99df 100644
--- a/src/http/client.cc
+++ b/src/http/client.cc
@@ -236,9 +236,9 @@ future<> client::make_request(request req, reply_handler handle, reply::status_t
                 return make_exception_future<>(std::runtime_error(format("request finished with {}", rep._status)));
             }
 
-            return do_with(std::move(rep), [&con, handle = std::move(handle)] (auto& rep) mutable {
-                return handle(rep, con.in(rep));
-            });
+            return do_with(std::move(rep), coroutine::lambda([&con, handle = std::move(handle)] (auto& rep) mutable -> future<> {
+                co_await handle(rep, con.in(rep));
+            }));
         });
     });
 }

while this is documented:
/// lambda coroutine must complete (co_await) in the same statement.
I am not sure why it requires an immediate co_await to work.
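
The documented-safe pattern, as I read seastar's lambda.hh, is that the whole invocation must be awaited within one full expression, because coroutine::lambda only holds a reference to the (temporary) closure, e.g.:

    future<int> f() {
        int a = 42;
        // invoked and fully awaited in one statement, so the temporary
        // closure referenced by coroutine::lambda stays alive throughout
        co_return co_await coroutine::lambda([&] () -> future<int> {
            co_await coroutine::maybe_yield();
            co_return a;
        })();
    }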

fun_t fun = it->second;

auto results = time_parallel([&] {
static thread_local auto sharded_cli = get_client(c.port); // for simplicity never closed as it lives for the whole process runtime
Contributor

I don't understand what I'm seeing here. You have "concurrency", and yet just one "cli" object per shard. How does it work?

Member Author

I think it was working because, underneath, connection::make_request doesn't yield, at least in this case.

Although I think it's more realistic for alternator to work with a higher number of connections, so I will add a pool, making the number of connections equal to the concurrency (see the sketch below).
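
Something along these lines (a sketch; the actual naming/structure in the patch may differ):

    // per-shard pool: one seastar http client per concurrent fiber,
    // handed out round-robin to the request loops
    class client_pool {
        std::vector<std::unique_ptr<http::experimental::client>> _clients;
        size_t _next = 0;
    public:
        client_pool(uint16_t port, unsigned concurrency) {
            for (unsigned i = 0; i < concurrency; ++i) {
                _clients.emplace_back(std::make_unique<http::experimental::client>(
                        socket_address(ipv4_addr("127.0.0.1", port))));
            }
        }
        http::experimental::client& next() {
            return *_clients[_next++ % _clients.size()];
        }
    };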

@nuivall nuivall force-pushed the alternator_perf_2 branch 2 times, most recently from 6180681 to ac1823d Compare March 21, 2023 11:56
@nuivall
Member Author

nuivall commented Mar 21, 2023

v2:

  • space added
  • connection pool instead of one connection per shard
  • added option to run against remote host

@nuivall nuivall requested review from avikivity and nyh March 21, 2023 11:58

@nuivall
Member Author

nuivall commented May 30, 2023

v3:

  • added scan workload

@nuivall
Member Author

nuivall commented Oct 25, 2023

v4:

  • added option to continue after failed requests
  • added gsi write workload

nuivall added a commit to nuivall/scylladb that referenced this pull request Oct 25, 2023
…vice group

nuivall added a commit to nuivall/scylladb that referenced this pull request Oct 25, 2023
…vice group

nuivall added a commit to nuivall/scylladb that referenced this pull request Oct 25, 2023
…vice group

@scylladb-promoter
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Unit Tests
❌ - Sanity Tests

Failed Tests (1/21011):

Build Details:

@mykaul
Contributor

mykaul commented Oct 26, 2023

Test failure could be #15334 - looks unrelated to this PR?

avikivity pushed a commit that referenced this pull request Oct 26, 2023
…vice group

When a base write triggers an mv write that needs to be sent to another
shard, it used the same service group, and we could end up with a
deadlock.

This fix also affects alternator's secondary indexes.

Testing was done using the not-yet-committed framework for easy alternator
performance testing: #13121.
I've changed the hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and
then ran:

./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \
--developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \
--duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000

Without the patch, when scylla is overloaded (i.e. the number of scheduled futures is close to max_nonlocal_requests), scylla hangs after a couple of seconds: CPU usage drops to zero and no progress is made. We can confirm we're hitting this issue by seeing, under gdb:

p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0

With the patch I wasn't able to observe the problem, even with 2x
concurrency. I was able to make the process hang with 10x concurrency,
but I think it's hitting a different limit, as there was no depleted
smp service group semaphore and it was happening also on non-mv loads.

Fixes #15844

Closes #15845
@mykaul mykaul added this to the 6.1 milestone May 6, 2024
@mykaul mykaul added the area/alternator, symptom/performance, and P1 Urgent labels May 6, 2024
@scylladb-promoter
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Container Test
✅ - dtest
✅ - dtest with topology changes
❌ - Unit Tests

Failed Tests (2/30566):

Build Details:

  • Duration: 2 hr 40 min
  • Builder: spider3.cloudius-systems.com

@nuivall
Member Author

nuivall commented May 9, 2024

@nyh I've synchronized this PR with the enterprise one. I think you can merge both.

@nuivall nuivall added the backport/none label May 9, 2024
@avikivity
Member

The testing code runs in the same process as scylla, so we can easily get various perf counters (tps, instructions, allocations, etc.).

How do you isolate client code from server code?

@nuivall
Member Author

nuivall commented May 9, 2024

The testing code runs in the same process as scylla, so we can easily get various perf counters (tps, instructions, allocations, etc.).

How do you isolate client code from server code?

I don't. In any case, what's interesting is the delta (i.e. before and after some patch); the absolute number has very little meaning.

@mykaul mykaul modified the milestones: 6.1, 6.0 May 9, 2024
@nyh
Contributor

nyh commented May 9, 2024

@nyh I've synchronized this PR with the enterprise one. I think you can merge both.

Thanks! I'm waiting for the CI to finish.

@scylladb-promoter
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Container Test
❌ - dtest with topology changes
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 5 hr 49 min
  • Builder: spider7.cloudius-systems.com

@nuivall
Member Author

nuivall commented May 10, 2024

A seemingly unrelated test failed; filed https://github.com/scylladb/scylla-dtest/issues/4252, restarting.

@nuivall
Member Author

nuivall commented May 10, 2024

@yarongilor clang-tidy is failing with

Error: /home/runner/work/scylladb/scylladb/serializer_impl.hh:20:10: error: 'absl/container/btree_set.h' file not found [clang-diagnostic-error]
   20 | #include <absl/container/btree_set.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
(...)
ninja: build stopped: subcommand failed.
Error: Process completed with exit code 1.

I don't see how this is related to the PR. Does it block the merge now?

@scylladb-promoter
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Container Test
✅ - dtest with topology changes
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 2 hr 38 min
  • Builder: spider5.cloudius-systems.com

@nyh
Contributor

nyh commented May 12, 2024

Thanks @nuivall. I see the CI passed, so I've merged it now.

@scylladb-promoter scylladb-promoter merged commit 9813ec9 into scylladb:master May 13, 2024
9 of 11 checks passed
Labels
area/alternator, backport/none, P1 Urgent, promoted-to-master, symptom/performance