
[release] 2.8 release perf logs #40571

Merged
merged 6 commits into from
Nov 22, 2023

Conversation

vitsai
Contributor

@vitsai vitsai commented Oct 23, 2023

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@vitsai vitsai added the P0 (Issues that should be fixed in short order) and release-blocker (P0 Issue that blocks the release) labels Oct 23, 2023
@can-anyscale
Collaborator

w00h00 thank you, I'll leave this up for your team to review then. Also, they normally do something like #29615 to make it easier to review.

@vitsai
Contributor Author

vitsai commented Oct 23, 2023

REGRESSION 39.88%: dashboard_p99_latency_ms (LATENCY) regresses from 13941.91 to 19502.436 (39.88%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 37.76%: dashboard_p95_latency_ms (LATENCY) regresses from 6729.761 to 9271.209 (37.76%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 22.64%: dashboard_p50_latency_ms (LATENCY) regresses from 5.534 to 6.787 (22.64%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 21.37%: multi_client_put_gigabytes (THROUGHPUT) regresses from 38.605668097256924 to 30.354050606212375 (21.37%) in 2.8.0/microbenchmark.json
REGRESSION 14.94%: 1000000_queued_time (LATENCY) regresses from 181.82263824499995 to 208.991110036 (14.94%) in 2.8.0/scalability/single_node.json
REGRESSION 13.00%: avg_iteration_time (LATENCY) regresses from 1.4622855401039123 to 1.6523929166793823 (13.00%) in 2.8.0/stress_tests/stress_test_dead_actors.json
REGRESSION 12.51%: client__1_1_actor_calls_sync (THROUGHPUT) regresses from 573.4457553242221 to 501.7198350460272 (12.51%) in 2.8.0/microbenchmark.json
REGRESSION 12.48%: 1_n_actor_calls_async (THROUGHPUT) regresses from 10133.72696574923 to 8869.51837285407 (12.48%) in 2.8.0/microbenchmark.json
REGRESSION 9.56%: multi_client_tasks_async (THROUGHPUT) regresses from 28423.644858766176 to 25705.1703030958 (9.56%) in 2.8.0/microbenchmark.json
REGRESSION 9.52%: 1_n_async_actor_calls_async (THROUGHPUT) regresses from 9124.377528222414 to 8255.28161190285 (9.52%) in 2.8.0/microbenchmark.json
REGRESSION 9.39%: client__1_1_actor_calls_concurrent (THROUGHPUT) regresses from 1080.2139341634759 to 978.7677324282791 (9.39%) in 2.8.0/microbenchmark.json
REGRESSION 9.32%: 10000_get_time (LATENCY) regresses from 23.55671212599998 to 25.751547934 (9.32%) in 2.8.0/scalability/single_node.json
REGRESSION 9.27%: stage_2_avg_iteration_time (LATENCY) regresses from 58.160910558700564 to 63.549897527694704 (9.27%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 9.03%: dashboard_p50_latency_ms (LATENCY) regresses from 3.465 to 3.778 (9.03%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 8.65%: single_client_tasks_async (THROUGHPUT) regresses from 10739.407361558973 to 9810.522775386338 (8.65%) in 2.8.0/microbenchmark.json
REGRESSION 7.01%: client__tasks_and_get_batch (THROUGHPUT) regresses from 1.002041264301031 to 0.9317720482301839 (7.01%) in 2.8.0/microbenchmark.json
REGRESSION 6.86%: stage_1_avg_iteration_time (LATENCY) regresses from 23.305240750312805 to 24.904260087013245 (6.86%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 6.63%: single_client_tasks_sync (THROUGHPUT) regresses from 1311.812164358857 to 1224.7771019170784 (6.63%) in 2.8.0/microbenchmark.json
REGRESSION 6.47%: 10000_args_time (LATENCY) regresses from 16.89121779300001 to 17.983383886000013 (6.47%) in 2.8.0/scalability/single_node.json
REGRESSION 6.40%: single_client_get_object_containing_10k_refs (THROUGHPUT) regresses from 14.755162462843568 to 13.810261182479115 (6.40%) in 2.8.0/microbenchmark.json
REGRESSION 5.08%: client__1_1_actor_calls_async (THROUGHPUT) regresses from 1083.9300708022135 to 1028.839383358896 (5.08%) in 2.8.0/microbenchmark.json
REGRESSION 4.95%: n_n_async_actor_calls_async (THROUGHPUT) regresses from 25688.484755543966 to 24418.02409630373 (4.95%) in 2.8.0/microbenchmark.json
REGRESSION 4.77%: actors_per_second (THROUGHPUT) regresses from 748.5322140167257 to 712.822673976586 (4.77%) in 2.8.0/benchmarks/many_actors.json
REGRESSION 4.54%: n_n_actor_calls_async (THROUGHPUT) regresses from 30847.92669705198 to 29447.495631692484 (4.54%) in 2.8.0/microbenchmark.json
REGRESSION 3.68%: 3000_returns_time (LATENCY) regresses from 5.6602293089999876 to 5.868746854999998 (3.68%) in 2.8.0/scalability/single_node.json
REGRESSION 3.49%: client__put_calls (THROUGHPUT) regresses from 857.6367908455961 to 827.6636203329824 (3.49%) in 2.8.0/microbenchmark.json
REGRESSION 2.93%: 1_1_actor_calls_async (THROUGHPUT) regresses from 7615.355914488919 to 7392.382505877325 (2.93%) in 2.8.0/microbenchmark.json
REGRESSION 2.75%: stage_4_spread (LATENCY) regresses from 0.7217020493267903 to 0.7415504343465557 (2.75%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 2.40%: stage_3_time (LATENCY) regresses from 2802.1650245189667 to 2869.4522173404694 (2.40%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 2.20%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5766.661045557541 to 5639.599660122037 (2.20%) in 2.8.0/microbenchmark.json
REGRESSION 1.54%: n_n_actor_calls_with_arg_async (THROUGHPUT) regresses from 3074.0790016310475 to 3026.6872241733467 (1.54%) in 2.8.0/microbenchmark.json
REGRESSION 1.25%: client__put_gigabytes (THROUGHPUT) regresses from 0.13283428838343245 to 0.13117919950925 (1.25%) in 2.8.0/microbenchmark.json
REGRESSION 0.52%: client__tasks_and_put_batch (THROUGHPUT) regresses from 11411.245745812425 to 11351.922193872259 (0.52%) in 2.8.0/microbenchmark.json
REGRESSION 0.36%: 1_1_actor_calls_sync (THROUGHPUT) regresses from 2255.614293958201 to 2247.4937092440123 (0.36%) in 2.8.0/microbenchmark.json
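
(For context, the percentages above come from diffing per-metric values between two benchmark result files. Below is a minimal sketch of that comparison, assuming a perf_metrics-style JSON layout like the snippets quoted later in this review; the file paths, the "perf_metrics" key, and the helper names are illustrative assumptions, not the actual release-logs script.)

```python
# Hypothetical sketch: deriving a "REGRESSION x%" line from two result files.
# The "perf_metrics" key, file paths, and helper names are assumptions for
# illustration; this is not the actual release-logs comparison script.
import json

def load_metrics(path):
    """Return {name: (type, value)} from a perf_metrics-style JSON file."""
    with open(path) as f:
        data = json.load(f)
    return {
        m["perf_metric_name"]: (m["perf_metric_type"], m["perf_metric_value"])
        for m in data["perf_metrics"]
    }

def regression_pct(metric_type, old, new):
    """Percent regression relative to the old value, or None if not a regression."""
    if metric_type == "LATENCY" and new > old:      # higher latency is worse
        return (new - old) / old * 100
    if metric_type == "THROUGHPUT" and new < old:   # lower throughput is worse
        return (old - new) / old * 100
    return None

old = load_metrics("2.7.0/microbenchmark.json")
new = load_metrics("2.8.0/microbenchmark.json")
for name, (mtype, new_val) in sorted(new.items()):
    if name in old:
        pct = regression_pct(mtype, old[name][1], new_val)
        if pct is not None:
            print(f"REGRESSION {pct:.2f}%: {name} ({mtype}) "
                  f"regresses from {old[name][1]} to {new_val}")
```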

@vitsai
Contributor Author

vitsai commented Oct 23, 2023

Diff here: #40572

@rickyyx rickyyx self-assigned this Oct 23, 2023
@rickyyx
Contributor

rickyyx commented Oct 23, 2023

multi_client_put_gigabytes should be due to flakiness:
[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

1000000_queued_time probably does have a regression, but it seems less serious than 20%; it looks more like 185 -> 200, ~8%.

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

avg_iteration_time seems to be flaky - and it doesn't look like this happens on the master branch, so it could be an infra error?
[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

client__1_1_actor_calls_sync is not a regression.

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

1_n_actor_calls_async is probably RPC-related, but it never recovers. There seems to be a ~10% regression.
[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

multi_client_tasks_async is similar to 1_n_actor_calls_async: it never recovers from the dip.

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

1_n_async_actor_calls_async looks like a regression.

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

10000_get_time is not a regression - just flaky.

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

stage_2_avg_iteration_time seems to be a regression:
[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

single_client_tasks_async seems to be OK for now.
[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

stage_1_avg_iteration_time seems to have some regression - not huge, but it looks obvious.
[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

single_client_tasks_sync seems real, even though it's not significant.

[screenshot]

@vitsai
Contributor Author

vitsai commented Oct 23, 2023

For avg_iteration_time, the value of 1.65 on the release branch is out of distribution, though.

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

dashboard_p99_latency_ms and the other dashboard regressions are related to the GCS task backend. They're expected, since we now have more data to return, whereas the previous code dropped it all.

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

> For avg_iteration_time, the value of 1.65 on the release branch is out of distribution though

Yeah, but given that the release branch is just a prefix of the master branch, I believe it's more likely flakiness. Could we rerun it?

@vitsai
Contributor Author

vitsai commented Oct 23, 2023

Will run again after the cherry-picks; that should give more signal on those edge-of-distribution values.

@vitsai
Contributor Author

vitsai commented Oct 24, 2023

Here are the new ones from today. Still waiting on @GeneDer for the 2.7.1 release logs to compare against (instead of 2.7.0).

REGRESSION 52.83%: dashboard_p95_latency_ms (LATENCY) regresses from 47.45 to 72.519 (52.83%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 51.28%: dashboard_p95_latency_ms (LATENCY) regresses from 6729.761 to 10180.702 (51.28%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 50.84%: dashboard_p99_latency_ms (LATENCY) regresses from 13941.91 to 21030.085 (50.84%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 20.19%: 1000000_queued_time (LATENCY) regresses from 181.82263824499995 to 218.523912088 (20.19%) in 2.8.0/scalability/single_node.json
REGRESSION 19.50%: multi_client_put_gigabytes (THROUGHPUT) regresses from 38.605668097256924 to 31.07867911026417 (19.50%) in 2.8.0/microbenchmark.json
REGRESSION 16.66%: dashboard_p50_latency_ms (LATENCY) regresses from 5.534 to 6.456 (16.66%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 15.31%: dashboard_p99_latency_ms (LATENCY) regresses from 143.498 to 165.464 (15.31%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 13.45%: dashboard_p50_latency_ms (LATENCY) regresses from 3.465 to 3.931 (13.45%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 12.98%: stage_2_avg_iteration_time (LATENCY) regresses from 58.160910558700564 to 65.70907316207885 (12.98%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 10.61%: 10000_get_time (LATENCY) regresses from 23.55671212599998 to 26.055165431999995 (10.61%) in 2.8.0/scalability/single_node.json
REGRESSION 10.54%: single_client_tasks_async (THROUGHPUT) regresses from 10739.407361558973 to 9607.186982064028 (10.54%) in 2.8.0/microbenchmark.json
REGRESSION 10.34%: single_client_get_object_containing_10k_refs (THROUGHPUT) regresses from 14.755162462843568 to 13.228837029499449 (10.34%) in 2.8.0/microbenchmark.json
REGRESSION 10.23%: time_to_broadcast_1073741824_bytes_to_50_nodes (LATENCY) regresses from 70.13534577099995 to 77.31193332200007 (10.23%) in 2.8.0/scalability/object_store.json
REGRESSION 9.35%: 1_n_async_actor_calls_async (THROUGHPUT) regresses from 9124.377528222414 to 8271.416009416063 (9.35%) in 2.8.0/microbenchmark.json
REGRESSION 8.75%: multi_client_tasks_async (THROUGHPUT) regresses from 28423.644858766176 to 25935.554390623118 (8.75%) in 2.8.0/microbenchmark.json
REGRESSION 8.53%: single_client_tasks_sync (THROUGHPUT) regresses from 1311.812164358857 to 1199.8831112257556 (8.53%) in 2.8.0/microbenchmark.json
REGRESSION 8.22%: stage_1_avg_iteration_time (LATENCY) regresses from 23.305240750312805 to 25.220864820480347 (8.22%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 7.63%: 1_n_actor_calls_async (THROUGHPUT) regresses from 10133.72696574923 to 9360.803502103265 (7.63%) in 2.8.0/microbenchmark.json
REGRESSION 7.58%: n_n_actor_calls_async (THROUGHPUT) regresses from 30847.92669705198 to 28510.050783675328 (7.58%) in 2.8.0/microbenchmark.json
REGRESSION 5.89%: stage_3_time (LATENCY) regresses from 2802.1650245189667 to 2967.23273229599 (5.89%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 4.69%: n_n_async_actor_calls_async (THROUGHPUT) regresses from 25688.484755543966 to 24484.174826322916 (4.69%) in 2.8.0/microbenchmark.json
REGRESSION 4.23%: 3000_returns_time (LATENCY) regresses from 5.6602293089999876 to 5.899374877999989 (4.23%) in 2.8.0/scalability/single_node.json
REGRESSION 3.91%: actors_per_second (THROUGHPUT) regresses from 748.5322140167257 to 719.3018798022547 (3.91%) in 2.8.0/benchmarks/many_actors.json
REGRESSION 3.63%: client__1_1_actor_calls_concurrent (THROUGHPUT) regresses from 1080.2139341634759 to 1040.99657229968 (3.63%) in 2.8.0/microbenchmark.json
REGRESSION 3.46%: n_n_actor_calls_with_arg_async (THROUGHPUT) regresses from 3074.0790016310475 to 2967.685662012421 (3.46%) in 2.8.0/microbenchmark.json
REGRESSION 3.40%: avg_iteration_time (LATENCY) regresses from 1.4622855401039123 to 1.5120103502273559 (3.40%) in 2.8.0/stress_tests/stress_test_dead_actors.json
REGRESSION 3.10%: 1_1_actor_calls_concurrent (THROUGHPUT) regresses from 4745.83263563276 to 4598.670938618477 (3.10%) in 2.8.0/microbenchmark.json
REGRESSION 3.03%: single_client_tasks_and_get_batch (THROUGHPUT) regresses from 9.369535279594958 to 9.086079282777337 (3.03%) in 2.8.0/microbenchmark.json
REGRESSION 2.91%: 10000_args_time (LATENCY) regresses from 16.89121779300001 to 17.381954480000005 (2.91%) in 2.8.0/scalability/single_node.json
REGRESSION 2.85%: client__1_1_actor_calls_async (THROUGHPUT) regresses from 1083.9300708022135 to 1053.0442301044923 (2.85%) in 2.8.0/microbenchmark.json
REGRESSION 2.84%: tasks_per_second (THROUGHPUT) regresses from 272.7469880191856 to 265.0085938319898 (2.84%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 2.68%: stage_4_spread (LATENCY) regresses from 0.7217020493267903 to 0.7410596228217868 (2.68%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 2.60%: client__put_gigabytes (THROUGHPUT) regresses from 0.13283428838343245 to 0.12937928916286368 (2.60%) in 2.8.0/microbenchmark.json
REGRESSION 2.32%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5766.661045557541 to 5632.668588928769 (2.32%) in 2.8.0/microbenchmark.json
REGRESSION 1.97%: client__tasks_and_get_batch (THROUGHPUT) regresses from 1.002041264301031 to 0.9822717209888673 (1.97%) in 2.8.0/microbenchmark.json
REGRESSION 1.55%: client__1_1_actor_calls_sync (THROUGHPUT) regresses from 573.4457553242221 to 564.5765462438006 (1.55%) in 2.8.0/microbenchmark.json
REGRESSION 0.83%: single_client_wait_1k_refs (THROUGHPUT) regresses from 5.50854610986549 to 5.462989777344944 (0.83%) in 2.8.0/microbenchmark.json
REGRESSION 0.83%: avg_pg_create_time_ms (LATENCY) regresses from 0.9287227387393717 to 0.9363928228234366 (0.83%) in 2.8.0/stress_tests/stress_test_placement_group.json
REGRESSION 0.64%: tasks_per_second (THROUGHPUT) regresses from 443.2356047821634 to 440.39063366884034 (0.64%) in 2.8.0/benchmarks/many_tasks.json

@rickyyx
Contributor

rickyyx commented Oct 24, 2023

The dashboard-related latency for many_nodes has pretty high variance; not a release blocker, I think:

[screenshots]

The dashboard latency for many_tasks is expected to increase, since we are returning more data (versus simply a count of dropped tasks before).

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 24, 2023

Variance for multi_client_put_gigabytes:

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 24, 2023

10000_get_time is just variance.

[screenshot]

Signed-off-by: vitsai <victoria@anyscale.com>
Signed-off-by: vitsai <victoria@anyscale.com>
Signed-off-by: vitsai <vitsai@cs.stanford.edu>
@vitsai
Contributor Author

vitsai commented Oct 25, 2023

This one is against 2.7.1

REGRESSION 56.47%: dashboard_p95_latency_ms (LATENCY) regresses from 46.348 to 72.519 (56.47%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 23.75%: dashboard_p50_latency_ms (LATENCY) regresses from 35.742 to 44.232 (23.75%) in 2.8.0/benchmarks/many_actors.json
REGRESSION 21.50%: dashboard_p50_latency_ms (LATENCY) regresses from 6.852 to 8.325 (21.50%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 20.44%: stage_3_creation_time (LATENCY) regresses from 2.1919972896575928 to 2.6400458812713623 (20.44%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 15.15%: dashboard_p95_latency_ms (LATENCY) regresses from 8149.28 to 9383.733 (15.15%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 14.23%: single_client_get_calls_Plasma_Store (THROUGHPUT) regresses from 7536.924380935448 to 6464.1729246449195 (14.23%) in 2.8.0/microbenchmark.json
REGRESSION 13.61%: dashboard_p99_latency_ms (LATENCY) regresses from 145.645 to 165.464 (13.61%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 11.70%: dashboard_p99_latency_ms (LATENCY) regresses from 14019.625 to 15660.353 (11.70%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 9.63%: 1000000_queued_time (LATENCY) regresses from 177.93132558000002 to 195.069067863 (9.63%) in 2.8.0/scalability/single_node.json
REGRESSION 6.62%: dashboard_p50_latency_ms (LATENCY) regresses from 3.687 to 3.931 (6.62%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 4.58%: client__put_gigabytes (THROUGHPUT) regresses from 0.12778429919203013 to 0.12193333038806775 (4.58%) in 2.8.0/microbenchmark.json
REGRESSION 4.43%: 107374182400_large_object_time (LATENCY) regresses from 30.62209055699998 to 31.97824557199999 (4.43%) in 2.8.0/scalability/single_node.json
REGRESSION 4.00%: 10000_args_time (LATENCY) regresses from 17.291354780999995 to 17.983785811000004 (4.00%) in 2.8.0/scalability/single_node.json
REGRESSION 3.72%: tasks_per_second (THROUGHPUT) regresses from 275.25470863736416 to 265.0085938319898 (3.72%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 3.55%: avg_pg_create_time_ms (LATENCY) regresses from 0.917001803304134 to 0.9495785120136944 (3.55%) in 2.8.0/stress_tests/stress_test_placement_group.json
REGRESSION 3.18%: 10000_get_time (LATENCY) regresses from 25.08259386100002 to 25.881300527000008 (3.18%) in 2.8.0/scalability/single_node.json
REGRESSION 2.85%: stage_4_spread (LATENCY) regresses from 0.7296295246039273 to 0.7504145523663499 (2.85%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 2.58%: actors_per_second (THROUGHPUT) regresses from 738.330085638146 to 719.3018798022547 (2.58%) in 2.8.0/benchmarks/many_actors.json
REGRESSION 2.21%: 1_n_actor_calls_async (THROUGHPUT) regresses from 9672.982187721544 to 9459.166339745669 (2.21%) in 2.8.0/microbenchmark.json
REGRESSION 2.15%: multi_client_tasks_async (THROUGHPUT) regresses from 27850.61204431569 to 27251.785365248128 (2.15%) in 2.8.0/microbenchmark.json
REGRESSION 2.03%: multi_client_put_gigabytes (THROUGHPUT) regresses from 33.620993378733125 to 32.938469438463315 (2.03%) in 2.8.0/microbenchmark.json
REGRESSION 1.80%: time_to_broadcast_1073741824_bytes_to_50_nodes (LATENCY) regresses from 85.80861040199989 to 87.355050575 (1.80%) in 2.8.0/scalability/object_store.json
REGRESSION 1.26%: stage_2_avg_iteration_time (LATENCY) regresses from 60.438395738601685 to 61.202564811706544 (1.26%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 0.55%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5845.356422909845 to 5813.449193922757 (0.55%) in 2.8.0/microbenchmark.json
REGRESSION 0.27%: single_client_wait_1k_refs (THROUGHPUT) regresses from 5.52141006441801 to 5.506660769810904 (0.27%) in 2.8.0/microbenchmark.json
REGRESSION 0.10%: avg_iteration_time (LATENCY) regresses from 1.5400808811187745 to 1.5415918231010437 (0.10%) in 2.8.0/stress_tests/stress_test_dead_actors.json

@rickyyx
Contributor

rickyyx commented Oct 25, 2023

stage_3_creation_time is high variance (the absolute value is small, but the variance is relatively large).

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 25, 2023

I believe there's some regression in 1000000_queued_time after PR #38771, but not as much as 10%; from the historical range, it's more like ~5% (from 185 -> 195).

The test submits 1M tasks from the driver (which overloads the task backend buffer on the driver worker). Given other, more realistic tests like the microbenchmark and stress_test_many_tasks, and the variance of this metric, I would propose accepting this.

[screenshot]
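
(For readers unfamiliar with this test: below is a rough, scaled-down sketch of the kind of workload 1000000_queued_time measures - queuing a large batch of trivial tasks from a single driver and timing how long the backlog takes to drain. This is an illustration only, not the actual single_node release test.)

```python
# Rough, scaled-down illustration of a "many queued tasks" benchmark on a
# single node. Not the actual release test; the task count is reduced so it
# can run on a laptop.
import time
import ray

ray.init()

@ray.remote
def noop():
    return None

NUM_TASKS = 100_000  # the release test queues 1,000,000 tasks

start = time.perf_counter()
refs = [noop.remote() for _ in range(NUM_TASKS)]  # tasks pile up in the scheduler queue
ray.get(refs)                                     # wait for the backlog to drain
queued_time = time.perf_counter() - start
print(f"{NUM_TASKS}_queued_time: {queued_time:.3f} s")
```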

Signed-off-by: vitsai <vitsai@cs.stanford.edu>
@vitsai
Contributor Author

vitsai commented Oct 26, 2023

REGRESSION 72.90%: dashboard_p95_latency_ms (LATENCY) regresses from 46.348 to 80.137 (72.90%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 64.43%: dashboard_p99_latency_ms (LATENCY) regresses from 14019.625 to 23052.355 (64.43%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 27.25%: dashboard_p95_latency_ms (LATENCY) regresses from 8149.28 to 10369.756 (27.25%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 20.13%: dashboard_p99_latency_ms (LATENCY) regresses from 145.645 to 174.969 (20.13%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 15.32%: 1000000_queued_time (LATENCY) regresses from 177.93132558000002 to 205.19758366500002 (15.32%) in 2.8.0/scalability/single_node.json
REGRESSION 13.14%: stage_2_avg_iteration_time (LATENCY) regresses from 60.438395738601685 to 68.37787942886352 (13.14%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 11.01%: dashboard_p50_latency_ms (LATENCY) regresses from 3.687 to 4.093 (11.01%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 9.67%: 1_1_actor_calls_sync (THROUGHPUT) regresses from 2273.2464988756947 to 2053.5000943392597 (9.67%) in 2.8.0/microbenchmark.json
REGRESSION 8.44%: single_client_get_object_containing_10k_refs (THROUGHPUT) regresses from 13.009554585474522 to 11.911115119151876 (8.44%) in 2.8.0/microbenchmark.json
REGRESSION 8.18%: placement_group_create/removal (THROUGHPUT) regresses from 997.6322375478999 to 916.0390933731816 (8.18%) in 2.8.0/microbenchmark.json
REGRESSION 8.05%: single_client_wait_1k_refs (THROUGHPUT) regresses from 5.52141006441801 to 5.077083407402203 (8.05%) in 2.8.0/microbenchmark.json
REGRESSION 7.08%: stage_1_avg_iteration_time (LATENCY) regresses from 23.342886781692506 to 24.99666111469269 (7.08%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 6.84%: stage_4_spread (LATENCY) regresses from 0.7296295246039273 to 0.7795093658217519 (6.84%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 6.82%: 1_n_actor_calls_async (THROUGHPUT) regresses from 9672.982187721544 to 9013.345072517786 (6.82%) in 2.8.0/microbenchmark.json
REGRESSION 6.35%: 1_n_async_actor_calls_async (THROUGHPUT) regresses from 8657.20480299259 to 8107.296817082561 (6.35%) in 2.8.0/microbenchmark.json
REGRESSION 5.14%: 10000_get_time (LATENCY) regresses from 25.08259386100002 to 26.371442345999995 (5.14%) in 2.8.0/scalability/single_node.json
REGRESSION 5.03%: 10000_args_time (LATENCY) regresses from 17.291354780999995 to 18.161911279999998 (5.03%) in 2.8.0/scalability/single_node.json
REGRESSION 4.12%: actors_per_second (THROUGHPUT) regresses from 738.330085638146 to 707.91056858179 (4.12%) in 2.8.0/benchmarks/many_actors.json
REGRESSION 3.61%: tasks_per_second (THROUGHPUT) regresses from 429.0546317074886 to 413.5452010251431 (3.61%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 3.46%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5845.356422909845 to 5642.93079417387 (3.46%) in 2.8.0/microbenchmark.json
REGRESSION 3.24%: single_client_tasks_async (THROUGHPUT) regresses from 9563.116886637355 to 9253.350380094456 (3.24%) in 2.8.0/microbenchmark.json
REGRESSION 1.49%: single_client_tasks_sync (THROUGHPUT) regresses from 1177.0860205196755 to 1159.5410057905888 (1.49%) in 2.8.0/microbenchmark.json
REGRESSION 1.38%: single_client_get_calls_Plasma_Store (THROUGHPUT) regresses from 7536.924380935448 to 7432.953617355073 (1.38%) in 2.8.0/microbenchmark.json
REGRESSION 1.29%: stage_3_creation_time (LATENCY) regresses from 2.1919972896575928 to 2.2203712463378906 (1.29%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 1.01%: client__tasks_and_put_batch (THROUGHPUT) regresses from 11449.695712792654 to 11334.501893509201 (1.01%) in 2.8.0/microbenchmark.json
REGRESSION 0.14%: n_n_actor_calls_async (THROUGHPUT) regresses from 29270.036133623737 to 29229.518744061534 (0.14%) in 2.8.0/microbenchmark.json

Signed-off-by: vitsai <vitsai@cs.stanford.edu>
@vitsai
Contributor Author

vitsai commented Oct 27, 2023

REGRESSION 123.41%: dashboard_p50_latency_ms (LATENCY) regresses from 35.742 to 79.852 (123.41%) in 2.8.0/benchmarks/many_actors.json
REGRESSION 72.90%: dashboard_p95_latency_ms (LATENCY) regresses from 46.348 to 80.137 (72.90%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 54.82%: stage_3_creation_time (LATENCY) regresses from 2.1919972896575928 to 3.393756628036499 (54.82%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 20.13%: dashboard_p99_latency_ms (LATENCY) regresses from 145.645 to 174.969 (20.13%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 11.04%: 1000000_queued_time (LATENCY) regresses from 177.93132558000002 to 197.57540863000003 (11.04%) in 2.8.0/scalability/single_node.json
REGRESSION 11.01%: dashboard_p50_latency_ms (LATENCY) regresses from 3.687 to 4.093 (11.01%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 9.68%: client__1_1_actor_calls_concurrent (THROUGHPUT) regresses from 961.1926184467325 to 868.1661626750067 (9.68%) in 2.8.0/microbenchmark.json
REGRESSION 7.45%: n_n_actor_calls_async (THROUGHPUT) regresses from 29270.036133623737 to 27088.724228974872 (7.45%) in 2.8.0/microbenchmark.json
REGRESSION 7.14%: placement_group_create/removal (THROUGHPUT) regresses from 997.6322375478999 to 926.429178856758 (7.14%) in 2.8.0/microbenchmark.json
REGRESSION 5.88%: stage_4_spread (LATENCY) regresses from 0.7296295246039273 to 0.7725406967524532 (5.88%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 5.67%: 1_n_async_actor_calls_async (THROUGHPUT) regresses from 8657.20480299259 to 8166.318822443878 (5.67%) in 2.8.0/microbenchmark.json
REGRESSION 5.25%: actors_per_second (THROUGHPUT) regresses from 738.330085638146 to 699.5362497544337 (5.25%) in 2.8.0/benchmarks/many_actors.json
REGRESSION 4.40%: 1_1_actor_calls_async (THROUGHPUT) regresses from 7456.112509761211 to 7127.879723065892 (4.40%) in 2.8.0/microbenchmark.json
REGRESSION 4.39%: client__1_1_actor_calls_async (THROUGHPUT) regresses from 982.5793246271417 to 939.4658556651513 (4.39%) in 2.8.0/microbenchmark.json
REGRESSION 4.26%: 1_1_actor_calls_sync (THROUGHPUT) regresses from 2273.2464988756947 to 2176.3519388689724 (4.26%) in 2.8.0/microbenchmark.json
REGRESSION 3.88%: n_n_async_actor_calls_async (THROUGHPUT) regresses from 24458.207679236828 to 23510.071149146344 (3.88%) in 2.8.0/microbenchmark.json
REGRESSION 3.60%: stage_2_avg_iteration_time (LATENCY) regresses from 60.438395738601685 to 62.61348090171814 (3.60%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 3.44%: 10000_args_time (LATENCY) regresses from 17.291354780999995 to 17.886716769999992 (3.44%) in 2.8.0/scalability/single_node.json
REGRESSION 3.13%: 1_1_actor_calls_concurrent (THROUGHPUT) regresses from 4554.146655326606 to 4411.655031271767 (3.13%) in 2.8.0/microbenchmark.json
REGRESSION 2.22%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5845.356422909845 to 5715.47419892146 (2.22%) in 2.8.0/microbenchmark.json
REGRESSION 2.22%: dashboard_p50_latency_ms (LATENCY) regresses from 6.852 to 7.004 (2.22%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 1.94%: multi_client_tasks_async (THROUGHPUT) regresses from 27850.61204431569 to 27311.034390113982 (1.94%) in 2.8.0/microbenchmark.json
REGRESSION 1.68%: stage_1_avg_iteration_time (LATENCY) regresses from 23.342886781692506 to 23.734616684913636 (1.68%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 1.61%: single_client_wait_1k_refs (THROUGHPUT) regresses from 5.52141006441801 to 5.432281070822073 (1.61%) in 2.8.0/microbenchmark.json
REGRESSION 1.35%: client__put_calls (THROUGHPUT) regresses from 821.9975673342932 to 810.9298682126814 (1.35%) in 2.8.0/microbenchmark.json
REGRESSION 0.93%: avg_iteration_time (LATENCY) regresses from 1.5400808811187745 to 1.5543466377258301 (0.93%) in 2.8.0/stress_tests/stress_test_dead_actors.json
REGRESSION 0.61%: 10000_get_time (LATENCY) regresses from 25.08259386100002 to 25.235181824999998 (0.61%) in 2.8.0/scalability/single_node.json
REGRESSION 0.17%: tasks_per_second (THROUGHPUT) regresses from 429.0546317074886 to 428.3314400052605 (0.17%) in 2.8.0/benchmarks/many_tasks.json

@vitsai
Contributor Author

vitsai commented Oct 27, 2023

@rickyyx Still seeing the 1000000_queued_time regression.

@rickyyx
Contributor

rickyyx commented Oct 27, 2023

I think it's variance: 177 is on the low end, while 199 is on the high end for this metric:

[screenshots]
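
(A minimal sketch of the kind of variance check being applied here - comparing a new measurement against the spread of recent historical runs. The history values and the 2-sigma threshold are illustrative assumptions, not the actual release tooling.)

```python
# Hedged sketch: flag a new measurement only if it falls outside the spread of
# recent historical runs. The history values and the 2-sigma threshold are
# illustrative assumptions, not the actual release tooling.
import statistics

history = [177.9, 181.8, 185.2, 188.4, 195.1, 199.0]  # recent 1000000_queued_time runs (s)
new_value = 197.6

mean = statistics.mean(history)
stdev = statistics.stdev(history)

if abs(new_value - mean) > 2 * stdev:
    print(f"{new_value} looks out of distribution (mean={mean:.1f}, stdev={stdev:.1f})")
else:
    print(f"{new_value} is within the historical range (mean={mean:.1f}, stdev={stdev:.1f})")
```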

@rickyyx
Contributor

rickyyx commented Oct 30, 2023

Signed-off-by: vitsai <vitsai@cs.stanford.edu>
@vitsai
Contributor Author

vitsai commented Nov 1, 2023

REGRESSION 87.81%: dashboard_p95_latency_ms (LATENCY) regresses from 46.348 to 87.047 (87.81%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 50.45%: dashboard_p50_latency_ms (LATENCY) regresses from 3.687 to 5.547 (50.45%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 29.40%: dashboard_p99_latency_ms (LATENCY) regresses from 145.645 to 188.461 (29.40%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 13.80%: stage_2_avg_iteration_time (LATENCY) regresses from 60.438395738601685 to 68.77781887054444 (13.80%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 9.61%: single_client_tasks_async (THROUGHPUT) regresses from 9563.116886637355 to 8643.833466025399 (9.61%) in 2.8.0/microbenchmark.json
REGRESSION 8.99%: 1000000_queued_time (LATENCY) regresses from 177.93132558000002 to 193.92261782800003 (8.99%) in 2.8.0/scalability/single_node.json
REGRESSION 8.80%: client__tasks_and_get_batch (THROUGHPUT) regresses from 0.9574694946141461 to 0.8732387089551705 (8.80%) in 2.8.0/microbenchmark.json
REGRESSION 8.00%: stage_1_avg_iteration_time (LATENCY) regresses from 23.342886781692506 to 25.210959935188292 (8.00%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 7.44%: single_client_wait_1k_refs (THROUGHPUT) regresses from 5.52141006441801 to 5.110630867571745 (7.44%) in 2.8.0/microbenchmark.json
REGRESSION 7.21%: dashboard_p50_latency_ms (LATENCY) regresses from 6.852 to 7.346 (7.21%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 7.17%: placement_group_create/removal (THROUGHPUT) regresses from 997.6322375478999 to 926.0840791839338 (7.17%) in 2.8.0/microbenchmark.json
REGRESSION 5.85%: stage_4_spread (LATENCY) regresses from 0.7296295246039273 to 0.7723065878019925 (5.85%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 5.34%: 10000_get_time (LATENCY) regresses from 25.08259386100002 to 26.422835183000018 (5.34%) in 2.8.0/scalability/single_node.json
REGRESSION 4.57%: single_client_tasks_and_get_batch (THROUGHPUT) regresses from 9.129595770052905 to 8.7124898510668 (4.57%) in 2.8.0/microbenchmark.json
REGRESSION 3.93%: dashboard_p99_latency_ms (LATENCY) regresses from 14019.625 to 14570.201 (3.93%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 3.13%: stage_3_creation_time (LATENCY) regresses from 2.1919972896575928 to 2.260662794113159 (3.13%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 3.07%: avg_iteration_time (LATENCY) regresses from 1.5400808811187745 to 1.5873150753974914 (3.07%) in 2.8.0/stress_tests/stress_test_dead_actors.json
REGRESSION 2.95%: client__put_gigabytes (THROUGHPUT) regresses from 0.12778429919203013 to 0.12401864230452364 (2.95%) in 2.8.0/microbenchmark.json
REGRESSION 2.62%: 1_1_actor_calls_sync (THROUGHPUT) regresses from 2273.2464988756947 to 2213.6033025230176 (2.62%) in 2.8.0/microbenchmark.json
REGRESSION 2.53%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5845.356422909845 to 5697.447666436941 (2.53%) in 2.8.0/microbenchmark.json
REGRESSION 2.36%: stage_3_time (LATENCY) regresses from 2875.2445271015167 to 2943.001654624939 (2.36%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 2.29%: multi_client_tasks_async (THROUGHPUT) regresses from 27850.61204431569 to 27211.51041454346 (2.29%) in 2.8.0/microbenchmark.json
REGRESSION 2.13%: 10000_args_time (LATENCY) regresses from 17.291354780999995 to 17.66019733799999 (2.13%) in 2.8.0/scalability/single_node.json
REGRESSION 1.63%: 107374182400_large_object_time (LATENCY) regresses from 30.62209055699998 to 31.122242091999965 (1.63%) in 2.8.0/scalability/single_node.json
REGRESSION 1.31%: single_client_tasks_sync (THROUGHPUT) regresses from 1177.0860205196755 to 1161.670131632561 (1.31%) in 2.8.0/microbenchmark.json
REGRESSION 1.30%: tasks_per_second (THROUGHPUT) regresses from 275.25470863736416 to 271.6776615598984 (1.30%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 0.97%: multi_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 12344.39546658664 to 12224.373147431208 (0.97%) in 2.8.0/microbenchmark.json
REGRESSION 0.94%: 1_n_actor_calls_async (THROUGHPUT) regresses from 9672.982187721544 to 9581.728569086026 (0.94%) in 2.8.0/microbenchmark.json
REGRESSION 0.69%: n_n_async_actor_calls_async (THROUGHPUT) regresses from 24458.207679236828 to 24290.541801601616 (0.69%) in 2.8.0/microbenchmark.json
REGRESSION 0.64%: 1_n_async_actor_calls_async (THROUGHPUT) regresses from 8657.20480299259 to 8601.993472120319 (0.64%) in 2.8.0/microbenchmark.json
REGRESSION 0.30%: client__tasks_and_put_batch (THROUGHPUT) regresses from 11449.695712792654 to 11415.752622212967 (0.30%) in 2.8.0/microbenchmark.json

@rickyyx
Contributor

rickyyx commented Nov 1, 2023

single_client_tasks_async

This seems like a regression, since we had a run on 2.8.0 with non-regressed perf just before: https://buildkite.com/ray-project/release-tests-branch/builds/2365#018b843e-61ff-4d6d-880e-0a10b6b122fa

@rickyyx
Contributor

rickyyx commented Nov 1, 2023

stage_2_avg_iteration_time: I think there's +-5 seconds of variance for this metric.

The other metrics look like variance.

Contributor

@rkooo567 rkooo567 left a comment

LGTM in general

There are 4 suspicious values. @rickyyx, have you checked whether those 4 are within the error range? I think it is probably too late to fix them now, but we can create an issue.

Also, @vitsai can you post the diff result (there's a script) and add a comment to this PR?

119.28968447369434
],
"1_n_actor_calls_async": [
9581.728569086026,
Contributor

regressed?

{
"perf_metric_name": "single_client_tasks_async",
"perf_metric_type": "THROUGHPUT",
"perf_metric_value": 8643.833466025399
Contributor

this is pretty low

@@ -0,0 +1,13 @@
{
"broadcast_time": 82.940892212,
Contributor

this has a slight regression

{
"perf_metric_name": "stage_2_avg_iteration_time",
"perf_metric_type": "LATENCY",
"perf_metric_value": 68.77781887054444
Contributor

regression

@anyscalesam
Collaborator

Is this still relevant and a release blocker for Ray 2.9?

@anyscalesam anyscalesam added the core (Issues that should be addressed in Ray Core) label Nov 21, 2023
@rickyyx
Contributor

rickyyx commented Nov 21, 2023

@vitsai should I merge this?

@jjyao jjyao merged commit d6184c5 into ray-project:master Nov 22, 2023
14 of 15 checks passed
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Nov 29, 2023
Signed-off-by: vitsai <victoria@anyscale.com>
Signed-off-by: vitsai <vitsai@cs.stanford.edu>
Labels
core - Issues that should be addressed in Ray Core
P0 - Issues that should be fixed in short order
release-blocker - P0 Issue that blocks the release