
[release] 2.8 release perf logs #40571

Merged
merged 6 commits into from
Nov 22, 2023

Conversation

vitsai
Contributor

@vitsai vitsai commented Oct 23, 2023

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@vitsai vitsai added the P0 (Issues that should be fixed in short order) and release-blocker (P0 Issue that blocks the release) labels Oct 23, 2023
@can-anyscale
Collaborator

w00h00 thank you, I'll leave this up for your team to review then. Also, they normally do something like #29615 to make it easier to review.

@vitsai
Contributor Author

vitsai commented Oct 23, 2023

REGRESSION 39.88%: dashboard_p99_latency_ms (LATENCY) regresses from 13941.91 to 19502.436 (39.88%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 37.76%: dashboard_p95_latency_ms (LATENCY) regresses from 6729.761 to 9271.209 (37.76%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 22.64%: dashboard_p50_latency_ms (LATENCY) regresses from 5.534 to 6.787 (22.64%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 21.37%: multi_client_put_gigabytes (THROUGHPUT) regresses from 38.605668097256924 to 30.354050606212375 (21.37%) in 2.8.0/microbenchmark.json
REGRESSION 14.94%: 1000000_queued_time (LATENCY) regresses from 181.82263824499995 to 208.991110036 (14.94%) in 2.8.0/scalability/single_node.json
REGRESSION 13.00%: avg_iteration_time (LATENCY) regresses from 1.4622855401039123 to 1.6523929166793823 (13.00%) in 2.8.0/stress_tests/stress_test_dead_actors.json
REGRESSION 12.51%: client__1_1_actor_calls_sync (THROUGHPUT) regresses from 573.4457553242221 to 501.7198350460272 (12.51%) in 2.8.0/microbenchmark.json
REGRESSION 12.48%: 1_n_actor_calls_async (THROUGHPUT) regresses from 10133.72696574923 to 8869.51837285407 (12.48%) in 2.8.0/microbenchmark.json
REGRESSION 9.56%: multi_client_tasks_async (THROUGHPUT) regresses from 28423.644858766176 to 25705.1703030958 (9.56%) in 2.8.0/microbenchmark.json
REGRESSION 9.52%: 1_n_async_actor_calls_async (THROUGHPUT) regresses from 9124.377528222414 to 8255.28161190285 (9.52%) in 2.8.0/microbenchmark.json
REGRESSION 9.39%: client__1_1_actor_calls_concurrent (THROUGHPUT) regresses from 1080.2139341634759 to 978.7677324282791 (9.39%) in 2.8.0/microbenchmark.json
REGRESSION 9.32%: 10000_get_time (LATENCY) regresses from 23.55671212599998 to 25.751547934 (9.32%) in 2.8.0/scalability/single_node.json
REGRESSION 9.27%: stage_2_avg_iteration_time (LATENCY) regresses from 58.160910558700564 to 63.549897527694704 (9.27%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 9.03%: dashboard_p50_latency_ms (LATENCY) regresses from 3.465 to 3.778 (9.03%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 8.65%: single_client_tasks_async (THROUGHPUT) regresses from 10739.407361558973 to 9810.522775386338 (8.65%) in 2.8.0/microbenchmark.json
REGRESSION 7.01%: client__tasks_and_get_batch (THROUGHPUT) regresses from 1.002041264301031 to 0.9317720482301839 (7.01%) in 2.8.0/microbenchmark.json
REGRESSION 6.86%: stage_1_avg_iteration_time (LATENCY) regresses from 23.305240750312805 to 24.904260087013245 (6.86%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 6.63%: single_client_tasks_sync (THROUGHPUT) regresses from 1311.812164358857 to 1224.7771019170784 (6.63%) in 2.8.0/microbenchmark.json
REGRESSION 6.47%: 10000_args_time (LATENCY) regresses from 16.89121779300001 to 17.983383886000013 (6.47%) in 2.8.0/scalability/single_node.json
REGRESSION 6.40%: single_client_get_object_containing_10k_refs (THROUGHPUT) regresses from 14.755162462843568 to 13.810261182479115 (6.40%) in 2.8.0/microbenchmark.json
REGRESSION 5.08%: client__1_1_actor_calls_async (THROUGHPUT) regresses from 1083.9300708022135 to 1028.839383358896 (5.08%) in 2.8.0/microbenchmark.json
REGRESSION 4.95%: n_n_async_actor_calls_async (THROUGHPUT) regresses from 25688.484755543966 to 24418.02409630373 (4.95%) in 2.8.0/microbenchmark.json
REGRESSION 4.77%: actors_per_second (THROUGHPUT) regresses from 748.5322140167257 to 712.822673976586 (4.77%) in 2.8.0/benchmarks/many_actors.json
REGRESSION 4.54%: n_n_actor_calls_async (THROUGHPUT) regresses from 30847.92669705198 to 29447.495631692484 (4.54%) in 2.8.0/microbenchmark.json
REGRESSION 3.68%: 3000_returns_time (LATENCY) regresses from 5.6602293089999876 to 5.868746854999998 (3.68%) in 2.8.0/scalability/single_node.json
REGRESSION 3.49%: client__put_calls (THROUGHPUT) regresses from 857.6367908455961 to 827.6636203329824 (3.49%) in 2.8.0/microbenchmark.json
REGRESSION 2.93%: 1_1_actor_calls_async (THROUGHPUT) regresses from 7615.355914488919 to 7392.382505877325 (2.93%) in 2.8.0/microbenchmark.json
REGRESSION 2.75%: stage_4_spread (LATENCY) regresses from 0.7217020493267903 to 0.7415504343465557 (2.75%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 2.40%: stage_3_time (LATENCY) regresses from 2802.1650245189667 to 2869.4522173404694 (2.40%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 2.20%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5766.661045557541 to 5639.599660122037 (2.20%) in 2.8.0/microbenchmark.json
REGRESSION 1.54%: n_n_actor_calls_with_arg_async (THROUGHPUT) regresses from 3074.0790016310475 to 3026.6872241733467 (1.54%) in 2.8.0/microbenchmark.json
REGRESSION 1.25%: client__put_gigabytes (THROUGHPUT) regresses from 0.13283428838343245 to 0.13117919950925 (1.25%) in 2.8.0/microbenchmark.json
REGRESSION 0.52%: client__tasks_and_put_batch (THROUGHPUT) regresses from 11411.245745812425 to 11351.922193872259 (0.52%) in 2.8.0/microbenchmark.json
REGRESSION 0.36%: 1_1_actor_calls_sync (THROUGHPUT) regresses from 2255.614293958201 to 2247.4937092440123 (0.36%) in 2.8.0/microbenchmark.json
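
(For context, the percentages above come from diffing per-metric values between two benchmark result files. Below is a minimal sketch of that comparison, assuming a perf_metrics-style JSON layout like the snippets quoted later in this review; the file paths, the "perf_metrics" key, and the helper names are illustrative assumptions, not the actual release-logs script.)

```python
# Hypothetical sketch: deriving a "REGRESSION x%" line from two result files.
# The "perf_metrics" key, file paths, and helper names are assumptions for
# illustration; this is not the actual release-logs comparison script.
import json

def load_metrics(path):
    """Return {name: (type, value)} from a perf_metrics-style JSON file."""
    with open(path) as f:
        data = json.load(f)
    return {
        m["perf_metric_name"]: (m["perf_metric_type"], m["perf_metric_value"])
        for m in data["perf_metrics"]
    }

def regression_pct(metric_type, old, new):
    """Percent regression relative to the old value, or None if not a regression."""
    if metric_type == "LATENCY" and new > old:      # higher latency is worse
        return (new - old) / old * 100
    if metric_type == "THROUGHPUT" and new < old:   # lower throughput is worse
        return (old - new) / old * 100
    return None

old = load_metrics("2.7.0/microbenchmark.json")
new = load_metrics("2.8.0/microbenchmark.json")
for name, (mtype, new_val) in sorted(new.items()):
    if name in old:
        pct = regression_pct(mtype, old[name][1], new_val)
        if pct is not None:
            print(f"REGRESSION {pct:.2f}%: {name} ({mtype}) "
                  f"regresses from {old[name][1]} to {new_val}")
```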

@vitsai
Contributor Author

vitsai commented Oct 23, 2023

Diff here: #40572

@rickyyx rickyyx self-assigned this Oct 23, 2023
@rickyyx
Contributor

rickyyx commented Oct 23, 2023

multi_client_put_gigabytes should be due to flakiness:
[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

1000000_queued_time probably does have a regression, but it seems less serious than 20%; it looks more like 185 -> 200, ~8%.

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

avg_iteration_time seems to be flaky - and it doesn't look like this happens on the master branch, so it could be an infra error?
[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

client__1_1_actor_calls_sync is not a regression.

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

1_n_actor_calls_async is probably RPC-related, but it never recovers. There seems to be a ~10% regression.
[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

multi_client_tasks_async is similar to 1_n_actor_calls_async: it never recovers from the dip.

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

1_n_async_actor_calls_async looks like a regression.

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

10000_get_time is not a regression - just flaky.

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

stage_2_avg_iteration_time seems to be a regression:
[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

single_client_tasks_async seems to be OK for now.
[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

stage_1_avg_iteration_time seems to have some regression - not huge, but it looks obvious.
[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

single_client_tasks_sync seems real, even though it's not significant.

[screenshot]

@vitsai
Contributor Author

vitsai commented Oct 23, 2023

For avg_iteration_time, the value of 1.65 on the release branch is out of distribution, though.

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

dashboard_p99_latency_ms and the other dashboard regressions are related to the GCS task backend. They're expected, since we now have more data to return, whereas the previous code dropped it all.

@rickyyx
Contributor

rickyyx commented Oct 23, 2023

> For avg_iteration_time, the value of 1.65 on the release branch is out of distribution though

Yeah, but given that the release branch is just a prefix of the master branch, I believe it's more likely flakiness. Could we rerun it?

@vitsai
Contributor Author

vitsai commented Oct 23, 2023

Will run again after the cherry-picks; that should give more signal on those edge-of-distribution values.

@vitsai
Contributor Author

vitsai commented Oct 24, 2023

Here are the new ones from today. Still waiting on @GeneDer for the 2.7.1 release logs to compare against (instead of 2.7.0).

REGRESSION 52.83%: dashboard_p95_latency_ms (LATENCY) regresses from 47.45 to 72.519 (52.83%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 51.28%: dashboard_p95_latency_ms (LATENCY) regresses from 6729.761 to 10180.702 (51.28%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 50.84%: dashboard_p99_latency_ms (LATENCY) regresses from 13941.91 to 21030.085 (50.84%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 20.19%: 1000000_queued_time (LATENCY) regresses from 181.82263824499995 to 218.523912088 (20.19%) in 2.8.0/scalability/single_node.json
REGRESSION 19.50%: multi_client_put_gigabytes (THROUGHPUT) regresses from 38.605668097256924 to 31.07867911026417 (19.50%) in 2.8.0/microbenchmark.json
REGRESSION 16.66%: dashboard_p50_latency_ms (LATENCY) regresses from 5.534 to 6.456 (16.66%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 15.31%: dashboard_p99_latency_ms (LATENCY) regresses from 143.498 to 165.464 (15.31%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 13.45%: dashboard_p50_latency_ms (LATENCY) regresses from 3.465 to 3.931 (13.45%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 12.98%: stage_2_avg_iteration_time (LATENCY) regresses from 58.160910558700564 to 65.70907316207885 (12.98%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 10.61%: 10000_get_time (LATENCY) regresses from 23.55671212599998 to 26.055165431999995 (10.61%) in 2.8.0/scalability/single_node.json
REGRESSION 10.54%: single_client_tasks_async (THROUGHPUT) regresses from 10739.407361558973 to 9607.186982064028 (10.54%) in 2.8.0/microbenchmark.json
REGRESSION 10.34%: single_client_get_object_containing_10k_refs (THROUGHPUT) regresses from 14.755162462843568 to 13.228837029499449 (10.34%) in 2.8.0/microbenchmark.json
REGRESSION 10.23%: time_to_broadcast_1073741824_bytes_to_50_nodes (LATENCY) regresses from 70.13534577099995 to 77.31193332200007 (10.23%) in 2.8.0/scalability/object_store.json
REGRESSION 9.35%: 1_n_async_actor_calls_async (THROUGHPUT) regresses from 9124.377528222414 to 8271.416009416063 (9.35%) in 2.8.0/microbenchmark.json
REGRESSION 8.75%: multi_client_tasks_async (THROUGHPUT) regresses from 28423.644858766176 to 25935.554390623118 (8.75%) in 2.8.0/microbenchmark.json
REGRESSION 8.53%: single_client_tasks_sync (THROUGHPUT) regresses from 1311.812164358857 to 1199.8831112257556 (8.53%) in 2.8.0/microbenchmark.json
REGRESSION 8.22%: stage_1_avg_iteration_time (LATENCY) regresses from 23.305240750312805 to 25.220864820480347 (8.22%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 7.63%: 1_n_actor_calls_async (THROUGHPUT) regresses from 10133.72696574923 to 9360.803502103265 (7.63%) in 2.8.0/microbenchmark.json
REGRESSION 7.58%: n_n_actor_calls_async (THROUGHPUT) regresses from 30847.92669705198 to 28510.050783675328 (7.58%) in 2.8.0/microbenchmark.json
REGRESSION 5.89%: stage_3_time (LATENCY) regresses from 2802.1650245189667 to 2967.23273229599 (5.89%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 4.69%: n_n_async_actor_calls_async (THROUGHPUT) regresses from 25688.484755543966 to 24484.174826322916 (4.69%) in 2.8.0/microbenchmark.json
REGRESSION 4.23%: 3000_returns_time (LATENCY) regresses from 5.6602293089999876 to 5.899374877999989 (4.23%) in 2.8.0/scalability/single_node.json
REGRESSION 3.91%: actors_per_second (THROUGHPUT) regresses from 748.5322140167257 to 719.3018798022547 (3.91%) in 2.8.0/benchmarks/many_actors.json
REGRESSION 3.63%: client__1_1_actor_calls_concurrent (THROUGHPUT) regresses from 1080.2139341634759 to 1040.99657229968 (3.63%) in 2.8.0/microbenchmark.json
REGRESSION 3.46%: n_n_actor_calls_with_arg_async (THROUGHPUT) regresses from 3074.0790016310475 to 2967.685662012421 (3.46%) in 2.8.0/microbenchmark.json
REGRESSION 3.40%: avg_iteration_time (LATENCY) regresses from 1.4622855401039123 to 1.5120103502273559 (3.40%) in 2.8.0/stress_tests/stress_test_dead_actors.json
REGRESSION 3.10%: 1_1_actor_calls_concurrent (THROUGHPUT) regresses from 4745.83263563276 to 4598.670938618477 (3.10%) in 2.8.0/microbenchmark.json
REGRESSION 3.03%: single_client_tasks_and_get_batch (THROUGHPUT) regresses from 9.369535279594958 to 9.086079282777337 (3.03%) in 2.8.0/microbenchmark.json
REGRESSION 2.91%: 10000_args_time (LATENCY) regresses from 16.89121779300001 to 17.381954480000005 (2.91%) in 2.8.0/scalability/single_node.json
REGRESSION 2.85%: client__1_1_actor_calls_async (THROUGHPUT) regresses from 1083.9300708022135 to 1053.0442301044923 (2.85%) in 2.8.0/microbenchmark.json
REGRESSION 2.84%: tasks_per_second (THROUGHPUT) regresses from 272.7469880191856 to 265.0085938319898 (2.84%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 2.68%: stage_4_spread (LATENCY) regresses from 0.7217020493267903 to 0.7410596228217868 (2.68%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 2.60%: client__put_gigabytes (THROUGHPUT) regresses from 0.13283428838343245 to 0.12937928916286368 (2.60%) in 2.8.0/microbenchmark.json
REGRESSION 2.32%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5766.661045557541 to 5632.668588928769 (2.32%) in 2.8.0/microbenchmark.json
REGRESSION 1.97%: client__tasks_and_get_batch (THROUGHPUT) regresses from 1.002041264301031 to 0.9822717209888673 (1.97%) in 2.8.0/microbenchmark.json
REGRESSION 1.55%: client__1_1_actor_calls_sync (THROUGHPUT) regresses from 573.4457553242221 to 564.5765462438006 (1.55%) in 2.8.0/microbenchmark.json
REGRESSION 0.83%: single_client_wait_1k_refs (THROUGHPUT) regresses from 5.50854610986549 to 5.462989777344944 (0.83%) in 2.8.0/microbenchmark.json
REGRESSION 0.83%: avg_pg_create_time_ms (LATENCY) regresses from 0.9287227387393717 to 0.9363928228234366 (0.83%) in 2.8.0/stress_tests/stress_test_placement_group.json
REGRESSION 0.64%: tasks_per_second (THROUGHPUT) regresses from 443.2356047821634 to 440.39063366884034 (0.64%) in 2.8.0/benchmarks/many_tasks.json

@rickyyx
Contributor

rickyyx commented Oct 24, 2023

The dashboard-related latency for many_nodes has pretty high variance; not a release blocker, I think:

[screenshots]

The dashboard latency for many_tasks is expected to increase, since we are returning more data (versus simply a count of dropped tasks before).

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 24, 2023

Variance for multi_client_put_gigabytes:

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 24, 2023

10000_get_time is just variance.

[screenshot]

Signed-off-by: vitsai <victoria@anyscale.com>
Signed-off-by: vitsai <victoria@anyscale.com>
Signed-off-by: vitsai <vitsai@cs.stanford.edu>
@vitsai
Contributor Author

vitsai commented Oct 25, 2023

This one is against 2.7.1

REGRESSION 56.47%: dashboard_p95_latency_ms (LATENCY) regresses from 46.348 to 72.519 (56.47%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 23.75%: dashboard_p50_latency_ms (LATENCY) regresses from 35.742 to 44.232 (23.75%) in 2.8.0/benchmarks/many_actors.json
REGRESSION 21.50%: dashboard_p50_latency_ms (LATENCY) regresses from 6.852 to 8.325 (21.50%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 20.44%: stage_3_creation_time (LATENCY) regresses from 2.1919972896575928 to 2.6400458812713623 (20.44%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 15.15%: dashboard_p95_latency_ms (LATENCY) regresses from 8149.28 to 9383.733 (15.15%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 14.23%: single_client_get_calls_Plasma_Store (THROUGHPUT) regresses from 7536.924380935448 to 6464.1729246449195 (14.23%) in 2.8.0/microbenchmark.json
REGRESSION 13.61%: dashboard_p99_latency_ms (LATENCY) regresses from 145.645 to 165.464 (13.61%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 11.70%: dashboard_p99_latency_ms (LATENCY) regresses from 14019.625 to 15660.353 (11.70%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 9.63%: 1000000_queued_time (LATENCY) regresses from 177.93132558000002 to 195.069067863 (9.63%) in 2.8.0/scalability/single_node.json
REGRESSION 6.62%: dashboard_p50_latency_ms (LATENCY) regresses from 3.687 to 3.931 (6.62%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 4.58%: client__put_gigabytes (THROUGHPUT) regresses from 0.12778429919203013 to 0.12193333038806775 (4.58%) in 2.8.0/microbenchmark.json
REGRESSION 4.43%: 107374182400_large_object_time (LATENCY) regresses from 30.62209055699998 to 31.97824557199999 (4.43%) in 2.8.0/scalability/single_node.json
REGRESSION 4.00%: 10000_args_time (LATENCY) regresses from 17.291354780999995 to 17.983785811000004 (4.00%) in 2.8.0/scalability/single_node.json
REGRESSION 3.72%: tasks_per_second (THROUGHPUT) regresses from 275.25470863736416 to 265.0085938319898 (3.72%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 3.55%: avg_pg_create_time_ms (LATENCY) regresses from 0.917001803304134 to 0.9495785120136944 (3.55%) in 2.8.0/stress_tests/stress_test_placement_group.json
REGRESSION 3.18%: 10000_get_time (LATENCY) regresses from 25.08259386100002 to 25.881300527000008 (3.18%) in 2.8.0/scalability/single_node.json
REGRESSION 2.85%: stage_4_spread (LATENCY) regresses from 0.7296295246039273 to 0.7504145523663499 (2.85%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 2.58%: actors_per_second (THROUGHPUT) regresses from 738.330085638146 to 719.3018798022547 (2.58%) in 2.8.0/benchmarks/many_actors.json
REGRESSION 2.21%: 1_n_actor_calls_async (THROUGHPUT) regresses from 9672.982187721544 to 9459.166339745669 (2.21%) in 2.8.0/microbenchmark.json
REGRESSION 2.15%: multi_client_tasks_async (THROUGHPUT) regresses from 27850.61204431569 to 27251.785365248128 (2.15%) in 2.8.0/microbenchmark.json
REGRESSION 2.03%: multi_client_put_gigabytes (THROUGHPUT) regresses from 33.620993378733125 to 32.938469438463315 (2.03%) in 2.8.0/microbenchmark.json
REGRESSION 1.80%: time_to_broadcast_1073741824_bytes_to_50_nodes (LATENCY) regresses from 85.80861040199989 to 87.355050575 (1.80%) in 2.8.0/scalability/object_store.json
REGRESSION 1.26%: stage_2_avg_iteration_time (LATENCY) regresses from 60.438395738601685 to 61.202564811706544 (1.26%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 0.55%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5845.356422909845 to 5813.449193922757 (0.55%) in 2.8.0/microbenchmark.json
REGRESSION 0.27%: single_client_wait_1k_refs (THROUGHPUT) regresses from 5.52141006441801 to 5.506660769810904 (0.27%) in 2.8.0/microbenchmark.json
REGRESSION 0.10%: avg_iteration_time (LATENCY) regresses from 1.5400808811187745 to 1.5415918231010437 (0.10%) in 2.8.0/stress_tests/stress_test_dead_actors.json

@rickyyx
Contributor

rickyyx commented Oct 25, 2023

stage_3_creation_time is high variance (the absolute value is small, but the variance is relatively large).

[screenshot]

@rickyyx
Contributor

rickyyx commented Oct 25, 2023

I believe there's some regression in 1000000_queued_time after PR #38771, but not as much as 10%; from the historical range, it's more like ~5% (from 185 -> 195).

The test submits 1M tasks from the driver (which overloads the task backend buffer on the driver worker). Given other, more realistic tests like the microbenchmark and stress_test_many_tasks, and the variance of this metric, I would propose accepting this.

[screenshot]
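
(For readers unfamiliar with this test: below is a rough, scaled-down sketch of the kind of workload 1000000_queued_time measures - queuing a large batch of trivial tasks from a single driver and timing how long the backlog takes to drain. This is an illustration only, not the actual single_node release test.)

```python
# Rough, scaled-down illustration of a "many queued tasks" benchmark on a
# single node. Not the actual release test; the task count is reduced so it
# can run on a laptop.
import time
import ray

ray.init()

@ray.remote
def noop():
    return None

NUM_TASKS = 100_000  # the release test queues 1,000,000 tasks

start = time.perf_counter()
refs = [noop.remote() for _ in range(NUM_TASKS)]  # tasks pile up in the scheduler queue
ray.get(refs)                                     # wait for the backlog to drain
queued_time = time.perf_counter() - start
print(f"{NUM_TASKS}_queued_time: {queued_time:.3f} s")
```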

Signed-off-by: vitsai <vitsai@cs.stanford.edu>
@vitsai
Contributor Author

vitsai commented Oct 26, 2023

REGRESSION 72.90%: dashboard_p95_latency_ms (LATENCY) regresses from 46.348 to 80.137 (72.90%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 64.43%: dashboard_p99_latency_ms (LATENCY) regresses from 14019.625 to 23052.355 (64.43%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 27.25%: dashboard_p95_latency_ms (LATENCY) regresses from 8149.28 to 10369.756 (27.25%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 20.13%: dashboard_p99_latency_ms (LATENCY) regresses from 145.645 to 174.969 (20.13%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 15.32%: 1000000_queued_time (LATENCY) regresses from 177.93132558000002 to 205.19758366500002 (15.32%) in 2.8.0/scalability/single_node.json
REGRESSION 13.14%: stage_2_avg_iteration_time (LATENCY) regresses from 60.438395738601685 to 68.37787942886352 (13.14%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 11.01%: dashboard_p50_latency_ms (LATENCY) regresses from 3.687 to 4.093 (11.01%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 9.67%: 1_1_actor_calls_sync (THROUGHPUT) regresses from 2273.2464988756947 to 2053.5000943392597 (9.67%) in 2.8.0/microbenchmark.json
REGRESSION 8.44%: single_client_get_object_containing_10k_refs (THROUGHPUT) regresses from 13.009554585474522 to 11.911115119151876 (8.44%) in 2.8.0/microbenchmark.json
REGRESSION 8.18%: placement_group_create/removal (THROUGHPUT) regresses from 997.6322375478999 to 916.0390933731816 (8.18%) in 2.8.0/microbenchmark.json
REGRESSION 8.05%: single_client_wait_1k_refs (THROUGHPUT) regresses from 5.52141006441801 to 5.077083407402203 (8.05%) in 2.8.0/microbenchmark.json
REGRESSION 7.08%: stage_1_avg_iteration_time (LATENCY) regresses from 23.342886781692506 to 24.99666111469269 (7.08%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 6.84%: stage_4_spread (LATENCY) regresses from 0.7296295246039273 to 0.7795093658217519 (6.84%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 6.82%: 1_n_actor_calls_async (THROUGHPUT) regresses from 9672.982187721544 to 9013.345072517786 (6.82%) in 2.8.0/microbenchmark.json
REGRESSION 6.35%: 1_n_async_actor_calls_async (THROUGHPUT) regresses from 8657.20480299259 to 8107.296817082561 (6.35%) in 2.8.0/microbenchmark.json
REGRESSION 5.14%: 10000_get_time (LATENCY) regresses from 25.08259386100002 to 26.371442345999995 (5.14%) in 2.8.0/scalability/single_node.json
REGRESSION 5.03%: 10000_args_time (LATENCY) regresses from 17.291354780999995 to 18.161911279999998 (5.03%) in 2.8.0/scalability/single_node.json
REGRESSION 4.12%: actors_per_second (THROUGHPUT) regresses from 738.330085638146 to 707.91056858179 (4.12%) in 2.8.0/benchmarks/many_actors.json
REGRESSION 3.61%: tasks_per_second (THROUGHPUT) regresses from 429.0546317074886 to 413.5452010251431 (3.61%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 3.46%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5845.356422909845 to 5642.93079417387 (3.46%) in 2.8.0/microbenchmark.json
REGRESSION 3.24%: single_client_tasks_async (THROUGHPUT) regresses from 9563.116886637355 to 9253.350380094456 (3.24%) in 2.8.0/microbenchmark.json
REGRESSION 1.49%: single_client_tasks_sync (THROUGHPUT) regresses from 1177.0860205196755 to 1159.5410057905888 (1.49%) in 2.8.0/microbenchmark.json
REGRESSION 1.38%: single_client_get_calls_Plasma_Store (THROUGHPUT) regresses from 7536.924380935448 to 7432.953617355073 (1.38%) in 2.8.0/microbenchmark.json
REGRESSION 1.29%: stage_3_creation_time (LATENCY) regresses from 2.1919972896575928 to 2.2203712463378906 (1.29%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 1.01%: client__tasks_and_put_batch (THROUGHPUT) regresses from 11449.695712792654 to 11334.501893509201 (1.01%) in 2.8.0/microbenchmark.json
REGRESSION 0.14%: n_n_actor_calls_async (THROUGHPUT) regresses from 29270.036133623737 to 29229.518744061534 (0.14%) in 2.8.0/microbenchmark.json

Signed-off-by: vitsai <vitsai@cs.stanford.edu>
@vitsai
Contributor Author

vitsai commented Oct 27, 2023

REGRESSION 123.41%: dashboard_p50_latency_ms (LATENCY) regresses from 35.742 to 79.852 (123.41%) in 2.8.0/benchmarks/many_actors.json
REGRESSION 72.90%: dashboard_p95_latency_ms (LATENCY) regresses from 46.348 to 80.137 (72.90%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 54.82%: stage_3_creation_time (LATENCY) regresses from 2.1919972896575928 to 3.393756628036499 (54.82%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 20.13%: dashboard_p99_latency_ms (LATENCY) regresses from 145.645 to 174.969 (20.13%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 11.04%: 1000000_queued_time (LATENCY) regresses from 177.93132558000002 to 197.57540863000003 (11.04%) in 2.8.0/scalability/single_node.json
REGRESSION 11.01%: dashboard_p50_latency_ms (LATENCY) regresses from 3.687 to 4.093 (11.01%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 9.68%: client__1_1_actor_calls_concurrent (THROUGHPUT) regresses from 961.1926184467325 to 868.1661626750067 (9.68%) in 2.8.0/microbenchmark.json
REGRESSION 7.45%: n_n_actor_calls_async (THROUGHPUT) regresses from 29270.036133623737 to 27088.724228974872 (7.45%) in 2.8.0/microbenchmark.json
REGRESSION 7.14%: placement_group_create/removal (THROUGHPUT) regresses from 997.6322375478999 to 926.429178856758 (7.14%) in 2.8.0/microbenchmark.json
REGRESSION 5.88%: stage_4_spread (LATENCY) regresses from 0.7296295246039273 to 0.7725406967524532 (5.88%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 5.67%: 1_n_async_actor_calls_async (THROUGHPUT) regresses from 8657.20480299259 to 8166.318822443878 (5.67%) in 2.8.0/microbenchmark.json
REGRESSION 5.25%: actors_per_second (THROUGHPUT) regresses from 738.330085638146 to 699.5362497544337 (5.25%) in 2.8.0/benchmarks/many_actors.json
REGRESSION 4.40%: 1_1_actor_calls_async (THROUGHPUT) regresses from 7456.112509761211 to 7127.879723065892 (4.40%) in 2.8.0/microbenchmark.json
REGRESSION 4.39%: client__1_1_actor_calls_async (THROUGHPUT) regresses from 982.5793246271417 to 939.4658556651513 (4.39%) in 2.8.0/microbenchmark.json
REGRESSION 4.26%: 1_1_actor_calls_sync (THROUGHPUT) regresses from 2273.2464988756947 to 2176.3519388689724 (4.26%) in 2.8.0/microbenchmark.json
REGRESSION 3.88%: n_n_async_actor_calls_async (THROUGHPUT) regresses from 24458.207679236828 to 23510.071149146344 (3.88%) in 2.8.0/microbenchmark.json
REGRESSION 3.60%: stage_2_avg_iteration_time (LATENCY) regresses from 60.438395738601685 to 62.61348090171814 (3.60%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 3.44%: 10000_args_time (LATENCY) regresses from 17.291354780999995 to 17.886716769999992 (3.44%) in 2.8.0/scalability/single_node.json
REGRESSION 3.13%: 1_1_actor_calls_concurrent (THROUGHPUT) regresses from 4554.146655326606 to 4411.655031271767 (3.13%) in 2.8.0/microbenchmark.json
REGRESSION 2.22%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5845.356422909845 to 5715.47419892146 (2.22%) in 2.8.0/microbenchmark.json
REGRESSION 2.22%: dashboard_p50_latency_ms (LATENCY) regresses from 6.852 to 7.004 (2.22%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 1.94%: multi_client_tasks_async (THROUGHPUT) regresses from 27850.61204431569 to 27311.034390113982 (1.94%) in 2.8.0/microbenchmark.json
REGRESSION 1.68%: stage_1_avg_iteration_time (LATENCY) regresses from 23.342886781692506 to 23.734616684913636 (1.68%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 1.61%: single_client_wait_1k_refs (THROUGHPUT) regresses from 5.52141006441801 to 5.432281070822073 (1.61%) in 2.8.0/microbenchmark.json
REGRESSION 1.35%: client__put_calls (THROUGHPUT) regresses from 821.9975673342932 to 810.9298682126814 (1.35%) in 2.8.0/microbenchmark.json
REGRESSION 0.93%: avg_iteration_time (LATENCY) regresses from 1.5400808811187745 to 1.5543466377258301 (0.93%) in 2.8.0/stress_tests/stress_test_dead_actors.json
REGRESSION 0.61%: 10000_get_time (LATENCY) regresses from 25.08259386100002 to 25.235181824999998 (0.61%) in 2.8.0/scalability/single_node.json
REGRESSION 0.17%: tasks_per_second (THROUGHPUT) regresses from 429.0546317074886 to 428.3314400052605 (0.17%) in 2.8.0/benchmarks/many_tasks.json

@vitsai
Contributor Author

vitsai commented Oct 27, 2023

@rickyyx Still seeing the 1000000_queued_time regression.

@rickyyx
Contributor

rickyyx commented Oct 27, 2023

I think it's variance: 177 is on the low end, while 199 is on the high end for this metric:

[screenshots]
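
(A minimal sketch of the kind of variance check being applied here - comparing a new measurement against the spread of recent historical runs. The history values and the 2-sigma threshold are illustrative assumptions, not the actual release tooling.)

```python
# Hedged sketch: flag a new measurement only if it falls outside the spread of
# recent historical runs. The history values and the 2-sigma threshold are
# illustrative assumptions, not the actual release tooling.
import statistics

history = [177.9, 181.8, 185.2, 188.4, 195.1, 199.0]  # recent 1000000_queued_time runs (s)
new_value = 197.6

mean = statistics.mean(history)
stdev = statistics.stdev(history)

if abs(new_value - mean) > 2 * stdev:
    print(f"{new_value} looks out of distribution (mean={mean:.1f}, stdev={stdev:.1f})")
else:
    print(f"{new_value} is within the historical range (mean={mean:.1f}, stdev={stdev:.1f})")
```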

@rickyyx
Contributor

rickyyx commented Oct 30, 2023

Signed-off-by: vitsai <vitsai@cs.stanford.edu>
@vitsai
Contributor Author

vitsai commented Nov 1, 2023

REGRESSION 87.81%: dashboard_p95_latency_ms (LATENCY) regresses from 46.348 to 87.047 (87.81%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 50.45%: dashboard_p50_latency_ms (LATENCY) regresses from 3.687 to 5.547 (50.45%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 29.40%: dashboard_p99_latency_ms (LATENCY) regresses from 145.645 to 188.461 (29.40%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 13.80%: stage_2_avg_iteration_time (LATENCY) regresses from 60.438395738601685 to 68.77781887054444 (13.80%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 9.61%: single_client_tasks_async (THROUGHPUT) regresses from 9563.116886637355 to 8643.833466025399 (9.61%) in 2.8.0/microbenchmark.json
REGRESSION 8.99%: 1000000_queued_time (LATENCY) regresses from 177.93132558000002 to 193.92261782800003 (8.99%) in 2.8.0/scalability/single_node.json
REGRESSION 8.80%: client__tasks_and_get_batch (THROUGHPUT) regresses from 0.9574694946141461 to 0.8732387089551705 (8.80%) in 2.8.0/microbenchmark.json
REGRESSION 8.00%: stage_1_avg_iteration_time (LATENCY) regresses from 23.342886781692506 to 25.210959935188292 (8.00%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 7.44%: single_client_wait_1k_refs (THROUGHPUT) regresses from 5.52141006441801 to 5.110630867571745 (7.44%) in 2.8.0/microbenchmark.json
REGRESSION 7.21%: dashboard_p50_latency_ms (LATENCY) regresses from 6.852 to 7.346 (7.21%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 7.17%: placement_group_create/removal (THROUGHPUT) regresses from 997.6322375478999 to 926.0840791839338 (7.17%) in 2.8.0/microbenchmark.json
REGRESSION 5.85%: stage_4_spread (LATENCY) regresses from 0.7296295246039273 to 0.7723065878019925 (5.85%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 5.34%: 10000_get_time (LATENCY) regresses from 25.08259386100002 to 26.422835183000018 (5.34%) in 2.8.0/scalability/single_node.json
REGRESSION 4.57%: single_client_tasks_and_get_batch (THROUGHPUT) regresses from 9.129595770052905 to 8.7124898510668 (4.57%) in 2.8.0/microbenchmark.json
REGRESSION 3.93%: dashboard_p99_latency_ms (LATENCY) regresses from 14019.625 to 14570.201 (3.93%) in 2.8.0/benchmarks/many_tasks.json
REGRESSION 3.13%: stage_3_creation_time (LATENCY) regresses from 2.1919972896575928 to 2.260662794113159 (3.13%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 3.07%: avg_iteration_time (LATENCY) regresses from 1.5400808811187745 to 1.5873150753974914 (3.07%) in 2.8.0/stress_tests/stress_test_dead_actors.json
REGRESSION 2.95%: client__put_gigabytes (THROUGHPUT) regresses from 0.12778429919203013 to 0.12401864230452364 (2.95%) in 2.8.0/microbenchmark.json
REGRESSION 2.62%: 1_1_actor_calls_sync (THROUGHPUT) regresses from 2273.2464988756947 to 2213.6033025230176 (2.62%) in 2.8.0/microbenchmark.json
REGRESSION 2.53%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5845.356422909845 to 5697.447666436941 (2.53%) in 2.8.0/microbenchmark.json
REGRESSION 2.36%: stage_3_time (LATENCY) regresses from 2875.2445271015167 to 2943.001654624939 (2.36%) in 2.8.0/stress_tests/stress_test_many_tasks.json
REGRESSION 2.29%: multi_client_tasks_async (THROUGHPUT) regresses from 27850.61204431569 to 27211.51041454346 (2.29%) in 2.8.0/microbenchmark.json
REGRESSION 2.13%: 10000_args_time (LATENCY) regresses from 17.291354780999995 to 17.66019733799999 (2.13%) in 2.8.0/scalability/single_node.json
REGRESSION 1.63%: 107374182400_large_object_time (LATENCY) regresses from 30.62209055699998 to 31.122242091999965 (1.63%) in 2.8.0/scalability/single_node.json
REGRESSION 1.31%: single_client_tasks_sync (THROUGHPUT) regresses from 1177.0860205196755 to 1161.670131632561 (1.31%) in 2.8.0/microbenchmark.json
REGRESSION 1.30%: tasks_per_second (THROUGHPUT) regresses from 275.25470863736416 to 271.6776615598984 (1.30%) in 2.8.0/benchmarks/many_nodes.json
REGRESSION 0.97%: multi_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 12344.39546658664 to 12224.373147431208 (0.97%) in 2.8.0/microbenchmark.json
REGRESSION 0.94%: 1_n_actor_calls_async (THROUGHPUT) regresses from 9672.982187721544 to 9581.728569086026 (0.94%) in 2.8.0/microbenchmark.json
REGRESSION 0.69%: n_n_async_actor_calls_async (THROUGHPUT) regresses from 24458.207679236828 to 24290.541801601616 (0.69%) in 2.8.0/microbenchmark.json
REGRESSION 0.64%: 1_n_async_actor_calls_async (THROUGHPUT) regresses from 8657.20480299259 to 8601.993472120319 (0.64%) in 2.8.0/microbenchmark.json
REGRESSION 0.30%: client__tasks_and_put_batch (THROUGHPUT) regresses from 11449.695712792654 to 11415.752622212967 (0.30%) in 2.8.0/microbenchmark.json

@rickyyx
Contributor

rickyyx commented Nov 1, 2023

single_client_tasks_async

This seems like a regression, since we had a run on 2.8.0 with non-regressed perf just before: https://buildkite.com/ray-project/release-tests-branch/builds/2365#018b843e-61ff-4d6d-880e-0a10b6b122fa

@rickyyx
Contributor

rickyyx commented Nov 1, 2023

stage_2_avg_iteration_time: I think there's +-5 seconds of variance for this metric.

The other metrics look like variance.

Contributor

@rkooo567 rkooo567 left a comment

LGTM in general

There are 4 suspicious values. @rickyyx, have you checked whether those 4 are within the error range? I think it is probably too late to fix them now, but we can create an issue.

Also, @vitsai can you post the diff result (there's a script) and add a comment to this PR?

119.28968447369434
],
"1_n_actor_calls_async": [
9581.728569086026,
Contributor

regressed?

{
"perf_metric_name": "single_client_tasks_async",
"perf_metric_type": "THROUGHPUT",
"perf_metric_value": 8643.833466025399
Contributor

this is pretty low

@@ -0,0 +1,13 @@
{
"broadcast_time": 82.940892212,
Contributor

this has a slight regression

{
"perf_metric_name": "stage_2_avg_iteration_time",
"perf_metric_type": "LATENCY",
"perf_metric_value": 68.77781887054444
Contributor

regression

@anyscalesam
Collaborator

Is this still relevant and a release blocker for Ray 2.9?

@anyscalesam anyscalesam added the core (Issues that should be addressed in Ray Core) label Nov 21, 2023
@rickyyx
Contributor

rickyyx commented Nov 21, 2023

@vitsai should I merge this?

@jjyao jjyao merged commit d6184c5 into ray-project:master Nov 22, 2023
14 of 15 checks passed
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Nov 29, 2023
Signed-off-by: vitsai <victoria@anyscale.com>
Signed-off-by: vitsai <vitsai@cs.stanford.edu>
Labels
core - Issues that should be addressed in Ray Core
P0 - Issues that should be fixed in short order
release-blocker - P0 Issue that blocks the release