
release logs for 2.4.0 #33905

Merged 27 commits into ray-project:master on Apr 17, 2023

Conversation

clarng
Contributor

@clarng clarng commented Mar 30, 2023

Why are these changes needed?

Release logs perf benchmark results for 2.4.0.
Also updated the comparison tool to sort regressions by severity (a sketch of the sorting logic follows the summary below).

Summary:
REGRESSION 29.82%: multi_client_put_gigabytes (THROUGHPUT) regresses from 36.20768872444627 to 25.409834920338362 (29.82%) in 2.4.0/microbenchmark.json
REGRESSION 5.40%: n_n_async_actor_calls_async (THROUGHPUT) regresses from 26795.925682915393 to 25348.416455631697 (5.40%) in 2.4.0/microbenchmark.json
REGRESSION 5.19%: 1_1_async_actor_calls_with_args_async (THROUGHPUT) regresses from 2166.3177853141888 to 2053.88520928116 (5.19%) in 2.4.0/microbenchmark.json
REGRESSION 5.07%: single_client_get_object_containing_10k_refs (THROUGHPUT) regresses from 13.275239773951363 to 12.601815520396993 (5.07%) in 2.4.0/microbenchmark.json
REGRESSION 4.43%: actors_per_second (THROUGHPUT) regresses from 808.4407587905971 to 772.644103201044 (4.43%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 4.22%: single_client_tasks_async (THROUGHPUT) regresses from 11030.656570627058 to 10565.441516164923 (4.22%) in 2.4.0/microbenchmark.json
REGRESSION 3.58%: 1_1_async_actor_calls_async (THROUGHPUT) regresses from 3001.1530431430465 to 2893.7637668160814 (3.58%) in 2.4.0/microbenchmark.json
REGRESSION 2.99%: single_client_put_gigabytes (THROUGHPUT) regresses from 20.38683304648347 to 19.77763371659089 (2.99%) in 2.4.0/microbenchmark.json
REGRESSION 2.76%: 1_1_actor_calls_async (THROUGHPUT) regresses from 8098.766293651375 to 7875.2205662523575 (2.76%) in 2.4.0/microbenchmark.json
REGRESSION 2.56%: n_n_actor_calls_async (THROUGHPUT) regresses from 32387.32125643762 to 31558.549225320676 (2.56%) in 2.4.0/microbenchmark.json
REGRESSION 2.50%: tasks_per_second (THROUGHPUT) regresses from 221.7103038870812 to 216.16404352694366 (2.50%) in 2.4.0/benchmarks/many_nodes.json
REGRESSION 2.48%: client__put_gigabytes (THROUGHPUT) regresses from 0.045668066982356405 to 0.04453569846336401 (2.48%) in 2.4.0/microbenchmark.json
REGRESSION 1.82%: client__get_calls (THROUGHPUT) regresses from 1190.7189254696584 to 1169.0846386325316 (1.82%) in 2.4.0/microbenchmark.json
REGRESSION 0.60%: 1_1_actor_calls_concurrent (THROUGHPUT) regresses from 4927.728219186553 to 4898.403930689569 (0.60%) in 2.4.0/microbenchmark.json
REGRESSION 0.39%: 1_n_actor_calls_async (THROUGHPUT) regresses from 10961.828968216958 to 10918.570247859934 (0.39%) in 2.4.0/microbenchmark.json


REGRESSION 10246.15%: stage_3_creation_time (LATENCY) regresses from 0.054503440856933594 to 5.639009952545166 (10246.15%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 948.85%: dashboard_p95_latency_ms (LATENCY) regresses from 12.292 to 128.925 (948.85%) in 2.4.0/benchmarks/many_pgs.json
REGRESSION 370.95%: dashboard_p95_latency_ms (LATENCY) regresses from 592.153 to 2788.74 (370.95%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 171.77%: dashboard_p99_latency_ms (LATENCY) regresses from 1262.576 to 3431.297 (171.77%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 42.64%: dashboard_p99_latency_ms (LATENCY) regresses from 2693.468 to 3842.061 (42.64%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 40.47%: dashboard_p95_latency_ms (LATENCY) regresses from 1722.469 to 2419.503 (40.47%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 38.97%: dashboard_p95_latency_ms (LATENCY) regresses from 40.367 to 56.099 (38.97%) in 2.4.0/benchmarks/many_nodes.json
REGRESSION 28.20%: stage_0_time (LATENCY) regresses from 11.479018211364746 to 14.71593689918518 (28.20%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 22.05%: 1000000_queued_time (LATENCY) regresses from 195.97103170400004 to 239.18845452300002 (22.05%) in 2.4.0/scalability/single_node.json
REGRESSION 20.33%: dashboard_p50_latency_ms (LATENCY) regresses from 4.171 to 5.019 (20.33%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 19.57%: avg_pg_remove_time_ms (LATENCY) regresses from 0.8356835120128541 to 0.9992335435431319 (19.57%) in 2.4.0/stress_tests/stress_test_placement_group.json
REGRESSION 19.11%: dashboard_p50_latency_ms (LATENCY) regresses from 3.287 to 3.915 (19.11%) in 2.4.0/benchmarks/many_nodes.json
REGRESSION 15.05%: stage_4_spread (LATENCY) regresses from 0.7235994216435793 to 0.8324665945841853 (15.05%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 14.88%: avg_iteration_time (LATENCY) regresses from 2.1004259395599365 to 2.4128899502754213 (14.88%) in 2.4.0/stress_tests/stress_test_dead_actors.json
REGRESSION 13.97%: dashboard_p50_latency_ms (LATENCY) regresses from 30.46 to 34.714 (13.97%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 5.32%: dashboard_p50_latency_ms (LATENCY) regresses from 3.14 to 3.307 (5.32%) in 2.4.0/benchmarks/many_pgs.json
REGRESSION 3.94%: stage_3_time (LATENCY) regresses from 2642.5666913986206 to 2746.616822242737 (3.94%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 2.61%: 3000_returns_time (LATENCY) regresses from 5.873062359000016 to 6.026168543000011 (2.61%) in 2.4.0/scalability/single_node.json
REGRESSION 2.06%: 10000_args_time (LATENCY) regresses from 16.44382056200004 to 16.78179498900002 (2.06%) in 2.4.0/scalability/single_node.json
REGRESSION 2.03%: avg_pg_create_time_ms (LATENCY) regresses from 0.908270030030557 to 0.9266749219217597 (2.03%) in 2.4.0/stress_tests/stress_test_placement_group.json
REGRESSION 1.51%: stage_1_avg_iteration_time (LATENCY) regresses from 23.107464241981507 to 23.45582284927368 (1.51%) in 2.4.0/stress_tests/stress_test_many_tasks.json

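For readers unfamiliar with the comparison tool, the sorting above can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the actual release tooling: the `load_metrics` helper, the baseline path `2.3.0/microbenchmark.json`, and a top-level `perf_metrics` list with `perf_metric_name`/`perf_metric_type`/`perf_metric_value` fields (as in the JSON snippets quoted later in this thread) are all assumptions, and it sorts throughput and latency regressions together rather than in the two separate groups shown above.

```python
# Hypothetical sketch of the regression sorting; the "perf_metrics" key and
# the baseline path are illustrative assumptions, not the real tool's layout.
import json


def load_metrics(path):
    """Map metric name -> (type, value) from a results JSON."""
    with open(path) as f:
        data = json.load(f)
    return {
        m["perf_metric_name"]: (m["perf_metric_type"], m["perf_metric_value"])
        for m in data["perf_metrics"]
    }


def regressions(baseline_path, candidate_path):
    base = load_metrics(baseline_path)
    cand = load_metrics(candidate_path)
    out = []
    for name, (kind, new) in cand.items():
        if name not in base:
            continue
        _, old = base[name]
        if kind == "THROUGHPUT":
            pct = (old - new) / old * 100  # lower throughput is worse
        else:  # LATENCY
            pct = (new - old) / old * 100  # higher latency is worse
        if pct > 0:
            out.append((pct, name, kind, old, new))
    return sorted(out, reverse=True)  # worst regressions first


for pct, name, kind, old, new in regressions(
    "2.3.0/microbenchmark.json", "2.4.0/microbenchmark.json"
):
    print(f"REGRESSION {pct:.2f}%: {name} ({kind}) regresses from {old} to {new}")
```
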
Related issue number

#33492

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

clarng and others added 24 commits February 9, 2023 04:58
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com> (same sign-off repeated for each of these commits)
@clarng clarng requested a review from a team March 30, 2023 02:47
@clarng clarng changed the title from "Rl24" to "release logs for 2.4.0" Mar 30, 2023
@clarng clarng marked this pull request as ready for review March 30, 2023 03:16
@cadedaniel
Member

Thanks for improving the comparison script!

Structure overall LGTM. Are you planning on updating this PR with new performance numbers once the important regressions are fixed?

@clarng
Contributor Author

clarng commented Mar 31, 2023 via email

@cadedaniel
Member

I think we should have a single PR for it, to make it less likely that one of the regression fixes causes a different regression without us noticing. Otherwise it is easier for a regression caused by a fix to sneak in.

Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
@clarng
Contributor Author

clarng commented Apr 12, 2023

ping on review @cadedaniel @scv119

Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Member

@cadedaniel cadedaniel left a comment

LGTM!

@rkooo567
Contributor

Have we investigated

REGRESSION 29.82%: multi_client_put_gigabytes (THROUGHPUT) regresses from 36.20768872444627 to 25.409834920338362 (29.82%) in 2.4.0/microbenchmark.json

?

@clarng
Contributor Author

clarng commented Apr 13, 2023

Have we investigated

REGRESSION 29.82%: multi_client_put_gigabytes (THROUGHPUT) regresses from 36.20768872444627 to 25.409834920338362 (29.82%) in 2.4.0/microbenchmark.json

?

Seems it's always been noisy:

[screenshot: Screen Shot 2023-04-13 at 4 00 59 PM]

@scv119
Contributor

scv119 commented Apr 14, 2023

REGRESSION 10246.15%: stage_3_creation_time (LATENCY) regresses from 0.054503440856933594 to 5.639009952545166 (10246.15%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 948.85%: dashboard_p95_latency_ms (LATENCY) regresses from 12.292 to 128.925 (948.85%) in 2.4.0/benchmarks/many_pgs.json
REGRESSION 370.95%: dashboard_p95_latency_ms (LATENCY) regresses from 592.153 to 2788.74 (370.95%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 171.77%: dashboard_p99_latency_ms (LATENCY) regresses from 1262.576 to 3431.297 (171.77%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 42.64%: dashboard_p99_latency_ms (LATENCY) regresses from 2693.468 to 3842.061 (42.64%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 40.47%: dashboard_p95_latency_ms (LATENCY) regresses from 1722.469 to 2419.503 (40.47%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 38.97%: dashboard_p95_latency_ms (LATENCY) regresses from 40.367 to 56.099 (38.97%) in 2.4.0/benchmarks/many_nodes.json
REGRESSION 28.20%: stage_0_time (LATENCY) regresses from 11.479018211364746 to 14.71593689918518 (28.20%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 22.05%: 1000000_queued_time (LATENCY) regresses from 195.97103170400004 to 239.18845452300002 (22.05%) in 2.4.0/scalability/single_node.json
REGRESSION 20.33%: dashboard_p50_latency_ms (LATENCY) regresses from 4.171 to 5.019 (20.33%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 19.57%: avg_pg_remove_time_ms (LATENCY) regresses from 0.8356835120128541 to 0.9992335435431319 (19.57%) in 2.4.0/stress_tests/stress_test_placement_group.json
REGRESSION 19.11%: dashboard_p50_latency_ms (LATENCY) regresses from 3.287 to 3.915 (19.11%) in 2.4.0/benchmarks/many_nodes.json
REGRESSION 15.05%: stage_4_spread (LATENCY) regresses from 0.7235994216435793 to 0.8324665945841853 (15.05%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 14.88%: avg_iteration_time (LATENCY) regresses from 2.1004259395599365 to 2.4128899502754213 (14.88%) in 2.4.0/stress_tests/stress_test_dead_actors.json

these all look very bad ... have we looked into them?

@clarng
Contributor Author

clarng commented Apr 14, 2023

REGRESSION 10246.15%: stage_3_creation_time (LATENCY) regresses from 0.054503440856933594 to 5.639009952545166 (10246.15%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 948.85%: dashboard_p95_latency_ms (LATENCY) regresses from 12.292 to 128.925 (948.85%) in 2.4.0/benchmarks/many_pgs.json
REGRESSION 370.95%: dashboard_p95_latency_ms (LATENCY) regresses from 592.153 to 2788.74 (370.95%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 171.77%: dashboard_p99_latency_ms (LATENCY) regresses from 1262.576 to 3431.297 (171.77%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 42.64%: dashboard_p99_latency_ms (LATENCY) regresses from 2693.468 to 3842.061 (42.64%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 40.47%: dashboard_p95_latency_ms (LATENCY) regresses from 1722.469 to 2419.503 (40.47%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 38.97%: dashboard_p95_latency_ms (LATENCY) regresses from 40.367 to 56.099 (38.97%) in 2.4.0/benchmarks/many_nodes.json
REGRESSION 28.20%: stage_0_time (LATENCY) regresses from 11.479018211364746 to 14.71593689918518 (28.20%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 22.05%: 1000000_queued_time (LATENCY) regresses from 195.97103170400004 to 239.18845452300002 (22.05%) in 2.4.0/scalability/single_node.json
REGRESSION 20.33%: dashboard_p50_latency_ms (LATENCY) regresses from 4.171 to 5.019 (20.33%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 19.57%: avg_pg_remove_time_ms (LATENCY) regresses from 0.8356835120128541 to 0.9992335435431319 (19.57%) in 2.4.0/stress_tests/stress_test_placement_group.json
REGRESSION 19.11%: dashboard_p50_latency_ms (LATENCY) regresses from 3.287 to 3.915 (19.11%) in 2.4.0/benchmarks/many_nodes.json
REGRESSION 15.05%: stage_4_spread (LATENCY) regresses from 0.7235994216435793 to 0.8324665945841853 (15.05%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 14.88%: avg_iteration_time (LATENCY) regresses from 2.1004259395599365 to 2.4128899502754213 (14.88%) in 2.4.0/stress_tests/stress_test_dead_actors.json

these all look very bad ... have we looked into them?

We opened the following release blockers for those:
#33931

@rkooo567
Contributor

@scv119 stage_3_creation_time: a regression is expected here because we changed the tests.
dashboard tests: these are not meant to be tracked; I will add better metrics in 2.5.
avg_pg_remove_time_ms: @iycheng do you know why it is not fixed? Have you merged your PR to the release branch?

stage_0_time: It is your call @scv119

1000000_queued_time, stage_4_spread, and avg_iteration_time seem like they are unexpected, though?

@clarng
Contributor Author

clarng commented Apr 14, 2023

stage_4_spread -> that is mostly noise:

[screenshot: Screen Shot 2023-04-14 at 1 27 00 PM]

1000000_queued_time -> that is mostly noise:

[screenshot: Screen Shot 2023-04-14 at 1 28 57 PM]

avg_iteration_time -> that is mostly noise:

[screenshot: Screen Shot 2023-04-14 at 1 30 23 PM]

stage_0_time: we said it was OK during our weekly sync. cc @scv119

@jjyao
Contributor

jjyao commented Apr 17, 2023

stage_0_time is noise. I had a run with

{
    "perf_metric_name": "stage_0_time",
    "perf_metric_type": "LATENCY",
    "perf_metric_value": 10.505309343338013
}

https://buildkite.com/ray-project/release-tests-branch/builds/1561#018780f5-9cc1-45c7-98fb-12acf21069a0

@jjyao
Contributor

jjyao commented Apr 17, 2023

avg_pg_remove_time_ms looks good as well with the latest run

{
    "perf_metric_name": "avg_pg_remove_time_ms",
    "perf_metric_type": "LATENCY",
    "perf_metric_value": 0.8243641636640946
}

https://buildkite.com/ray-project/release-tests-branch/builds/1561#018780f5-9cdb-4cc9-9a68-40259789efa1
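As a side note, a fresh run's value like the one above could be spot-checked against the recorded baseline with a few lines. This is a hypothetical helper, not part of the release tooling: the baseline path and the top-level `perf_metrics` key are assumptions based on the JSON snippets in this thread.

```python
# Hypothetical spot-check of one metric against a baseline results file.
# The baseline path and the "perf_metrics" key are assumptions for illustration.
import json


def metric_value(path, name):
    with open(path) as f:
        metrics = json.load(f)["perf_metrics"]
    return next(
        m["perf_metric_value"] for m in metrics if m["perf_metric_name"] == name
    )


baseline = metric_value(
    "2.3.0/stress_tests/stress_test_placement_group.json", "avg_pg_remove_time_ms"
)
latest = 0.8243641636640946  # value from the Buildkite run linked above
change = (latest - baseline) / baseline * 100
print(f"avg_pg_remove_time_ms: {baseline} -> {latest} ({change:+.2f}%)")
```
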

@jjyao jjyao merged commit 4367c80 into ray-project:master Apr 17, 2023
justinvyu pushed a commit to justinvyu/ray that referenced this pull request Apr 18, 2023
Release logs perf benchmark for 2.4.0
Also updated tool to sort the regressions

Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Co-authored-by: Clarence Ng <clarence@anyscale.com>
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
Release logs perf benchmark for 2.4.0
Also updated tool to sort the regressions

Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Co-authored-by: Clarence Ng <clarence@anyscale.com>
Signed-off-by: elliottower <elliot@elliottower.com>
@jjyao jjyao mentioned this pull request Apr 23, 2023
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
Release logs perf benchmark for 2.4.0
Also updated tool to sort the regressions

Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Co-authored-by: Clarence Ng <clarence@anyscale.com>
Signed-off-by: Jack He <jackhe2345@gmail.com>