
release logs for 2.4.0 #33905

Merged 27 commits into ray-project:master on Apr 17, 2023

Conversation

clarng
Contributor

@clarng clarng commented Mar 30, 2023

Why are these changes needed?

Release logs perf benchmark results for 2.4.0.
Also updated the comparison tool to sort regressions by severity (a sketch of the sorting logic follows the summary below).

Summary:
REGRESSION 29.82%: multi_client_put_gigabytes (THROUGHPUT) regresses from 36.20768872444627 to 25.409834920338362 (29.82%) in 2.4.0/microbenchmark.json
REGRESSION 5.40%: n_n_async_actor_calls_async (THROUGHPUT) regresses from 26795.925682915393 to 25348.416455631697 (5.40%) in 2.4.0/microbenchmark.json
REGRESSION 5.19%: 1_1_async_actor_calls_with_args_async (THROUGHPUT) regresses from 2166.3177853141888 to 2053.88520928116 (5.19%) in 2.4.0/microbenchmark.json
REGRESSION 5.07%: single_client_get_object_containing_10k_refs (THROUGHPUT) regresses from 13.275239773951363 to 12.601815520396993 (5.07%) in 2.4.0/microbenchmark.json
REGRESSION 4.43%: actors_per_second (THROUGHPUT) regresses from 808.4407587905971 to 772.644103201044 (4.43%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 4.22%: single_client_tasks_async (THROUGHPUT) regresses from 11030.656570627058 to 10565.441516164923 (4.22%) in 2.4.0/microbenchmark.json
REGRESSION 3.58%: 1_1_async_actor_calls_async (THROUGHPUT) regresses from 3001.1530431430465 to 2893.7637668160814 (3.58%) in 2.4.0/microbenchmark.json
REGRESSION 2.99%: single_client_put_gigabytes (THROUGHPUT) regresses from 20.38683304648347 to 19.77763371659089 (2.99%) in 2.4.0/microbenchmark.json
REGRESSION 2.76%: 1_1_actor_calls_async (THROUGHPUT) regresses from 8098.766293651375 to 7875.2205662523575 (2.76%) in 2.4.0/microbenchmark.json
REGRESSION 2.56%: n_n_actor_calls_async (THROUGHPUT) regresses from 32387.32125643762 to 31558.549225320676 (2.56%) in 2.4.0/microbenchmark.json
REGRESSION 2.50%: tasks_per_second (THROUGHPUT) regresses from 221.7103038870812 to 216.16404352694366 (2.50%) in 2.4.0/benchmarks/many_nodes.json
REGRESSION 2.48%: client__put_gigabytes (THROUGHPUT) regresses from 0.045668066982356405 to 0.04453569846336401 (2.48%) in 2.4.0/microbenchmark.json
REGRESSION 1.82%: client__get_calls (THROUGHPUT) regresses from 1190.7189254696584 to 1169.0846386325316 (1.82%) in 2.4.0/microbenchmark.json
REGRESSION 0.60%: 1_1_actor_calls_concurrent (THROUGHPUT) regresses from 4927.728219186553 to 4898.403930689569 (0.60%) in 2.4.0/microbenchmark.json
REGRESSION 0.39%: 1_n_actor_calls_async (THROUGHPUT) regresses from 10961.828968216958 to 10918.570247859934 (0.39%) in 2.4.0/microbenchmark.json


REGRESSION 10246.15%: stage_3_creation_time (LATENCY) regresses from 0.054503440856933594 to 5.639009952545166 (10246.15%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 948.85%: dashboard_p95_latency_ms (LATENCY) regresses from 12.292 to 128.925 (948.85%) in 2.4.0/benchmarks/many_pgs.json
REGRESSION 370.95%: dashboard_p95_latency_ms (LATENCY) regresses from 592.153 to 2788.74 (370.95%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 171.77%: dashboard_p99_latency_ms (LATENCY) regresses from 1262.576 to 3431.297 (171.77%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 42.64%: dashboard_p99_latency_ms (LATENCY) regresses from 2693.468 to 3842.061 (42.64%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 40.47%: dashboard_p95_latency_ms (LATENCY) regresses from 1722.469 to 2419.503 (40.47%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 38.97%: dashboard_p95_latency_ms (LATENCY) regresses from 40.367 to 56.099 (38.97%) in 2.4.0/benchmarks/many_nodes.json
REGRESSION 28.20%: stage_0_time (LATENCY) regresses from 11.479018211364746 to 14.71593689918518 (28.20%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 22.05%: 1000000_queued_time (LATENCY) regresses from 195.97103170400004 to 239.18845452300002 (22.05%) in 2.4.0/scalability/single_node.json
REGRESSION 20.33%: dashboard_p50_latency_ms (LATENCY) regresses from 4.171 to 5.019 (20.33%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 19.57%: avg_pg_remove_time_ms (LATENCY) regresses from 0.8356835120128541 to 0.9992335435431319 (19.57%) in 2.4.0/stress_tests/stress_test_placement_group.json
REGRESSION 19.11%: dashboard_p50_latency_ms (LATENCY) regresses from 3.287 to 3.915 (19.11%) in 2.4.0/benchmarks/many_nodes.json
REGRESSION 15.05%: stage_4_spread (LATENCY) regresses from 0.7235994216435793 to 0.8324665945841853 (15.05%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 14.88%: avg_iteration_time (LATENCY) regresses from 2.1004259395599365 to 2.4128899502754213 (14.88%) in 2.4.0/stress_tests/stress_test_dead_actors.json
REGRESSION 13.97%: dashboard_p50_latency_ms (LATENCY) regresses from 30.46 to 34.714 (13.97%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 5.32%: dashboard_p50_latency_ms (LATENCY) regresses from 3.14 to 3.307 (5.32%) in 2.4.0/benchmarks/many_pgs.json
REGRESSION 3.94%: stage_3_time (LATENCY) regresses from 2642.5666913986206 to 2746.616822242737 (3.94%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 2.61%: 3000_returns_time (LATENCY) regresses from 5.873062359000016 to 6.026168543000011 (2.61%) in 2.4.0/scalability/single_node.json
REGRESSION 2.06%: 10000_args_time (LATENCY) regresses from 16.44382056200004 to 16.78179498900002 (2.06%) in 2.4.0/scalability/single_node.json
REGRESSION 2.03%: avg_pg_create_time_ms (LATENCY) regresses from 0.908270030030557 to 0.9266749219217597 (2.03%) in 2.4.0/stress_tests/stress_test_placement_group.json
REGRESSION 1.51%: stage_1_avg_iteration_time (LATENCY) regresses from 23.107464241981507 to 23.45582284927368 (1.51%) in 2.4.0/stress_tests/stress_test_many_tasks.json

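For readers unfamiliar with the comparison tool, the sorting above can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the actual release tooling: the `load_metrics` helper, the baseline path `2.3.0/microbenchmark.json`, and a top-level `perf_metrics` list with `perf_metric_name`/`perf_metric_type`/`perf_metric_value` fields (as in the JSON snippets quoted later in this thread) are all assumptions, and it sorts throughput and latency regressions together rather than in the two separate groups shown above.

```python
# Hypothetical sketch of the regression sorting; the "perf_metrics" key and
# the baseline path are illustrative assumptions, not the real tool's layout.
import json


def load_metrics(path):
    """Map metric name -> (type, value) from a results JSON."""
    with open(path) as f:
        data = json.load(f)
    return {
        m["perf_metric_name"]: (m["perf_metric_type"], m["perf_metric_value"])
        for m in data["perf_metrics"]
    }


def regressions(baseline_path, candidate_path):
    base = load_metrics(baseline_path)
    cand = load_metrics(candidate_path)
    out = []
    for name, (kind, new) in cand.items():
        if name not in base:
            continue
        _, old = base[name]
        if kind == "THROUGHPUT":
            pct = (old - new) / old * 100  # lower throughput is worse
        else:  # LATENCY
            pct = (new - old) / old * 100  # higher latency is worse
        if pct > 0:
            out.append((pct, name, kind, old, new))
    return sorted(out, reverse=True)  # worst regressions first


for pct, name, kind, old, new in regressions(
    "2.3.0/microbenchmark.json", "2.4.0/microbenchmark.json"
):
    print(f"REGRESSION {pct:.2f}%: {name} ({kind}) regresses from {old} to {new}")
```
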
Related issue number

#33492

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

clarng and others added 24 commits February 9, 2023 04:58
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com> (same sign-off repeated for each of these commits)
@clarng clarng requested a review from a team March 30, 2023 02:47
@clarng clarng changed the title from "Rl24" to "release logs for 2.4.0" Mar 30, 2023
@clarng clarng marked this pull request as ready for review March 30, 2023 03:16
@cadedaniel
Member

Thanks for improving the comparison script!

Structure overall LGTM. Are you planning on updating this PR with new performance numbers once the important regressions are fixed?

@clarng
Contributor Author

clarng commented Mar 31, 2023 via email

@cadedaniel
Member

I think we should have a single PR for it, to make it less likely that one of the regression fixes causes a different regression without us noticing. Otherwise it is easier for a regression caused by a fix to sneak in.

Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
@clarng
Contributor Author

clarng commented Apr 12, 2023

ping on review @cadedaniel @scv119

Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Member

@cadedaniel cadedaniel left a comment

LGTM!

@rkooo567
Contributor

Have we investigated

REGRESSION 29.82%: multi_client_put_gigabytes (THROUGHPUT) regresses from 36.20768872444627 to 25.409834920338362 (29.82%) in 2.4.0/microbenchmark.json

?

@clarng
Contributor Author

clarng commented Apr 13, 2023

Have we investigated

REGRESSION 29.82%: multi_client_put_gigabytes (THROUGHPUT) regresses from 36.20768872444627 to 25.409834920338362 (29.82%) in 2.4.0/microbenchmark.json

?

Seems it's always been noisy:

[screenshot: Screen Shot 2023-04-13 at 4 00 59 PM]

@scv119
Contributor

scv119 commented Apr 14, 2023

REGRESSION 10246.15%: stage_3_creation_time (LATENCY) regresses from 0.054503440856933594 to 5.639009952545166 (10246.15%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 948.85%: dashboard_p95_latency_ms (LATENCY) regresses from 12.292 to 128.925 (948.85%) in 2.4.0/benchmarks/many_pgs.json
REGRESSION 370.95%: dashboard_p95_latency_ms (LATENCY) regresses from 592.153 to 2788.74 (370.95%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 171.77%: dashboard_p99_latency_ms (LATENCY) regresses from 1262.576 to 3431.297 (171.77%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 42.64%: dashboard_p99_latency_ms (LATENCY) regresses from 2693.468 to 3842.061 (42.64%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 40.47%: dashboard_p95_latency_ms (LATENCY) regresses from 1722.469 to 2419.503 (40.47%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 38.97%: dashboard_p95_latency_ms (LATENCY) regresses from 40.367 to 56.099 (38.97%) in 2.4.0/benchmarks/many_nodes.json
REGRESSION 28.20%: stage_0_time (LATENCY) regresses from 11.479018211364746 to 14.71593689918518 (28.20%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 22.05%: 1000000_queued_time (LATENCY) regresses from 195.97103170400004 to 239.18845452300002 (22.05%) in 2.4.0/scalability/single_node.json
REGRESSION 20.33%: dashboard_p50_latency_ms (LATENCY) regresses from 4.171 to 5.019 (20.33%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 19.57%: avg_pg_remove_time_ms (LATENCY) regresses from 0.8356835120128541 to 0.9992335435431319 (19.57%) in 2.4.0/stress_tests/stress_test_placement_group.json
REGRESSION 19.11%: dashboard_p50_latency_ms (LATENCY) regresses from 3.287 to 3.915 (19.11%) in 2.4.0/benchmarks/many_nodes.json
REGRESSION 15.05%: stage_4_spread (LATENCY) regresses from 0.7235994216435793 to 0.8324665945841853 (15.05%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 14.88%: avg_iteration_time (LATENCY) regresses from 2.1004259395599365 to 2.4128899502754213 (14.88%) in 2.4.0/stress_tests/stress_test_dead_actors.json

these all look very bad ... have we looked into them?

@clarng
Contributor Author

clarng commented Apr 14, 2023

REGRESSION 10246.15%: stage_3_creation_time (LATENCY) regresses from 0.054503440856933594 to 5.639009952545166 (10246.15%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 948.85%: dashboard_p95_latency_ms (LATENCY) regresses from 12.292 to 128.925 (948.85%) in 2.4.0/benchmarks/many_pgs.json
REGRESSION 370.95%: dashboard_p95_latency_ms (LATENCY) regresses from 592.153 to 2788.74 (370.95%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 171.77%: dashboard_p99_latency_ms (LATENCY) regresses from 1262.576 to 3431.297 (171.77%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 42.64%: dashboard_p99_latency_ms (LATENCY) regresses from 2693.468 to 3842.061 (42.64%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 40.47%: dashboard_p95_latency_ms (LATENCY) regresses from 1722.469 to 2419.503 (40.47%) in 2.4.0/benchmarks/many_actors.json
REGRESSION 38.97%: dashboard_p95_latency_ms (LATENCY) regresses from 40.367 to 56.099 (38.97%) in 2.4.0/benchmarks/many_nodes.json
REGRESSION 28.20%: stage_0_time (LATENCY) regresses from 11.479018211364746 to 14.71593689918518 (28.20%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 22.05%: 1000000_queued_time (LATENCY) regresses from 195.97103170400004 to 239.18845452300002 (22.05%) in 2.4.0/scalability/single_node.json
REGRESSION 20.33%: dashboard_p50_latency_ms (LATENCY) regresses from 4.171 to 5.019 (20.33%) in 2.4.0/benchmarks/many_tasks.json
REGRESSION 19.57%: avg_pg_remove_time_ms (LATENCY) regresses from 0.8356835120128541 to 0.9992335435431319 (19.57%) in 2.4.0/stress_tests/stress_test_placement_group.json
REGRESSION 19.11%: dashboard_p50_latency_ms (LATENCY) regresses from 3.287 to 3.915 (19.11%) in 2.4.0/benchmarks/many_nodes.json
REGRESSION 15.05%: stage_4_spread (LATENCY) regresses from 0.7235994216435793 to 0.8324665945841853 (15.05%) in 2.4.0/stress_tests/stress_test_many_tasks.json
REGRESSION 14.88%: avg_iteration_time (LATENCY) regresses from 2.1004259395599365 to 2.4128899502754213 (14.88%) in 2.4.0/stress_tests/stress_test_dead_actors.json

these all look very bad ... have we looked into them?

We opened the following release blockers for those:
#33931

@rkooo567
Contributor

@scv119 stage_3_creation_time: a regression is expected here because we changed the tests.
dashboard tests: these are not meant to be tracked; I will add better metrics in 2.5.
avg_pg_remove_time_ms: @iycheng do you know why it is not fixed? Have you merged your PR to the release branch?

stage_0_time: It is your call @scv119

1000000_queued_time, stage_4_spread, and avg_iteration_time seem like they are unexpected, though?

@clarng
Contributor Author

clarng commented Apr 14, 2023

stage_4_spread -> that is mostly noise:

[screenshot: Screen Shot 2023-04-14 at 1 27 00 PM]

1000000_queued_time -> that is mostly noise:

[screenshot: Screen Shot 2023-04-14 at 1 28 57 PM]

avg_iteration_time -> that is mostly noise:

[screenshot: Screen Shot 2023-04-14 at 1 30 23 PM]

stage_0_time: we said it was OK during our weekly sync. cc @scv119

@jjyao
Contributor

jjyao commented Apr 17, 2023

stage_0_time is noise. I had a run with

{
    "perf_metric_name": "stage_0_time",
    "perf_metric_type": "LATENCY",
    "perf_metric_value": 10.505309343338013
}

https://buildkite.com/ray-project/release-tests-branch/builds/1561#018780f5-9cc1-45c7-98fb-12acf21069a0

@jjyao
Contributor

jjyao commented Apr 17, 2023

avg_pg_remove_time_ms looks good as well with the latest run

{
    "perf_metric_name": "avg_pg_remove_time_ms",
    "perf_metric_type": "LATENCY",
    "perf_metric_value": 0.8243641636640946
}

https://buildkite.com/ray-project/release-tests-branch/builds/1561#018780f5-9cdb-4cc9-9a68-40259789efa1
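As a side note, a fresh run's value like the one above could be spot-checked against the recorded baseline with a few lines. This is a hypothetical helper, not part of the release tooling: the baseline path and the top-level `perf_metrics` key are assumptions based on the JSON snippets in this thread.

```python
# Hypothetical spot-check of one metric against a baseline results file.
# The baseline path and the "perf_metrics" key are assumptions for illustration.
import json


def metric_value(path, name):
    with open(path) as f:
        metrics = json.load(f)["perf_metrics"]
    return next(
        m["perf_metric_value"] for m in metrics if m["perf_metric_name"] == name
    )


baseline = metric_value(
    "2.3.0/stress_tests/stress_test_placement_group.json", "avg_pg_remove_time_ms"
)
latest = 0.8243641636640946  # value from the Buildkite run linked above
change = (latest - baseline) / baseline * 100
print(f"avg_pg_remove_time_ms: {baseline} -> {latest} ({change:+.2f}%)")
```
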

@jjyao jjyao merged commit 4367c80 into ray-project:master Apr 17, 2023
justinvyu pushed a commit to justinvyu/ray that referenced this pull request Apr 18, 2023
Release logs perf benchmark for 2.4.0
Also updated tool to sort the regressions

Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Co-authored-by: Clarence Ng <clarence@anyscale.com>
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
Release logs perf benchmark for 2.4.0
Also updated tool to sort the regressions

Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Co-authored-by: Clarence Ng <clarence@anyscale.com>
Signed-off-by: elliottower <elliot@elliottower.com>
@jjyao jjyao mentioned this pull request Apr 23, 2023
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
Release logs perf benchmark for 2.4.0
Also updated tool to sort the regressions

Signed-off-by: Clarence Ng <clarence.wyng@gmail.com>
Co-authored-by: Clarence Ng <clarence@anyscale.com>
Signed-off-by: Jack He <jackhe2345@gmail.com>