Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core][serialization][cloudpickle] Downstream cloudpickle 3.0.0 changes #42730

Merged
merged 4 commits into from
Feb 1, 2024

Conversation

rickyyx
Copy link
Contributor

@rickyyx rickyyx commented Jan 26, 2024

Why are these changes needed?

cloudpickle has a commit that fixes the incompatibility with dataclasses: cloudpipe/cloudpickle@ab32ba5

Porting the upstream release (3.0.0) into ray's cloud pickle.

Related issue number

Closes #42167
Closes #28838
Closes #25951

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: rickyyx <rickyx@anyscale.com>
Signed-off-by: rickyyx <rickyx@anyscale.com>
@rickyyx
Copy link
Contributor Author

rickyyx commented Jan 26, 2024

just copy paste from cloudpickle==3.0.0 files :)

@rickyyx rickyyx marked this pull request as ready for review January 26, 2024 04:02
@rickyyx
Copy link
Contributor Author

rickyyx commented Jan 26, 2024

@suquark could you verify this way of syncing from upstream works? or if there's anything shouldn't be overwritten?

@rkooo567
Copy link
Contributor

@rickyyx can you run the microbenchmark to see if there's any perf change? Also, is it possible you make a small benchmark on your own for the serialization performance? (between older impl vs new upstream changes)

@suquark
Copy link
Member

suquark commented Jan 29, 2024

@rkooo567 I think this update should be ok. Besides dataclasses support, it is mostly about:

  1. removing python3.7 support
  2. linter and formatter change
  3. fix typos
  4. deprecating the "slow" cloudpickler, since we have pickle-5 protocol support by default

They are not broken changes for Ray (Ray supports python >= 3.8), so I think it should be ok.

Anyway we have to make sure all unittests pass, since a lot of functionality in Ray relies on cloudpickle in a complex way, it's hard to see the effect under the change of so much code.

@rickyyx
Copy link
Contributor Author

rickyyx commented Jan 30, 2024

Microbecnhmark not showing perf diff:

Metric PR Run Value Master Run Value
single_client_get_calls_Plasma_Store 10288.52 10297.06
single_client_put_calls_Plasma_Store 5359.70 5466.18
multi_client_put_calls_Plasma_Store 12390.50 12069.89
single_client_put_gigabytes 20.25 19.54
single_client_tasks_and_get_batch 7.73 7.86
multi_client_put_gigabytes 33.88 36.48
single_client_get_object_containing_10k_refs 13.35 13.88
single_client_wait_1k_refs 5.10 5.45
single_client_tasks_sync 1034.55 1043.13
single_client_tasks_async 7549.56 8332.38
multi_client_tasks_async 23556.68 23350.87
1_1_actor_calls_sync 2026.97 2071.05
1_1_actor_calls_async 9144.84 8882.80
1_1_actor_calls_concurrent 5497.23 5475.92
1_n_actor_calls_async 8471.20 8540.28
n_n_actor_calls_async 27324.09 26895.85
n_n_actor_calls_with_arg_async 2826.13 2884.08
1_1_async_actor_calls_sync 1323.36 1363.17
1_1_async_actor_calls_async 3285.37 3492.84
1_1_async_actor_calls_with_args_async 2346.97 2405.61
1_n_async_actor_calls_async 7621.07 7628.48
n_n_async_actor_calls_async 23419.49 23469.56
placement_group_create/removal 837.74 835.24
client__get_calls 1078.76 1048.75
client__put_calls 857.19 818.04
client__put_gigabytes 0.13 0.13
client__tasks_and_put_batch 11482.82 11581.19
client__1_1_actor_calls_sync 542.96 558.50
client__1_1_actor_calls_async 1029.07 1064.06
client__1_1_actor_calls_concurrent 1030.57 1061.52
client__tasks_and_get_batch 0.97 0.99

@rickyyx
Copy link
Contributor Author

rickyyx commented Jan 30, 2024

Single Node which puts many/large objects no regresssion

Metric Master Run Value PR Run Value Difference (PR - Master)
args_time 17.543 16.973 -0.570
returns_time 5.544 5.622 +0.078
get_time 27.719 26.194 -1.525
queued_time 194.880 193.345 -1.535
large_object_time 31.502 31.156 -0.346

Copy link
Contributor

@rkooo567 rkooo567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rickyyx so there's no additional code we need to check in? Is it possible we have some sort of README to this page how to do add upstream changes? (Like file X needs to add Y sort of things. If nothing is needed the doc can say it can just bring upstream changes)

@rickyyx
Copy link
Contributor Author

rickyyx commented Jan 31, 2024

@rickyyx so there's no additional code we need to check in? Is it possible we have some sort of README to this page how to do add upstream changes? (Like file X needs to add Y sort of things. If nothing is needed the doc can say it can just bring upstream changes)

sgtm. Although I still don't know why we vendored it in the first place.

@rkooo567
Copy link
Contributor

Actually @suquark do you have the context?

I thought we vendored it to support pickle 5 when Python version is < 3.8. But I am not 100% sure. Or is there any optimization we add here?

Signed-off-by: rickyyx <rickyx@anyscale.com>
@rickyyx rickyyx merged commit 8160fd7 into ray-project:master Feb 1, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants