fix(greet): update dependency ray to v2.10.0 #15393

Merged · 1 commit merged from renovate/greet-ray-2.x into main on Mar 21, 2024
Conversation

renovate[bot] (Contributor) commented on Mar 21, 2024

Mend Renovate

This PR contains the following updates:

Package | Change
ray     | 2.9.3 -> 2.10.0

Warning

Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Release Notes

ray-project/ray (ray)

v2.10.0

Compare Source

Release Highlights

The Ray 2.10 release brings important stability improvements and enhancements to Ray Data, which is now generally available (GA).

  • [Data] Ray Data becomes generally available, with stability improvements in streaming execution, reading and writing data, better task concurrency control, and improved debuggability via the dashboard, logging, and metrics visualization.
  • [RLlib] “New API Stack” officially announced as alpha for PPO and SAC.
  • [Serve] Added a default autoscaling policy set via num_replicas="auto" (#42613); see the sketch after this list.
  • [Serve] Added support for active load shedding via max_queued_requests (#​42950).
  • [Serve] Added replica queue length caching to the DeploymentHandle scheduler (#​42943).
    • This should reduce overhead in the Serve proxy and handles.
    • max_ongoing_requests (max_concurrent_queries) is also now strictly enforced (#42947).
    • If you see any issues, please report them on GitHub; you can disable this behavior by setting RAY_SERVE_ENABLE_QUEUE_LENGTH_CACHE=0.
  • [Serve] Renamed the following parameters. Each of the old names will be supported for another release before removal.
    • max_concurrent_queries -> max_ongoing_requests
    • target_num_ongoing_requests_per_replica -> target_ongoing_requests
    • downscale_smoothing_factor -> downscaling_factor
    • upscale_smoothing_factor -> upscaling_factor
  • [Serve] WARNING: the following default values will change in Ray 2.11:
    • Default for max_ongoing_requests will change from 100 to 5.
    • Default for target_ongoing_requests will change from 1 to 2.
  • [Core] Autoscaler v2 is in alpha and can be tried out with KubeRay. It has improved observability and stability compared to v1.
  • [Train] Added support for accelerator types via ScalingConfig(accelerator_type).
  • [Train] Revamped the XGBoostTrainer and LightGBMTrainer to no longer depend on xgboost_ray and lightgbm_ray. A new, more flexible API will be released in a future release.
  • [Train/Tune] Refactored local staging directory to remove the need for local_dir and RAY_AIR_LOCAL_CACHE_DIR.
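
As a rough illustration of the Serve highlights above (the num_replicas="auto" and max_queued_requests items), here is a minimal sketch of a deployment using both options; the Greeter class and the values are illustrative, not taken from this PR:

```python
from ray import serve
from starlette.requests import Request


@serve.deployment(
    num_replicas="auto",      # new default autoscaling policy (#42613)
    max_queued_requests=100,  # shed excess load once this many requests are queued (#42950)
)
class Greeter:
    async def __call__(self, request: Request) -> str:
        return "hello"


serve.run(Greeter.bind())
```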

Ray Libraries

Ray Data

🎉 New Features:

💫 Enhancements:

  • Restructure stdout logging for better readability (#​43360)
  • Add a more performant way to read large TFRecord datasets (#​42277)
  • Modify ImageDatasource to use Image.BILINEAR as the default image resampling filter (#​43484)
  • Reduce internal stack trace output by default (#​43251)
  • Perform incremental writes to Parquet files (#​43563)
  • Warn on excessive driver memory usage during shuffle ops (#​42574)
  • Distributed reads for ray.data.from_huggingface (#42599); see the sketch after this list
  • Remove Stage class and related usages (#​42685)
  • Improve stability of reading JSON files to avoid PyArrow errors (#​42558, #​42357)
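
Following up on the ray.data.from_huggingface item above, a small sketch of the call; the dataset name is illustrative, and whether the read is actually distributed depends on how the dataset is hosted:

```python
import ray
from datasets import load_dataset  # Hugging Face Datasets

# Convert a Hugging Face dataset into a Ray Dataset; with this release the
# read can be distributed across the cluster rather than running on one node.
hf_ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
ds = ray.data.from_huggingface(hf_ds)
ds.show(3)
```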

🔨 Fixes:

  • Turn off actor locality by default (#​44124)
  • Normalize block types before internal multi-block operations (#​43764)
  • Fix memory metrics for OutputSplitter (#​43740)
  • Fix race condition issue in OpBufferQueue (#​43015)
  • Fix early stop for multiple Limit operators. (#​42958)
  • Fix deadlocks caused by Dataset.streaming_split that could hang jobs (#42601)

📖 Documentation:

Ray Train

🎉 New Features:

  • Add support for accelerator types via ScalingConfig(accelerator_type) for improved worker scheduling (#​43090)
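
A minimal sketch of the new accelerator_type option, assuming a PyTorch TorchTrainer; the training function and the "A10G" value are illustrative:

```python
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_func():
    ...  # training loop goes here


# accelerator_type requests that each worker be scheduled on a node with the
# given accelerator type; the value here is illustrative.
trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True, accelerator_type="A10G"),
)
result = trainer.fit()
```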

💫 Enhancements:

  • Add a backend-specific context manager for train_func for setup/teardown logic (#​43209)
  • Remove DEFAULT_NCCL_SOCKET_IFNAME to simplify network configuration (#​42808)
  • Colocate the Trainer with the rank 0 Worker to improve scheduling behavior (#43115)

🔨 Fixes:

  • Enable scheduling workers with memory resource requirements (#​42999)
  • Make path behavior OS-agnostic by using Path.as_posix over os.path.join (#​42037)
  • [Lightning] Fix resuming from checkpoint when using RayFSDPStrategy (#​43594)
  • [Lightning] Fix deadlock in RayTrainReportCallback (#​42751)
  • [Transformers] Fix checkpoint reporting behavior when get_latest_checkpoint returns None (#​42953)

📖 Documentation:

  • Enhance docstring and user guides for train_loop_config (#​43691)
  • Clarify in ray.train.report docstring that it is not a barrier (#​42422)
  • Improve documentation for prepare_data_loader shuffle behavior and set_epoch (#​41807)

🏗 Architecture refactoring:

  • Simplify XGBoost and LightGBM Trainer integrations. Implemented XGBoostTrainer and LightGBMTrainer as DataParallelTrainer. Removed dependency on xgboost_ray and lightgbm_ray. (#​42111, #​42767, #​43244, #​43424)
  • Refactor local staging directory to remove the need for local_dir and RAY_AIR_LOCAL_CACHE_DIR. Add isolation between driver and distributed worker artifacts so that large files written by workers are not uploaded implicitly. Results are now only written to storage_path, rather than having another copy in the user’s home directory (~/ray_results). (#​43369, #​43403, #​43689)
  • Split the overloaded ray.train.torch.get_device into a separate get_devices API for multi-GPU worker setups (#42314); see the sketch after this list
  • Refactor restoration configuration to be centered around storage_path (#​42853, #​43179)
  • Deprecations related to SyncConfig (#​42909)
  • Remove deprecated preprocessor argument from Trainers (#​43146, #​43234)
  • Hard-deprecate MosaicTrainer and remove SklearnTrainer (#​42814)
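
For the get_device/get_devices split mentioned above, a brief sketch of how the two calls might be used inside a training function on a multi-GPU worker (illustrative, not from this PR):

```python
import ray.train.torch


def train_func():
    # Primary device assigned to this worker.
    device = ray.train.torch.get_device()
    # All devices assigned to this worker (the new API for multi-GPU workers).
    devices = ray.train.torch.get_devices()
    ...
```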

Ray Tune

💫 Enhancements:

  • Increase the minimum number of allowed pending trials for faster auto-scaleup (#​43455)
  • Add support to TBXLogger for logging images (#​37822)
  • Improve validation of Experiment(config) to handle RLlib AlgorithmConfig (#​42816, #​42116)

🔨 Fixes:

  • Fix reuse_actors error on actor cleanup for function trainables (#​42951)
  • Make path behavior OS-agnostic by using Path.as_posix over os.path.join (#​42037)

📖 Documentation:

🏗 Architecture refactoring:

  • Refactor local staging directory to remove the need for local_dir and RAY_AIR_LOCAL_CACHE_DIR. Add isolation between driver and distributed worker artifacts so that large files written by workers are not uploaded implicitly. Results are now only written to storage_path, rather than having another copy in the user’s home directory (~/ray_results). (#​43369, #​43403, #​43689)
  • Deprecations related to SyncConfig and chdir_to_trial_dir (#​42909)
  • Refactor restoration configuration to be centered around storage_path (#​42853, #​43179)
  • Add back NevergradSearch (#​42305)
  • Clean up invalid checkpoint_dir and reporter deprecation notices (#​42698)

Ray Serve

🎉 New Features:

  • Added support for active load shedding via max_queued_requests (#​42950).
  • Added a default autoscaling policy set via num_replicas="auto" (#42613).

🏗 API Changes:

  • Renamed the following parameters (see the sketch at the end of this section). Each of the old names will be supported for another release before removal.
    • max_concurrent_queries to max_ongoing_requests
    • target_num_ongoing_requests_per_replica to target_ongoing_requests
    • downscale_smoothing_factor to downscaling_factor
    • upscale_smoothing_factor to upscaling_factor
  • WARNING: the following default values will change in Ray 2.11:
    • Default for max_ongoing_requests will change from 100 to 5.
    • Default for target_ongoing_requests will change from 1 to 2.
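
A hedged sketch of how the renamed parameters could appear in a deployment definition (old names shown in comments); the deployment class and values are illustrative:

```python
from ray import serve


@serve.deployment(
    max_ongoing_requests=10,  # previously: max_concurrent_queries
    autoscaling_config={
        "target_ongoing_requests": 2,  # previously: target_num_ongoing_requests_per_replica
        "upscaling_factor": 1.0,       # previously: upscale_smoothing_factor
        "downscaling_factor": 0.5,     # previously: downscale_smoothing_factor
    },
)
class Translator:
    async def __call__(self, request) -> str:
        return "ok"
```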

💫 Enhancements:

  • Add RAY_SERVE_LOG_ENCODING env to set the global logging behavior for Serve (#​42781).
  • Configure Serve's gRPC proxy to allow large payloads (#43114).
  • Add a blocking flag to serve.run() (#43227); see the sketch after this list.
  • Add actor id and worker id to Serve structured logs (#​43725).
  • Added replica queue length caching to the DeploymentHandle scheduler (#​42943).
    • This should reduce overhead in the Serve proxy and handles.
    • max_ongoing_requests (max_concurrent_queries) is also now strictly enforced (#42947).
    • If you see any issues, please report them on GitHub; you can disable this behavior by setting RAY_SERVE_ENABLE_QUEUE_LENGTH_CACHE=0.
  • Autoscaling metrics (tracking ongoing and queued metrics) are now collected at deployment handles by default instead of at the Serve replicas (#​42578).
    • This means you can now set max_ongoing_requests=1 for autoscaling deployments and still upscale properly, because requests queued at handles are properly taken into account for autoscaling.
    • You should expect deployments to upscale more aggressively during bursty traffic, because requests will likely queue up at handles during bursts of traffic.
    • If you see any issues, please report them on GitHub; you can switch back to the old method of collecting metrics by setting the environment variable RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE=0.
  • Improved the downscaling behavior of smoothing_factor for low numbers of replicas (#​42612).
  • Various logging improvements (#​43707, #​43708, #​43629, #​43557).
  • During in-place upgrades or when replicas become unhealthy, Serve will no longer wait for old replicas to gracefully terminate before starting new ones (#​43187). New replicas will be eagerly started to satisfy the target number of healthy replicas.
    • This new behavior is on by default and can be turned off by setting RAY_SERVE_EAGERLY_START_REPLACEMENT_REPLICAS=0
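
For the new blocking flag on serve.run() mentioned in this list, a tiny sketch; the Echo deployment is illustrative:

```python
from ray import serve


@serve.deployment
class Echo:
    async def __call__(self, request) -> str:
        return "echo"


# With blocking=True (added in #43227), serve.run stays in the foreground
# until interrupted instead of returning immediately.
serve.run(Echo.bind(), blocking=True)
```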

🔨 Fixes:

  • Fix the deployment route prefix being overridden by the default route prefix from the serve run CLI (#43805).
  • Fixed a bug causing batch methods to hang upon cancellation (#​42593).
  • Unpinned FastAPI dependency version (#​42711).
  • Delay proxy marking itself as healthy until it has routes from the controller (#​43076).
  • Fixed an issue where multiplexed deployments could go into infinite backoff (#​43965).
  • Silence noisy KeyError on disconnects (#​43713).
  • Fixed a bug where Prometheus counter metrics were emitted as gauges (#43795, #43901).
    • All Serve counter metrics are now emitted as counters with a _total suffix. The old gauge metrics are still emitted for compatibility.

📖 Documentation:

  • Update serve logging config docs (#​43483).
  • Added documentation for max_replicas_per_node (#​42743).

RLlib

🎉 New Features:

💫 Enhancements:

  • Old API Stack cleanups:
    • Move SampleBatch column names (e.g. SampleBatch.OBS) into a new class (Columns). (#43665)
    • Remove old exec_plan API code. (#​41585)
    • Introduce OldAPIStack decorator (#​43657)
    • RLModule API: Add functionality to define kernel and bias initializers via config. (#​42137)
  • Learner/LearnerGroup APIs:
    • Replace Learner/LearnerGroup specific config classes (e.g. LearnerHyperparameters) with AlgorithmConfig. (#​41296)
    • Learner/LearnerGroup: Allow updating from Episodes. (#​41235)
  • In preparation for DQN on the new API stack (#43199, #43196)

🔨 Fixes:

📖 Documentation:

Ray Core and Ray Clusters

Ray Core

🎉 New Features:

💫 Enhancements:

  • The Ray state API get_task() now accepts an ObjectRef (#43507); see the sketch after this list
  • Add an option to disable task tracing for task/actor (#​42431)
  • Improved object transfer throughput. (#​43434)
  • Ray Client now compares the Ray and Python versions for compatibility with the remote Ray cluster. (#42760)
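
A short sketch of the get_task() change referenced above; the remote function f is illustrative:

```python
import ray
from ray.util.state import get_task


@ray.remote
def f():
    return 1


ref = f.remote()
ray.get(ref)

# get_task() can now be passed the ObjectRef directly instead of a task id string.
print(get_task(ref))
```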

🔨 Fixes:

  • Fixed several bugs in the streaming generator (#43775, #43772, #43413)
  • Fixed a bug where Ray counter metrics were emitted as gauges (#43795)
  • Fixed a bug where tasks with empty resource requirements did not work with placement groups (#43448)
  • Fixed a bug where CPU resources were not released for a blocked worker inside a placement group (#43270)
  • Fixed GCS crashes when the placement group commit phase failed due to node failure (#43405)
  • Fixed a bug where the Ray memory monitor prematurely killed tasks (#43071)
  • Fixed a placement group resource leak (#42942)
  • Upgraded cloudpickle to 3.0, which fixes an incompatibility with dataclasses (#42730)

📖 Documentation:

  • Updated the documentation for Ray accelerator support (#41849)

Ray Clusters

💫 Enhancements:

  • [spark] Add a heap_memory param to the setup_ray_cluster API, and change the default per-worker-node and head-node configuration for the global Ray cluster (#42604)
  • [spark] Add global mode for Ray-on-Spark clusters (#41153)

🔨 Fixes:

  • [VSphere] Only deploy the OVF to the first host of the cluster (#42258)

Thanks

Many thanks to all those who contributed to this release!

@​ronyw7, @​xsqian, @​justinvyu, @​matthewdeng, @​sven1977, @​thomasdesr, @​veryhannibal, @​klebster2, @​can-anyscale, @​simran-2797, @​stephanie-wang, @​simonsays1980, @​kouroshHakha, @​Zandew, @​akshay-anyscale, @​matschaffer-roblox, @​WeichenXu123, @​matthew29tang, @​vitsai, @​Hank0626, @​anmyachev, @​kira-lin, @​ericl, @​zcin, @​sihanwang41, @​peytondmurray, @​raulchen, @​aslonnie, @​ruisearch42, @​vszal, @​pcmoritz, @​rickyyx, @​chrislevn, @​brycehuang30, @​alexeykudinkin, @​vonsago, @​shrekris-anyscale, @​andrewsykim, @​c21, @​mattip, @​hongchaodeng, @​dabauxi, @​fishbone, @​scottjlee, @​justina777, @​surenyufuz, @​robertnishihara, @​nikitavemuri, @​Yard1, @​huchen2021, @​shomilj, @​architkulkarni, @​liuxsh9, @​Jocn2020, @​liuyang-my, @​rkooo567, @​alanwguo, @​KPostOffice, @​woshiyyya, @​n30111, @​edoakes, @​y-abe, @​martinbomio, @​jiwq, @​arunppsg, @​ArturNiederfahrenhorst, @​kevin85421, @​khluu, @​JingChen23, @​masariello, @​angelinalg, @​jjyao, @​omatthew98, @​jonathan-anyscale, @​sjoshi6, @​gaborgsomogyi, @​rynewang, @​ratnopamc, @​chris-ray-zhang, @​ijrsvt, @​scottsun94, @​raychen911, @​franklsf95, @​GeneDer, @​madhuri-rai07, @​scv119, @​bveeramani, @​anyscalesam, @​zen-xu, @​npuichigo


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Mend Renovate. View repository job log here.


sonarcloud bot commented Mar 21, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

mergify[bot] merged commit bab5d53 into main on Mar 21, 2024
124 checks passed
mergify[bot] deleted the renovate/greet-ray-2.x branch on March 21, 2024 at 22:55