Experimental features in Ray AIR #36949

krfricke · 2023-06-29T08:56:51Z

Experimental features in Ray AIR

The Ray Team is testing a number of experimental features in Ray AIR.

During development, the features are disabled by default. You can opt in by setting a feature-specific environment variable.

After some time, the Ray Team enables a feature by default to gather more feedback from the community. You can still disable it with the same environment variable to fully revert to the old behavior.
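Because the flags are plain environment variables, you can set them in the shell (for example `RAY_AIR_NEW_OUTPUT=0 python train.py`) or from Python. The sketch below shows the Python route. It assumes the variable only has to be present in `os.environ` before Ray reads it, so to be safe it is set at the top of the driver script, before importing Ray. `set_feature_flag` is a hypothetical helper, not part of Ray. The flag names are the ones listed in this issue.

```python
import os

# Flags named in this issue: "1" enables a feature, "0" disables it.
KNOWN_FLAGS = {"RAY_AIR_NEW_OUTPUT", "RAY_AIR_RICH_LAYOUT", "TUNE_NEW_EXECUTION"}


def set_feature_flag(name: str, enabled: bool) -> None:
    """Hypothetical helper: set an experimental-feature flag for this process.

    Call it before Ray reads the variable, e.g. before `import ray`.
    """
    if name not in KNOWN_FLAGS:
        raise ValueError(f"Unknown feature flag: {name}")
    os.environ[name] = "1" if enabled else "0"


# Revert to the old console output and the old trial execution engine.
set_feature_flag("RAY_AIR_NEW_OUTPUT", False)
set_feature_flag("TUNE_NEW_EXECUTION", False)
# Opt in to the rich sticky-status layout.
set_feature_flag("RAY_AIR_RICH_LAYOUT", True)
```

Rejecting unknown names catches typos like `RAY_AIR_NEW_OUPUT`, which would otherwise be ignored silently.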

If you run into issues with experimental features, open an issue on GitHub. The Ray Team considers feedback before removing the old implementation and making the new implementation the default.

Note

Experimental features can undergo frequent changes, especially on the master branch and in the nightly wheels.

Context-aware progress reporting

Note

This feature is enabled by default in Ray 2.6.
To disable, set the environment variable RAY_AIR_NEW_OUTPUT=0.

A context-aware output engine is available for Ray Train and Ray Tune runs.

This output engine affects how training progress is printed to the console. The output depends on the execution context: Ray Tune runs are displayed differently from Ray Train runs.

The features include:

Ray Train runs report the status of the single training run instead of using the Ray Tune table layout from previous versions.
The table format has been updated.
Configurations and observed metrics are reported in a different format than in previous versions.
The console output shows significantly fewer metrics by default (e.g., for RLlib runs).
The output is decluttered to improve readability.

This output feature only works in a regular console. It is automatically disabled when you use Jupyter notebooks or Ray Client.

Rich layout (sticky status)

Note
This feature is disabled by default.
To enable, set the environment variable RAY_AIR_RICH_LAYOUT=1.

The context-aware output engine exposes an advanced layout using the rich library.

The rich layout provides a sticky status table: regular console logs are still printed as before, but the trial overview table (in Ray Tune) stays pinned to the bottom of the screen and is updated periodically.

This feature is still in development. To try it out, set the RAY_AIR_RICH_LAYOUT=1 environment variable and install rich (pip install rich).
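Since the sticky layout needs the rich library, a small sketch for opting in from Python might check that rich is importable before setting the flag. `enable_rich_layout` is a hypothetical helper, not a Ray API. What Ray does when the flag is set but rich is missing is not described here, so the sketch simply leaves the flag unset in that case.

```python
import importlib.util
import os


def enable_rich_layout() -> bool:
    """Set RAY_AIR_RICH_LAYOUT=1 only if the `rich` package is installed.

    Returns True if the flag was set, False if `rich` is missing.
    """
    if importlib.util.find_spec("rich") is None:
        # Leave the flag untouched; install the dependency with `pip install rich`.
        return False
    os.environ["RAY_AIR_RICH_LAYOUT"] = "1"
    return True


if not enable_rich_layout():
    print("rich is not installed; run `pip install rich` to use the sticky layout.")
```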

Event-based trial execution engine

Note
This feature is enabled by default starting with Ray 2.5.
To disable, set the environment variable TUNE_NEW_EXECUTION=0.

Ray Tune has an updated trial execution engine. Since Ray Tune is also the execution backend for Ray Train, the updated engine affects both tuning and training runs.

The update refactors the TrialRunner to use a generic Ray actor and future manager instead of the previous RayTrialExecutor. This manager exposes an interface for reacting to scheduling and task execution events, which makes the engine easier to maintain and develop.
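To illustrate what reacting to task execution events means, here is a conceptual sketch. It is not Ray's implementation: it uses Python's `concurrent.futures` threads in place of Ray actors and object refs, and `EventManager`, `track` and `process_events` are hypothetical names. Callers register a success callback and an error callback per future, and the manager fires them as futures complete instead of the caller polling each one.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Any, Callable, Dict, List, Optional, Tuple

Callback = Callable[[Any], None]


class EventManager:
    """Toy event-based manager: register callbacks per future, then dispatch
    them as completion events arrive."""

    def __init__(self) -> None:
        self._handlers: Dict[Future, Tuple[Callback, Callback]] = {}

    def track(self, future: Future, on_result: Callback, on_error: Callback) -> None:
        """Start tracking a future and remember which callbacks to fire."""
        self._handlers[future] = (on_result, on_error)

    def process_events(self, timeout: Optional[float] = None) -> int:
        """Block until at least one tracked future completes (or `timeout`
        expires), fire the matching callbacks, and return how many fired."""
        if not self._handlers:
            return 0
        done, _ = wait(list(self._handlers), timeout=timeout, return_when=FIRST_COMPLETED)
        for future in done:
            on_result, on_error = self._handlers.pop(future)
            error = future.exception()
            if error is None:
                on_result(future.result())
            else:
                on_error(error)
        return len(done)


# Usage: two "tasks", one succeeds and one fails.
results: List[Any] = []
errors: List[Any] = []
with ThreadPoolExecutor(max_workers=2) as pool:
    manager = EventManager()
    manager.track(pool.submit(lambda: 42), results.append, errors.append)
    manager.track(pool.submit(lambda: 1 / 0), results.append, errors.append)
    while manager.process_events() > 0:
        pass  # Each call handles the futures that were done when wait() returned.
```

The caller only describes what should happen on success or failure. The manager owns the waiting loop, which is the property that makes this style easier to extend than code that polls each future.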

This is a drop-in replacement for an internal class, so you shouldn't see any change in behavior.

However, if you notice any odd behavior, you can opt out of the event-based execution engine and see if it resolves your problem.

In that case, please open an issue on GitHub, ideally with a reproducible script.

Things to look out for:

Fewer trials run in parallel than before
New trials take longer to start (or start much faster)
The tuning run finishes, but the script does not exit
The end-to-end runtime is much slower than before
The CPU load on the head node is high, even though the training jobs don't require many resources or don't run on the head node
Exceptions are raised that indicate an error in starting or stopping trials or the experiment
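One way to act on the list above is to run the same script once with the new engine and once without it, then compare exit codes and wall-clock time. This covers symptoms such as a slower end-to-end runtime or a script that never exits. A minimal sketch, assuming your experiment is a standalone script (the name `tune_script.py` is hypothetical). The helper names are also hypothetical. On timeout, only the driver process is killed, so you may need `ray stop` to clean up a local cluster afterwards.

```python
import os
import subprocess
import sys
import time
from typing import Dict, List


def python_cmd(*args: str) -> List[str]:
    """Build a command that runs Python with the current interpreter."""
    return [sys.executable, *args]


def run_with_flag(cmd: List[str], value: str, timeout_s: float) -> Dict[str, object]:
    """Run `cmd` with TUNE_NEW_EXECUTION set to `value`; report exit code and time."""
    env = dict(os.environ, TUNE_NEW_EXECUTION=value)
    start = time.monotonic()
    try:
        returncode = subprocess.run(cmd, env=env, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        returncode = None  # Did not exit in time: one of the symptoms listed above.
    return {"flag": value, "returncode": returncode, "seconds": time.monotonic() - start}


def compare_engines(cmd: List[str], timeout_s: float = 3600.0) -> List[Dict[str, object]]:
    """Run the same command with the new engine ("1") and the old one ("0")."""
    return [run_with_flag(cmd, value, timeout_s) for value in ("1", "0")]
```

For example, `compare_engines(python_cmd("tune_script.py"))` returns one report per engine. A large gap in `seconds`, or a `returncode` of `None` for only one of them, is worth attaching to the GitHub issue together with the script.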

Note that some edge cases may not be captured in the regression tests. Your feedback is welcome.


#36950) Instead of tracking the experimental features in the docs, we will track them in this pinned issue instead: #36949 Signed-off-by: Kai Fricke <kai@anyscale.com>

spacegoing · 2023-08-20T09:59:38Z

@krfricke I'm using a custom progress reporter. With the new feature enabled, how do I customize it? I couldn't find any docs. Help, please :D

alex3s · 2023-10-24T12:13:13Z

When using RAY_AIR_RICH_LAYOUT=1, it looks great!
One issue is that numbers are sometimes displayed as 0.0999999999 instead of 0.1.
I suggest changing line 894
in python/ray/tune/experimental/output.py
from this:

            for trial_info in trial_infos:
                table_trial.add_row(*[str(_) for _ in trial_info])

to this:

            for trial_info in trial_infos:
                table_trial.add_row(*[f'{_:.6}' if isinstance(_, float) else str(_) for _ in trial_info])

Voilà! They all look great!
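The effect of the suggested change can be checked in isolation. The format spec `.6` (with no presentation type) rounds a float to six significant digits, while `str()` prints the shortest representation that round-trips, which exposes values like 0.0999999999. `format_cell` is a hypothetical stand-alone version of the list comprehension above, not a function in Ray.

```python
def format_cell(value: object) -> str:
    """Format one table cell: floats to 6 significant digits, everything else via str()."""
    if isinstance(value, float):
        return f"{value:.6}"
    return str(value)


row = ["trial_00000", "RUNNING", 0.0999999999, 3]
print([str(v) for v in row])          # ['trial_00000', 'RUNNING', '0.0999999999', '3']
print([format_cell(v) for v in row])  # ['trial_00000', 'RUNNING', '0.1', '3']
```

Non-float cells (trial names, statuses, integer iteration counts) pass through `str()` unchanged, so only metric values are affected.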

dennymarcels · 2024-02-20T19:26:36Z

Where exactly should I set up the environmental variables? Local machine? Head node? Worker nodes? All of them?

As of Ray 2.6, Ray's backend auto-detects the context it was called in (an interactive session or otherwise) and adjusts the progress reporter accordingly. This was previously done on our end, so the `reporter` argument is no longer needed. See ray-project/ray#36949

spark-tdy · 2024-02-28T20:24:45Z

Where exactly should I set up the environmental variables? Local machine? Head node? Worker nodes? All of them?

Set them as environment variables through the runtime environment: ray.init(runtime_env={"env_vars": {"<variable>": "<value>"}})
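A slightly fuller sketch of that suggestion, under an assumption worth checking for your setup. Variables passed through `runtime_env["env_vars"]` reach the worker processes Ray starts. The console output, however, is typically printed by the driver (the process running your script), so the sketch also exports the same variables in the driver's own environment. `build_runtime_env` is a hypothetical helper. The `{"env_vars": ...}` shape is the one from the comment above.

```python
import os
from typing import Dict


def build_runtime_env(flags: Dict[str, str]) -> dict:
    """Hypothetical helper: export flags in the driver process and return a
    runtime_env dict that forwards the same flags to Ray worker processes."""
    for name, value in flags.items():
        os.environ[name] = value  # Driver process (usually where console output is printed).
    return {"env_vars": dict(flags)}  # Passed on to workers started for this job.


runtime_env = build_runtime_env({"RAY_AIR_NEW_OUTPUT": "0", "TUNE_NEW_EXECUTION": "0"})
print(runtime_env)
# Then, in a script with Ray installed:
#     import ray
#     ray.init(runtime_env=runtime_env)
```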

krfricke pinned this issue Jun 29, 2023

krfricke mentioned this issue Jun 29, 2023

[air/docs] Remove experimental features page, add github issue instead #36950

Merged

richardliaw unpinned this issue Jul 7, 2023

mas-kho mentioned this issue Jul 28, 2023

trial error in experiment ppo_4x4grid LucasAlegre/sumo-rl#149

Open

dihaitz04 mentioned this issue Aug 4, 2023

UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-68: character maps to <undefined> #36767

Open

luxunxiansheng mentioned this issue Nov 1, 2023

Distributed training falied due to NCCL communicaiton #40850

Closed

spinezhang mentioned this issue Dec 13, 2023

ray.data.Dataset.iter_tf_batches() change the length of binary type #41856

Closed

letessarini mentioned this issue Dec 22, 2023

AttributeError: 'YOLO' object has no attribute 'tune' ultralytics/ultralytics#7130

Closed

PhilippWillms mentioned this issue Jan 6, 2024

[RLlib][Tune] Usage of gymnasium's passive_env_checker - inconsistent behavior between Rllib and Tune #42223

Open

SnehilJalan mentioned this issue Jan 26, 2024

DeprecationWarning RWTH-E3D/ifcnet-models#4

Open

mct2611 mentioned this issue Jan 30, 2024

[Ray train][Quick start demo] socket.cpp:[c10d] system error: 10049 #42831

Closed

Gabrielle240125 mentioned this issue Feb 4, 2024

[🐛BUG] ray.tune automatic hyperparameter tuning issue RUCAIBox/RecBole#1990

Open

martinkim0 mentioned this issue Feb 26, 2024

remove reporter argument in ModelTuner scverse/scvi-tools#2557

Merged

widyanurulhuda mentioned this issue Mar 23, 2024

Yolov8 multigpu tuning ultralytics/ultralytics#7531

Open

noahvand mentioned this issue Apr 11, 2024

Exception: Horizon h=1 incompatible with seasonality or trend in stacks Nixtla/neuralforecast#955

Closed

camtice mentioned this issue Apr 22, 2024

Chex 1.86 and TF Keras SoyGema/MARL-Melting-pot#7

Open
