Experimental features in Ray AIR #36949

Open
krfricke opened this issue Jun 29, 2023 · 4 comments

@krfricke
Contributor

Experimental features in Ray AIR

The Ray Team is testing a number of experimental features in Ray AIR.

During development, the features are disabled by default. You can opt in by setting a feature-specific environment variable.

After some time, the Ray Team enables the feature by default to gather more feedback from the community. In that case, you can still disable the feature using the same environment variable to fully revert to the old behavior.

If you run into issues with experimental features, open an issue on GitHub. The Ray Team considers feedback before removing the old implementation and making the new implementation the default.

Note

Experimental features can undergo frequent changes, especially on the master branch and the nightly wheels.
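
As a minimal sketch of the opt-in/opt-out mechanism described above, the flags can be exported from Python before Ray is imported. The helper name `apply_flags` and the `EXPERIMENTAL_FLAGS` dict are hypothetical and only for illustration. This thread does not say at which point Ray reads the variables, so setting them before the import is just the conservative assumption.

```python
import os

# Flags named in this issue; the value shown is only an example.
EXPERIMENTAL_FLAGS = {
    "RAY_AIR_NEW_OUTPUT": "0",  # "0" reverts to the old progress output
}


def apply_flags(flags):
    """Export each flag into the current process environment as a string."""
    for name, value in flags.items():
        os.environ[name] = str(value)


apply_flags(EXPERIMENTAL_FLAGS)

# Assumption: import Ray only after the flags are set, so that any code
# reading them at import time or later sees the intended values.
# import ray
```

Setting the variable in the shell that launches the script (for example `RAY_AIR_NEW_OUTPUT=0 python train.py`, where `train.py` is a placeholder) should have the same effect for that process.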

Context-aware progress reporting

Note

This feature is enabled by default in Ray 2.6.
To disable, set the environment variable RAY_AIR_NEW_OUTPUT=0.

A context-aware output engine is available for Ray Train and Ray Tune runs.

This output engine affects how the training progress is printed in the console. The output changes depending on the execution context: Ray Tune runs are displayed differently from Ray Train runs.

The features include:

  • Ray Train runs report status relevant to the single training run, without the default Ray Tune table layout from previous versions.
  • The table format has been updated.
  • The format for reporting configurations and observed metrics differs from previous versions.
  • The default set of metrics displayed in the console output is significantly reduced (e.g., for RLlib runs).
  • The output is decluttered to improve readability.

This output feature only works for the regular console. It is automatically disabled when you use Jupyter Notebooks or Ray client.

Rich layout (sticky status)

Note
This feature is disabled by default.
To enable, set the environment variable RAY_AIR_RICH_LAYOUT=1.

The context-aware output engine exposes an advanced layout using the rich library.

The rich layout provides a sticky status table: The regular console logs are still printed as before, but the trial overview table (in Ray Tune) is stuck to the bottom of the screen and periodically updated.

This feature is still in development. You can opt in to try it out.

To opt in, set the RAY_AIR_RICH_LAYOUT=1 environment variable and install rich (pip install rich).
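
The two opt-in steps above (install rich, set the variable) can be combined in a small guard. This is only a sketch: `enable_rich_layout` is a hypothetical helper, not part of Ray, and it assumes the variable is read from the process that starts the run.

```python
import importlib.util
import os


def enable_rich_layout():
    """Set RAY_AIR_RICH_LAYOUT=1 only if the `rich` package is importable.

    Returns True if the flag was set, False otherwise.
    """
    if importlib.util.find_spec("rich") is None:
        print("rich is not installed; run `pip install rich` first")
        return False
    os.environ["RAY_AIR_RICH_LAYOUT"] = "1"
    return True


enabled = enable_rich_layout()
```

Checking for the package first avoids turning the flag on in an environment where the layout library is missing.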

Event-based trial execution engine

Note
This feature is enabled by default starting with Ray 2.5.
To disable, set the environment variable TUNE_NEW_EXECUTION=0.

Ray Tune has an updated trial execution engine. Since Ray Tune is also the execution backend for Ray Train, the updated engine affects both tuning and training runs.

The update refactors the TrialRunner to use a generic Ray actor and future manager instead of the previous RayTrialExecutor. This manager exposes an interface for reacting to scheduling and task execution events, which makes the engine easier to maintain and develop.

This is a drop-in replacement for an internal class, and you shouldn’t see any change from the previous behavior.

However, if you notice any odd behavior, you can opt out of the event-based execution engine and see if it resolves your problem.

In that case, please open an issue on GitHub, ideally with a reproducible script.

Things to look out for:

  • Fewer trials are running in parallel than before
  • Starting new trials takes much longer (or is much faster) than before
  • The tuning run finishes, but the script does not exit
  • The end-to-end runtime is much slower than before
  • The CPU load on the head node is high, even though the training jobs don’t require many resources or don’t run on the head node
  • Exceptions are raised that indicate an error in starting or stopping trials or the experiment

Note that some edge cases may not be captured in the regression tests. Your feedback is welcome.
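
When preparing a report, one way to check whether the new engine is responsible is to run the same script once per setting and compare exit code and wall-clock time. The sketch below does that with a subprocess. `timed_run` is a hypothetical helper and `my_tune_script.py` is a placeholder for your own script; only the `TUNE_NEW_EXECUTION` variable comes from this issue.

```python
import os
import subprocess
import sys
import time


def timed_run(command, new_execution, timeout=None):
    """Run `command` with TUNE_NEW_EXECUTION set to 1 or 0.

    Returns (return code, elapsed seconds, captured stdout).
    Raises subprocess.TimeoutExpired if `timeout` seconds pass first,
    which helps catch runs that finish but never exit.
    """
    env = dict(os.environ)
    env["TUNE_NEW_EXECUTION"] = "1" if new_execution else "0"
    start = time.monotonic()
    completed = subprocess.run(
        command, env=env, capture_output=True, text=True, timeout=timeout
    )
    return completed.returncode, time.monotonic() - start, completed.stdout


# Example usage (placeholder script name):
# for flag in (True, False):
#     code, seconds, _ = timed_run([sys.executable, "my_tune_script.py"], flag)
#     print(f"TUNE_NEW_EXECUTION={int(flag)}: exit {code}, {seconds:.1f}s")
```

Including both timings in the GitHub issue makes a runtime regression easier to confirm.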

@krfricke krfricke pinned this issue Jun 29, 2023
krfricke added a commit that referenced this issue Jun 30, 2023
#36950)

Instead of tracking the experimental features in the docs, we will track them in this pinned issue instead: #36949

Signed-off-by: Kai Fricke <kai@anyscale.com>
@richardliaw richardliaw unpinned this issue Jul 7, 2023
@spacegoing
Contributor

@krfricke I'm using a customized progress reporter. With the new feature enabled, how do I customize it? I couldn't find any docs. Help, please :D

@alex3s

alex3s commented Oct 24, 2023

I tried RAY_AIR_RICH_LAYOUT=1 and it looks great!
One issue is that numbers are sometimes displayed as 0.0999999999 instead of 0.1.
I suggest changing line 894 in python/ray/tune/experimental/output.py from this:

            for trial_info in trial_infos:
                table_trial.add_row(*[str(_) for _ in trial_info])

to this:

            for trial_info in trial_infos:
                table_trial.add_row(*[f'{_:.6}' if isinstance(_, float) else str(_) for _ in trial_info])

Voilà! They all look great now!
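
To illustrate what the suggested format spec does, here is a standalone sketch that does not depend on Ray. The `format_cell` helper is hypothetical. It applies the same `.6` precision to floats and falls back to `str()` for everything else.

```python
def format_cell(value, precision=6):
    """Format floats with `precision` significant digits, others via str()."""
    if isinstance(value, float):
        return f"{value:.{precision}}"
    return str(value)


row = ["trial_0001", 0.0999999999, 3]
print([format_cell(v) for v in row])  # ['trial_0001', '0.1', '3']
```

With no type character, the `.6` spec rounds to six significant digits and drops trailing zeros, which is why 0.0999999999 is shown as 0.1.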

@dennymarcels

Where exactly should I set the environment variables? Local machine? Head node? Worker nodes? All of them?

martinkim0 added a commit to scverse/scvi-tools that referenced this issue Feb 26, 2024
As of Ray 2.6, its backend will auto-detect the context it was called in
(whether that be an interactive session or otherwise) and adjust the
progress reporter accordingly. This was previously done on our end, so the
`reporter` argument is no longer needed. See
ray-project/ray#36949
@spark-tdy

> Where exactly should I set the environment variables? Local machine? Head node? Worker nodes? All of them?

Set them through the runtime environment when initializing Ray: ray.init(runtime_env={"env_vars": {"<variable>": "<value>"}})
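
A minimal sketch of that suggestion follows. `build_runtime_env` and `init_ray_with_flags` are hypothetical helper names; only `ray.init(runtime_env={"env_vars": ...})` comes from the reply above. The thread does not confirm whether driver-side features such as the console output pick up values passed this way, so treat that as an open assumption and consider also exporting the variable in the driver's own environment.

```python
def build_runtime_env(flags):
    """Build a runtime_env dict; env var values are converted to strings."""
    return {"env_vars": {name: str(value) for name, value in flags.items()}}


def init_ray_with_flags(flags):
    """Start or connect to Ray with the given flags in the runtime environment."""
    import ray  # imported lazily so the helper above works without Ray installed

    return ray.init(runtime_env=build_runtime_env(flags))


runtime_env = build_runtime_env({"RAY_AIR_RICH_LAYOUT": 1})
print(runtime_env)  # {'env_vars': {'RAY_AIR_RICH_LAYOUT': '1'}}
```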
