[Data] [Doc] Add Ray Data Execution Configurations doc page #44105
Ray Data provides a number of configurations that can be used to control various aspects
of Ray Dataset execution. These configurations can be modified by the user using
Suggested change:
-Ray Data provides a number of configurations that can be used to control various aspects
-of Ray Dataset execution. These configurations can be modified by the user using
+Ray Data provides a number of configurations that control various aspects
+of Ray Dataset execution. You can modify these configurations by using
===============================================

The :class:`~ray.data.ExecutionOptions` class is used to configure options during Ray Dataset execution.
To use it, you can modify the attributes in the current :class:`~ray.data.DataContext` object's `execution_options`. For example:
Suggested change:
-To use it, you can modify the attributes in the current :class:`~ray.data.DataContext` object's `execution_options`. For example:
+To use it, modify the attributes in the current :class:`~ray.data.DataContext` object's `execution_options`. For example:
.. code-block::

    ctx = ray.data.DataContext.get_current()
    ctx.execution_options.verbose_progress = True
Not sure if `code-block` will syntax highlight correctly.
Suggested change:
-.. code-block::
-
-    ctx = ray.data.DataContext.get_current()
-    ctx.execution_options.verbose_progress = True
+.. testcode::
+    :hide:
+
+    import ray
+
+.. testcode::
+
+    ctx = ray.data.DataContext.get_current()
+    ctx.execution_options.verbose_progress = True
    ctx = ray.data.DataContext.get_current()
    ctx.execution_options.verbose_progress = True

* `resource_limits`: Set a soft limit on the resource usage during execution. Auto-detected by default.
When would you want to set a limit on resource usage?
added an example of such a case:
For example, if there are other parts of the code which require some minimum amount of resources, you may want to limit the amount of resources that Ray Data uses.
For my own understanding, when would you want to use `resource_limits` over `exclude_resources` if you have other code that uses resources?
i think they are two ways of controlling the same overall concept. one is an exclusion of resources used by non-ray data workload, while the other is a cap on data resources.
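To make the distinction in the reply above concrete, here is a minimal sketch of the arithmetic behind the two approaches. It does not use Ray: the `Resources` class and both helper functions are hypothetical stand-ins invented for illustration, and the real scheduler treats `resource_limits` as a soft limit, so actual behavior may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical stand-in for a resource bundle; not a Ray class."""
    cpu: float
    gpu: float


def budget_with_limit(cluster: Resources, limit: Resources) -> Resources:
    """Cap approach: the data workload may use at most `limit`,
    and never more than the cluster actually has."""
    return Resources(min(cluster.cpu, limit.cpu), min(cluster.gpu, limit.gpu))


def budget_with_exclusion(cluster: Resources, excluded: Resources) -> Resources:
    """Exclusion approach: reserve `excluded` for other workloads and
    leave whatever remains (floored at zero) to the data workload."""
    return Resources(
        max(cluster.cpu - excluded.cpu, 0),
        max(cluster.gpu - excluded.gpu, 0),
    )


# On a fixed-size cluster both approaches can yield the same budget.
small = Resources(cpu=16, gpu=4)
print(budget_with_limit(small, Resources(cpu=12, gpu=4)))
print(budget_with_exclusion(small, Resources(cpu=4, gpu=0)))

# If the cluster grows, the cap stays fixed while the exclusion-based
# budget grows with the cluster.
large = Resources(cpu=32, gpu=4)
print(budget_with_limit(large, Resources(cpu=12, gpu=4)))
print(budget_with_exclusion(large, Resources(cpu=4, gpu=0)))
```

Under these assumptions, a cap suits cases where the data workload should stay within a known size, while an exclusion suits cases where the other workload's needs are known and the data workload can take the rest.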
* `resource_limits`: Set a soft limit on the resource usage during execution. Auto-detected by default.
* `exclude_resources`: Amount of resources to exclude from Ray Data. Set this if you have other workloads running on the same cluster. Note:

  * If using Ray Data with Ray Train, training resources are automatically excluded. Otherwise, off by default.
Suggested change:
-  * If using Ray Data with Ray Train, training resources are automatically excluded. Otherwise, off by default.
+  * If you're using Ray Data with Ray Train, training resources are automatically excluded. Otherwise, off by default.
Configuring :class:`~ray.data.DataContext`
==========================================
The :class:`~ray.data.DataContext` class is used to configure more general options for Ray Data usage, such as observability/logging options,
error handling/retry behavior, and internal data formats. To use it, you can modify the attributes in the current :class:`~ray.data.DataContext` object. For example:
Suggested change:
-error handling/retry behavior, and internal data formats. To use it, you can modify the attributes in the current :class:`~ray.data.DataContext` object. For example:
+error handling/retry behavior, and internal data formats. To use it, modify the attributes in the current :class:`~ray.data.DataContext` object. For example:
.. code-block::

    ctx = ray.data.DataContext.get_current()
    ctx.verbose_stats_logs = True
Suggested change:
-.. code-block::
-
-    ctx = ray.data.DataContext.get_current()
-    ctx.verbose_stats_logs = True
+.. testcode::
+    :hide:
+
+    import ray
+
+.. testcode::
+
+    ctx = ray.data.DataContext.get_current()
+    ctx.verbose_stats_logs = True
Many of the options in :class:`~ray.data.DataContext` are intended for advanced use cases or for debugging purposes,
and most users should not need to modify them. However, some of the most important options are:
Suggested change:
-Many of the options in :class:`~ray.data.DataContext` are intended for advanced use cases or for debugging purposes,
-and most users should not need to modify them. However, some of the most important options are:
+Many of the options in :class:`~ray.data.DataContext` are intended for advanced use cases or debugging,
+and most users shouldn't need to modify them. However, some of the most important options are:
* `verbose_stats_logs`: Whether stats logs should be verbose. This includes fields such as ``extra_metrics`` in the stats output, which are excluded by default. Off by default.
* `log_internal_stack_trace_to_stdout`: Whether to include internal Ray Data/Ray Core code stack frames when logging to ``stdout``. The full stack trace is always written to the Ray Data log file. Off by default.

For more details on each of the preceding options, see the API documentation for :class:`~ray.data.DataContext`.
Suggested change:
-For more details on each of the preceding options, see the API documentation for :class:`~ray.data.DataContext`.
+For more details on each of the preceding options, see :class:`~ray.data.DataContext`.
doc/source/data/user-guide.rst
Outdated
@@ -17,6 +17,7 @@ show you how achieve several tasks.
     inspecting-data
     iterating-over-data
     saving-data
+    execution-configurations
Wondering if we should move this lower since the page seems advanced. Wdyt?
good point, moved the section after "working with X" sections. i didn't want to label the page explicitly as "advanced" since some of these options (verbose logging, max_errored_blocks) can be pretty useful for simple use cases like reading files from S3.
Signed-off-by: Scott Lee <sjl@anyscale.com>
…ect#44105) Add a page to describe the various configurations for Ray Data from ExecutionOptions and DataContext. Signed-off-by: Scott Lee <sjl@anyscale.com>
Why are these changes needed?
Add a page to describe the various configurations for Ray Data from `ExecutionOptions` and `DataContext`.
New page: https://anyscale-ray--44105.com.readthedocs.build/en/44105/data/execution-configurations.html