[Data] [Doc] Add Ray Data Execution Configurations doc page #44105

scottjlee · 2024-03-18T23:40:57Z

Why are these changes needed?

Add a page to describe the various configurations for Ray Data from ExecutionOptions and DataContext.

New page: https://anyscale-ray--44105.com.readthedocs.build/en/44105/data/execution-configurations.html

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: Scott Lee <sjl@anyscale.com>

bveeramani · 2024-03-19T01:00:28Z

doc/source/data/execution-configurations.rst

+Ray Data provides a number of configurations that can be used to control various aspects
+of Ray Dataset execution. These configurations can be modified by the user using


Suggested change

Ray Data provides a number of configurations that can be used to control various aspects

of Ray Dataset execution. These configurations can be modified by the user using

Ray Data provides a number of configurations that control various aspects

of Ray Dataset execution. You can modify these configurations by using

bveeramani · 2024-03-19T01:00:56Z

doc/source/data/execution-configurations.rst

+===============================================
+
+The :class:`~ray.data.ExecutionOptions` class is used to configure options during Ray Dataset execution.
+To use it, you can modify the attributes in the current :class:`~ray.data.DataContext` object's `execution_options`. For example:


Suggested change

To use it, you can modify the attributes in the current :class:`~ray.data.DataContext` object's `execution_options`. For example:

To use it, modify the attributes in the current :class:`~ray.data.DataContext` object's `execution_options`. For example:

bveeramani · 2024-03-19T01:02:41Z

doc/source/data/execution-configurations.rst

+.. code-block::
+
+   ctx = ray.data.DataContext.get_current()
+   ctx.execution_options.verbose_progress = True


Not sure if code-block will syntax highlight correctly.

Suggested change

.. code-block::

   ctx = ray.data.DataContext.get_current()
   ctx.execution_options.verbose_progress = True

.. testcode::
   :hide:

   import ray

.. testcode::

   ctx = ray.data.DataContext.get_current()
   ctx.execution_options.verbose_progress = True

bveeramani · 2024-03-19T01:03:48Z

doc/source/data/execution-configurations.rst

+   ctx = ray.data.DataContext.get_current()
+   ctx.execution_options.verbose_progress = True
+
+* `resource_limits`: Set a soft limit on the resource usage during execution. Auto-detected by default.


When would you want to set a limit on resource usage?

added an example of such a case:

For example, if there are other parts of the code which require some minimum amount of resources, you may want to limit the amount of resources that Ray Data uses.

For my own understanding, when would you want to use resource_limits over exclude_resources if you have other code that uses resources?

i think they are two ways of controlling the same overall concept. one is an exclusion of resources used by non-ray data workload, while the other is a cap on data resources.

bveeramani · 2024-03-19T01:03:59Z

doc/source/data/execution-configurations.rst

+* `resource_limits`: Set a soft limit on the resource usage during execution. Auto-detected by default.
+* `exclude_resources`: Amount of resources to exclude from Ray Data. Set this if you have other workloads running on the same cluster. Note: 
+
+  * If using Ray Data with Ray Train, training resources are automatically excluded. Otherwise, off by default.


Suggested change

* If using Ray Data with Ray Train, training resources are automatically excluded. Otherwise, off by default.

* If you're using Ray Data with Ray Train, training resources are automatically excluded. Otherwise, off by default.

bveeramani · 2024-03-19T01:06:15Z

doc/source/data/execution-configurations.rst

+Configuring :class:`~ray.data.DataContext`
+==========================================
+The :class:`~ray.data.DataContext` class is used to configure more general options for Ray Data usage, such as observability/logging options,
+error handling/retry behavior, and internal data formats. To use it, you can modify the attributes in the current :class:`~ray.data.DataContext` object. For example:


Suggested change

error handling/retry behavior, and internal data formats. To use it, you can modify the attributes in the current :class:`~ray.data.DataContext` object. For example:

error handling/retry behavior, and internal data formats. To use it, modify the attributes in the current :class:`~ray.data.DataContext` object. For example:

bveeramani · 2024-03-19T01:07:13Z

doc/source/data/execution-configurations.rst

+.. code-block::
+
+   ctx = ray.data.DataContext.get_current()
+   ctx.verbose_stats_logs = True


Suggested change

.. code-block::

   ctx = ray.data.DataContext.get_current()
   ctx.verbose_stats_logs = True

.. testcode::
   :hide:

   import ray

.. testcode::

   ctx = ray.data.DataContext.get_current()
   ctx.verbose_stats_logs = True

bveeramani · 2024-03-19T01:08:00Z

doc/source/data/execution-configurations.rst

+Many of the options in :class:`~ray.data.DataContext` are intended for advanced use cases or for debugging purposes, 
+and most users should not need to modify them. However, some of the most important options are:


Suggested change

Many of the options in :class:`~ray.data.DataContext` are intended for advanced use cases or for debugging purposes,

and most users should not need to modify them. However, some of the most important options are:

Many of the options in :class:`~ray.data.DataContext` are intended for advanced use cases or debugging,

and most users shouldn't need to modify them. However, some of the most important options are:

bveeramani · 2024-03-19T01:08:34Z

doc/source/data/execution-configurations.rst

+* `verbose_stats_logs`: Whether stats logs should be verbose. This includes fields such as ``extra_metrics`` in the stats output, which are excluded by default. Off by default.
+* `log_internal_stack_trace_to_stdout`: Whether to include internal Ray Data/Ray Core code stack frames when logging to ``stdout``. The full stack trace is always written to the Ray Data log file. Off by default.
+
+For more details on each of the preceding options, see the API documentation for :class:`~ray.data.DataContext`.


Suggested change

For more details on each of the preceding options, see the API documentation for :class:`~ray.data.DataContext`.

For more details on each of the preceding options, see :class:`~ray.data.DataContext`.

bveeramani · 2024-03-19T01:09:43Z

doc/source/data/user-guide.rst

@@ -17,6 +17,7 @@ show you how achieve several tasks.
    inspecting-data
    iterating-over-data
    saving-data
+    execution-configurations


Wondering if we should move this lower since the page seems advanced. Wdyt?

good point, moved the section after "working with X" sections. i didn't want to label the page explicitly as "advanced" since some of these options (verbose logging, max_errored_blocks) can be pretty useful for simple use cases like reading files from S3.

Signed-off-by: Scott Lee <sjl@anyscale.com>

…ect#44105) Add a page to describe the various configurations for Ray Data from ExecutionOptions and DataContext. Signed-off-by: Scott Lee <sjl@anyscale.com>

…44172) Cherry-pick #44105. Docs-only change. Add a page to describe the various configurations for Ray Data from ExecutionOptions and DataContext. Signed-off-by: Scott Lee <sjl@anyscale.com>


scottjlee added 3 commits March 18, 2024 16:39

add page

11abfd5

Signed-off-by: Scott Lee <sjl@anyscale.com>

docs vale

a7ef692

Signed-off-by: Scott Lee <sjl@anyscale.com>

update title

c5b592f

Signed-off-by: Scott Lee <sjl@anyscale.com>

scottjlee marked this pull request as ready for review March 19, 2024 00:55

scottjlee requested review from ericl, scv119, c21, amogkam, bveeramani, raulchen, stephanie-wang and omatthew98 as code owners March 19, 2024 00:55

scottjlee assigned c21, omatthew98 and bveeramani Mar 19, 2024

bveeramani reviewed Mar 19, 2024


scottjlee added 3 commits March 19, 2024 12:12

comments

7727748

Signed-off-by: Scott Lee <sjl@anyscale.com>

Merge branch 'master' into 0318-configdoc

3ce0778

Signed-off-by: Scott Lee <sjl@anyscale.com>

fix title length

ba3eece

Signed-off-by: Scott Lee <sjl@anyscale.com>

scottjlee requested a review from bveeramani March 20, 2024 00:23

bveeramani approved these changes Mar 20, 2024


bveeramani merged commit 8cc3c0c into ray-project:master Mar 20, 2024
5 checks passed

scottjlee mentioned this pull request Mar 20, 2024

[Data] [Docs] Cherry-pick #44105 #44172

Merged

8 tasks
