[pull] master from ray-project:master #25

pull · 2023-05-31T05:30:46Z

See Commits and Changes for more details.

Can you help keep this open source service alive? 💖 Please sponsor : )

) #35272 changed the implementation of _convert_outputs_to_ndarray_batch, but didn't update the type annotation or docstring. --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

…when needed (#35475) Signed-off-by: Justin Yu <justinvyu@anyscale.com>

…nc actors (#35828) Signed-off-by: Alan Guo <aguo@anyscale.com>

Thanks @iycheng for debugging this! In #33976 we don't always guarantee that children of worker processes are killed -- namely if the worker dies in an unexpected way, they are not (see #26118). This unfortunately happens in the `test_no_worker_child_process_leaks` test case since the Ray cluster is torn down when the driver exits, which can lead to worker processes being forcefully terminated [the solution in this PR is to not initialize the Ray cluster in the driver script but start a separate head node with `.add_node`]. As a result of this forceful killing, the test becomes flaky (it passes if the workers are killed through the normal termination, and it doesn't pass if the workers are killed otherwise). The flakiness is not super big right now but becomes big with some other changes. This should eventually be fixed with #26118.

…35649) Why are these changes needed? This allows the autoscaler to pass cloud instance id into Ray.

…additional update method. (#35385)

When tune runs in a windows session with legacy encoding (e.g. cp-1252), the characters in the new output will result in an encoding error: ``` File "c:\install\ray\python\ray\tune\experimental\output.py", line 700, in experiment_started print( File "C:\Miniconda3\lib\encodings\cp1252.py", line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-72: character maps to <undefined> ``` This PR detects a non-utf encoding and reverts to ascii characters which are fully compatible with windows. This is equivalent to the old Tune output format: ``` +------------------------------+----------+-------+ | Trial name | status | loc | |------------------------------+----------+-------| | LightningTrainer_47320_00000 | PENDING | | +------------------------------+----------+-------+ ``` The PR also adds a small fix in tune controller that came up when debugging this issue on windows. Signed-off-by: Kai Fricke <kai@anyscale.com>

…odule sub-classes, flip inheritance order and remove c'tors. (#35659)

…s. (#35258)

Make serve.batch compatible with generators Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

This PR switches our doctest tooling from sphinx.ext.doctest to pytest-sphinx. Context We want to test all code snippets in CI. testcode and doctest examples are tested in CI with sphinx.ext.doctest. But, we often write code-block:: python examples, and they aren't tested in CI. Problem sphinx.ext.doctest doesn't let you test specific files. As a result, it's hard to update examples or isolate GPU examples. Solution Migrate to pytest-sphinx. Two reasons: Lets us run GPU doctests Enables developers to quickly test their examples --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

…d). (#35878)

The "Custom datasource" guide isn't really a guide -- it's an example of how to implement a specific custom datasource. Accordingly, I've moved the document to the examples section. --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com> Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>

We previously exposed three types of datasets: simple, tensor, and tabular. to_tf only supported tabular datasets, and we validated that this was the case. Now that there's only one type of dataset (tabular), the validation code isn't necessary. Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

We import urllib.parse and pathlib in multiple functions. It's code duplication, and because the packages are built-in, there's no reason to lazily import them. Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

Subclassing object isn't necessary in Python 3. Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

…ts exclusive (#35735)" (#35891) This reverts commit 3d124cc.

Signed-off-by: Jun Gong <jungong@anyscale.com>

…esults (#35858) Signed-off-by: Avnish <avnishnarayan@gmail.com>

Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>

This PR reverts #35426, adding back in a nice button for disconnecting (i.e. calling `ray.shutdown()`) when `ray.init()` is called in a notebook. ## Testing The main issue with the previous disconnect PR is that checking for the `ipywidgets` soft dependency at run time introduces a small performance penalty which bumps some test suites up beyond their timeout upper bound, breaking them. This happens in tests unrelated to where these changes were made previously, making troubleshooting more difficult. Here, I've introduced some optimizations to avoid this penalty wherever possible: * In `ray.widgets.util.in_notebook`, a significant performance penalty was previously being incurred by the `try-import/except` block upon each call. Now, we just check whether `"IPython" in sys.modules` is `True` before attempting an import. This is _much_ faster as `IPython` is pre-loaded for IPython kernels. I've also added this optimization to `repr_fallback_if_colab`. * Secondly, `ensure_notebook_deps` now does dependency checking at function definition time, rather than on each function call. * I've also reworked the logic for detecting the current shell, since there's really only one situation where displaying widgets is okay (when the user is running Jupyterlab). In all other cases, we fall back to simple reprs. * The changes here also solve #35490. Here's the artifact from the CI run below loaded onto google colab: ![image](https://github.com/ray-project/ray/assets/14017872/c1852128-9dce-4744-8bbc-1ed641ed9cce) The rendered output has no dashboard URL row (evidently this isn't available when run on colab), but the HTML repr looks okay. The only graphical issue that I see is that colab is doing something weird to the "RAY" text that is supposed to appear by the Ray logo. Here's what this looks like in Jupyterlab: ![image](https://github.com/ray-project/ray/assets/14017872/7592f5f0-46d7-4fa3-8996-86371d39f3e2)

## Why are these changes needed? This PR is the 1st part of enabling optimizer by default (split from #34937). - Fix inconsistent behaviors for the Read op by reusing the `ReadTask`s from `read_api.py` in `plan_read_op.py`. - Support cache in `materialize`. Note, some changes in this PR may not be covered in this PR's CI, as the optimizer must be enabled to cover them. But they are already verified in #34937 CI). ## Related issue number #32596

With this PR, actor tasks logs should be retrieved with actor id by default. Users could turn on the flag RAY_ENABLE_RECORD_ACTOR_TASK_LOGGING so that actor logs could be retrieved using task id. This is needed to prevent regression since recording task logs is currently expensive due to Ray flushing all log lines for log monitoring.

…35877)

Next step in #34393 --------- Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>

…n core user guide. (#35760) We have GAed some of the State API related features (CLI) - and promoting its usage in core related user gudies.

…] ( #34896) (#35917) This PR seems to be the root cause of stress_test_many_tasks and "1:n async-actor calls async per second" (10%).

…des. (#35884) This PR uses autoscaler-emitted metrics for pending/active/failed nodes. The previous implementation was buggy and wasn't properly cleaned up, which displays the incorrect graph to the dashboard. This PR keeps the backward compatibility by only changing namespace of metrics that we will use from the dashboard. I didn't touch other metrics because they are not used anywhere anyway (we can probably remove them or rethink in the future).

Reverts #35745 after fixing windows encoding issue. Signed-off-by: Kai Fricke <kai@anyscale.com>

…ng / (#35798)

#35286 broke CI. These issues weren't caught earlier because the PR was out of date with master.

added unit test for ray installer. also add helper function to NodeProviderConfig.

bveeramani and others added 29 commits May 26, 2023 17:00

[Data] Correct TorchDetectionPredictor type hint and docstring (#35829

97f6e93

) #35272 changed the implementation of _convert_outputs_to_ndarray_batch, but didn't update the type annotation or docstring. --------- Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

[Templates] Remove runtime_env usage and use a custom docker image …

11370b6

…when needed (#35475) Signed-off-by: Justin Yu <justinvyu@anyscale.com>

Add link to actor logs and message that task logs don't exist for asy…

35cdd25

…nc actors (#35828) Signed-off-by: Alan Guo <aguo@anyscale.com>

Fix lint (#35844)

0e22d21

[core][autoscaler] GCS Autoscaler V2: Add instance id to ray [3/x] (#…

f2ce0f7

…35649) Why are these changes needed? This allows the autoscaler to pass cloud instance id into Ray.

[RLlib] DreamerV3: Learner API classes for tf-keras, loss functions, …

1c636e7

…additional update method. (#35385)

[RLlib] Fix duplicate setup() in RLModules; In framework-specific RLM…

141f71e

…odule sub-classes, flip inheritance order and remove c'tors. (#35659)

[RLlib] APPO/PPO Atari release tests: Add new API stack learning test…

1e9669e

…s. (#35258)

[serve] Support generators in @serve.batch (#35729)

33adc0e

Make serve.batch compatible with generators Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

[RLlib] Load state from load_state_path for rlmodule spec. (#35180)

83179ab

[ci][byod/8] control the list of pip through source control (#35539)

7f1f0ed

[RLlib] Deprecate numpy.concat_aligned(). (#34764)

44de9b7

[RLlib] Remove policy_config property from RolloutWorker (not neede…

e5d8b28

…d). (#35878)

[Data] Import built-in packages at module level (#35852)

4105241

We import urllib.parse and pathlib in multiple functions. It's code duplication, and because the packages are built-in, there's no reason to lazily import them. Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

[Data] Remove unnecessary object base class (#35855)

8c3e0b3

Subclassing object isn't necessary in Python 3. Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

Revert "[RLlib] Add more gpu/multicpu jobs, make supported spaces tes…

afdfb3e

…ts exclusive (#35735)" (#35891) This reverts commit 3d124cc.

Test a few code snippets contained in ray/tune doc strings. (#35380)

95d0772

Signed-off-by: Jun Gong <jungong@anyscale.com>

[RLlib] Make learner group with non blocking update return multiple r…

7d52c2f

…esults (#35858) Signed-off-by: Avnish <avnishnarayan@gmail.com>

[RLlib] Fix flakiness of "supported spaces" tests. (#35902)

04eabd8

[RLlib] Fix rest of PPO RL Modules tests (#35672)

5b5d83c

Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>

[RLlib] Learner API enhancements and cleanups (prep. for DreamerV3). (#…

f1f714c

…35877)

pull bot requested a review from raulchen as a code owner May 31, 2023 05:30

[Core] Port GcsSubscriber to Cython (#35094)

db64a12

Next step in #34393 --------- Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>

pull bot added the ⤵️ pull label May 31, 2023

rickyyx and others added 7 commits May 30, 2023 23:51

[doc][core][dashboard][state] Promote state api and dashboard usage i…

4056da5

…n core user guide. (#35760) We have GAed some of the State API related features (CLI) - and promoting its usage in core related user gudies.

Revert "[core][dashboard] Task backend GC policy - worker update [1/3…

5f6f21b

…] ( #34896) (#35917) This PR seems to be the root cause of stress_test_many_tasks and "1:n async-actor calls async per second" (10%).

Revert "Revert "[air/output] Switch on per default (#35389)"" (#35895)

444f2c2

Reverts #35745 after fixing windows encoding issue. Signed-off-by: Kai Fricke <kai@anyscale.com>

[serve] Add proper error and messages for route prefix missing starti…

081a4ba

…ng / (#35798)

Fix CI failures (#35910)

827db0d

#35286 broke CI. These issues weren't caught earlier because the PR was out of date with master.

[autoscaler v2] test ray-installer (#35875)

68c11a8

added unit test for ray installer. also add helper function to NodeProviderConfig.

raulchen closed this May 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pull] master from ray-project:master #25

[pull] master from ray-project:master #25

pull bot commented May 31, 2023 •

edited

Loading

[pull] master from ray-project:master #25

[pull] master from ray-project:master #25

Conversation

pull bot commented May 31, 2023 • edited Loading

pull bot commented May 31, 2023 •

edited

Loading