[RLlib] Metrics do-over 04: New env rendering/video example script (through custom callbacks using MetricsLogger). #45073

sven1977 · 2024-05-01T16:12:10Z

Metrics do-over 04: New env rendering/video example script (through custom callbacks using MetricsLogger).

This PR introduces a new example script (and adds it to the CI), which:

- defines a custom callback
- renders the env on all EnvRunners and stores images temporarily inside the episode
- compiles a video of the finished episode
- logs the video of the best and worst performing episodes (per iteration) via the MetricsLogger available on the EnvRunners
- shows how the videos can be viewed through a simple WandB (Tune) setup.
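
The flow listed above can be sketched without any RLlib dependency. The following is a minimal, framework-free illustration and not the example script itself: `EpisodeVideoTracker` and its method names are hypothetical stand-ins for the custom callback hooks, and frames are represented by plain strings instead of images.

```python
from collections import defaultdict


class EpisodeVideoTracker:
    """Hypothetical stand-in for the custom callback (not an RLlib class)."""

    def __init__(self):
        # Temporary per-episode storage: episode_id -> list of frames.
        self._frames = defaultdict(list)
        # Running return per ongoing episode.
        self._returns = defaultdict(float)
        # (video, return) of the best/worst episode seen since the last flush.
        self.best = None
        self.worst = None

    def on_step(self, episode_id, frame, reward):
        # "Render" the env and store the frame with the ongoing episode.
        self._frames[episode_id].append(frame)
        self._returns[episode_id] += reward

    def on_episode_end(self, episode_id):
        # Compile the "video" and drop the temporary per-episode data.
        video = self._frames.pop(episode_id)
        ep_return = self._returns.pop(episode_id)
        if self.best is None or ep_return > self.best[1]:
            self.best = (video, ep_return)
        if self.worst is None or ep_return < self.worst[1]:
            self.worst = (video, ep_return)

    def flush(self):
        # Called once per iteration: hand out best/worst and start over.
        out = {"best": self.best, "worst": self.worst}
        self.best = self.worst = None
        return out


tracker = EpisodeVideoTracker()
for ep_id, rewards in {"a": [1.0, 1.0], "b": [-1.0], "c": [0.5]}.items():
    for t, r in enumerate(rewards):
        tracker.on_step(ep_id, f"{ep_id}-frame{t}", r)
    tracker.on_episode_end(ep_id)

result = tracker.flush()
print(result["best"])   # (['a-frame0', 'a-frame1'], 2.0)
print(result["worst"])  # (['b-frame0'], -1.0)
```

Resetting the best/worst slots on every flush is what keeps the comparison "per iteration" rather than global.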

Why are these changes needed?

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

…ics_do_over_04_env_rendering_example_script Signed-off-by: sven1977 <svenmika1977@gmail.com> # Conflicts: # doc/source/rllib/package_ref/learner.rst # rllib/utils/metrics/metrics_logger.py # rllib/utils/metrics/stats.py

aslonnie · 2024-05-02T01:44:21Z

This does not seem to require doc owners' approval. I am adding @angelinalg explicitly (in case the example's wording needs review).

simonsays1980

LGTM. A couple of questions. Some nits and nuts.

simonsays1980 · 2024-05-02T11:24:22Z

rllib/examples/envs/custom_env_render_method.py

+        done = self.cur_pos >= self.end_pos or truncated
+        return [self.cur_pos], 10.0 if done else -0.1, done, truncated, {}
+
+    def render(self, mode="rgb"):


Is there any particular reason why we diverge here from the standard gymnasium API? The API:

- does not receive any parameters anymore, i.e. def render(self)
- returns RenderFrame | list[RenderFrame] | None, but not bool.

So sorry, this was a left-over from the next PR. This entire script has been re-done and packed into a separate PR. Removed from this one.
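
For reference, the gymnasium convention mentioned in the question can be sketched as follows. This is a dependency-light illustration under stated assumptions: `LineWorld` is a hypothetical, duck-typed class (a real env would subclass `gymnasium.Env`), and only the `cur_pos`/`end_pos` names are borrowed from the diff above.

```python
import numpy as np


class LineWorld:
    """Duck-typed sketch; a real env would subclass gymnasium.Env."""

    metadata = {"render_modes": ["rgb_array"]}

    def __init__(self, end_pos=10, render_mode=None):
        self.end_pos = end_pos
        self.cur_pos = 0
        # The render mode is fixed once, at construction time ...
        self.render_mode = render_mode

    def render(self):
        # ... so render() itself takes no arguments.
        if self.render_mode != "rgb_array":
            return None
        # Black strip of shape [h, w, c]; a white column marks the position.
        frame = np.zeros((8, self.end_pos + 1, 3), dtype=np.uint8)
        frame[:, self.cur_pos, :] = 255
        return frame


env = LineWorld(render_mode="rgb_array")
frame = env.render()
print(frame.shape, frame.dtype)  # (8, 11, 3) uint8
```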

simonsays1980 · 2024-05-02T11:29:37Z

rllib/examples/envs/env_rendering_and_recording.py

+        """
+        # If we have a vector env, only render the sub-env at index 0.
+        if isinstance(env.unwrapped, gym.vector.VectorEnv):
+            image = env.envs[0].render()


Dumb question: Imagine a user needs to return a couple of images in array form. Does she then return an np.ndarray of shape (K, H, W, 3), with K the number of images?

And if we have a vector environment, what does this look like in TensorBoard?

I like, however, that we use the standard gymnasium API here and the user just needs to override this.

Yes, me, too. This was a total mess in the old RLlib, with the render_env option.

> Dumb question: Imagine a user needs to return a couple of images in array form. Does she then return an np.ndarray of shape (K, H, W, 3), with K the number of images?

Correct (I think this is even from your own PR a while back, where you enabled the WandB logger to actually log images and videos):

- shape=3D -> single image, e.g. [c, h, w]
- shape=4D -> n images (all of the same size), e.g. [N, c, h, w]
- shape=5D -> 1 video with shape [1, L, c, h, w]

c=channels, L=length of video, w=width, h=height

> And if we have a vector environment, what does this look like in TensorBoard?

No idea. If you choose reduce=None, then all logged videos/images will be placed in a list(!), not batched, and thus WandB will display them as separate images/videos. Having a list (as opposed to a batched array) is a must for videos, as they may have different lengths.
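
The dimension-based convention from this reply can be illustrated with a small helper. This is only a sketch: `describe_media` is a hypothetical function, not part of Tune or RLlib, and the 5D case assumes a leading dimension of 1 in front of the video length. The last lines show why videos of different lengths have to stay in a list.

```python
import numpy as np


def describe_media(array):
    """Hypothetical helper: classify an array by its number of dimensions."""
    ndim = np.asarray(array).ndim
    if ndim == 3:
        return "single image"
    if ndim == 4:
        return "batch of images"
    if ndim == 5:
        return "video"
    raise ValueError(f"Unsupported number of dimensions: {ndim}")


image = np.zeros((3, 8, 8), dtype=np.uint8)          # [c, h, w]
images = np.zeros((4, 3, 8, 8), dtype=np.uint8)      # [N, c, h, w]
video = np.zeros((1, 20, 3, 8, 8), dtype=np.uint8)   # [1, L, c, h, w]

print(describe_media(image))   # single image
print(describe_media(images))  # batch of images
print(describe_media(video))   # video

# Two videos with different lengths (20 and 35 frames) cannot be stacked
# into one array, so they are kept as separate list entries.
videos = [np.zeros((1, n, 3, 8, 8), dtype=np.uint8) for n in (20, 35)]
try:
    np.stack([v[0] for v in videos], axis=0)
    batched = True
except ValueError:
    batched = False
print(batched)  # False
```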

simonsays1980 · 2024-05-02T11:31:48Z

rllib/examples/envs/env_rendering_and_recording.py

+            # Create a video from the images by simply stacking them.
+            video = np.expand_dims(
+                np.stack(images, axis=0), axis=0
+            )  # TODO: test to make sure WandB properly logs videos.


This TODO is important. And we should ping the Tune team to take a look into the WandB logger when using schedulers or large checkpoints.

Ah, no, this works already totally fine. Let me remove this ....

removed confusing TODO.

simonsays1980 · 2024-05-02T11:32:09Z

rllib/examples/envs/env_rendering_and_recording.py

-        # an example on how to use a Viewer object.
-        return np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)
+            # Create a video from the images by simply stacking them.
+            video = np.expand_dims(


Can we add a small note, describing the shape of the array?
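
As an illustration of the shapes involved in this stacking step, here is a minimal sketch. The frame sizes are made up, and the channel-first transposition is an assumption based on the [c, h, w] layout mentioned elsewhere in this review, not something shown in the diff.

```python
import numpy as np

# Hypothetical frames as a render() call might return them: [h, w, c].
length, height, width = 5, 64, 96
frames_hwc = [
    np.full((height, width, 3), t, dtype=np.uint8) for t in range(length)
]

# Assumed layout change: move channels first, [h, w, c] -> [c, h, w].
frames_chw = [np.transpose(f, axes=(2, 0, 1)) for f in frames_hwc]

# Stack along a new leading time axis, then add a batch axis of size 1.
stacked = np.stack(frames_chw, axis=0)   # [L, c, h, w]
video = np.expand_dims(stacked, axis=0)  # [1, L, c, h, w]

print(stacked.shape)  # (5, 3, 64, 96)
print(video.shape)    # (1, 5, 3, 64, 96)
```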

simonsays1980 · 2024-05-02T11:34:02Z

rllib/examples/envs/env_rendering_and_recording.py

+        # Best video.
+        metrics_logger.log_value(
+            "episode_videos_best",
+            self.best_episode_and_return[0],


To me, it would make more sense to log only the best of the best and the worst of the worst.

Images alone are already large, videos are larger, and then there are multiple of them ... my machine would probably not handle it well.

Yeah, that's why I crunch them down to 64 x 86. True, we should maybe further reduce them on the Algorithm side after the EnvRunners each return their best, but that would require another callback that goes through these videos, compares, filters, etc.
I wanted to avoid this complexity. One could also simply log only the videos from one hard-coded EnvRunner (e.g. index=1) and reduce the number of videos this way. Or only log every nth iteration, etc.
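
The two volume-reduction ideas from this reply (a single hard-coded EnvRunner, every nth iteration) could be combined in a small filter. This is a sketch only: `should_log_video` and its parameters are hypothetical and not part of the example script.

```python
def should_log_video(env_runner_index, iteration, only_index=1, every_n=10):
    """Hypothetical filter: log from one EnvRunner, every n-th iteration."""
    return env_runner_index == only_index and iteration % every_n == 0


# Three EnvRunners (indices 1..3) over iterations 0..20.
logged = [
    (idx, it)
    for it in range(0, 21)
    for idx in range(1, 4)
    if should_log_video(idx, it)
]
print(logged)  # [(1, 0), (1, 10), (1, 20)]
```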

simonsays1980 · 2024-05-02T11:35:47Z

rllib/examples/envs/env_rendering_and_recording.py

+            clear_on_reduce=True,
+        )
+        # Worst video.
+        metrics_logger.log_value(


The log_value at first confused me, as I was expecting a single scalar value to be logged, but in comparison to the other logging methods, this makes sense, as we have only a single "variable".

Yeah, our Tune logging API does NOT allow for specifying any data types (videos, images, etc.) this early. We have to "communicate" with it via the tensor format/shape. So there is really only log_value, nothing else. log_dict and log_n_dicts are convenience methods to avoid having to call log_value a dozen times or so.
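
The relationship between the three methods can be pictured with a toy logger. This is explicitly not RLlib's MetricsLogger: `ToyLogger`, its signatures, the "/"-joined keys, and its `reduce()` behavior are assumptions made for illustration, loosely following the reduce=None and clear_on_reduce=True settings mentioned in this thread.

```python
from collections import defaultdict


class ToyLogger:
    """Toy illustration only; this is not RLlib's MetricsLogger."""

    def __init__(self):
        self._values = defaultdict(list)

    def log_value(self, key, value):
        # The single primitive: append one value under one key.
        self._values[key].append(value)

    def log_dict(self, values, prefix=()):
        # Convenience: walk a (possibly nested) dict, one log_value per leaf.
        for name, value in values.items():
            path = prefix + (name,)
            if isinstance(value, dict):
                self.log_dict(value, prefix=path)
            else:
                self.log_value("/".join(path), value)

    def log_n_dicts(self, dicts):
        # Convenience: log several dicts in one call.
        for d in dicts:
            self.log_dict(d)

    def reduce(self):
        # No reduction (in the spirit of reduce=None): return the raw lists,
        # then clear the store (in the spirit of clear_on_reduce=True).
        out = dict(self._values)
        self._values.clear()
        return out


logger = ToyLogger()
logger.log_value("episode_return", 1.5)
logger.log_dict({"episode_return": 2.5, "env": {"steps": 10}})
logger.log_n_dicts([{"env": {"steps": 12}}, {"env": {"steps": 7}}])

first = logger.reduce()
second = logger.reduce()
print(first)   # {'episode_return': [1.5, 2.5], 'env/steps': [10, 12, 7]}
print(second)  # {}
```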

simonsays1980 · 2024-05-02T11:38:00Z

rllib/env/multi_agent_episode.py

@@ -278,6 +278,10 @@ def __init__(
            [] if render_images is None else render_images
        )

+        # Caches for temporary per-timestep data. May be used to store custom metrics
+        # from within a callback for the ongoing episode (e.g. render images).
+        self._temporary_timestep_data = defaultdict(list)


Awesome, so we have our old episode.media somehow back :)

Haha, yeah, there had to be some temporary data cache available to the users. One possible way to avoid having this at all in the episode would be to tell the user to store these in their custom callbacks instance directly and take care of keeping these clean, but I'm not sure yet. We might deprecate this again, if it turns out that that's the more transparent solution. What I like about the episode storage is that it auto-clears itself once the episode is finalized, making sure the user cannot dump infinite data into an episode and cause leaks.
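
The auto-clearing behavior described here can be sketched with a toy class. `ToyEpisode` and its method names are hypothetical; only the `_temporary_timestep_data = defaultdict(list)` attribute is taken from the diff above, and the real episode classes may behave differently.

```python
from collections import defaultdict


class ToyEpisode:
    """Toy illustration only; not one of RLlib's episode classes."""

    def __init__(self):
        # Cache for temporary per-timestep data, e.g. render images.
        self._temporary_timestep_data = defaultdict(list)
        self.is_finalized = False

    def add_temp_data(self, key, data):
        if self.is_finalized:
            raise RuntimeError("Cannot add data to a finalized episode.")
        self._temporary_timestep_data[key].append(data)

    def get_temp_data(self, key):
        # Return a copy so callers cannot grow the cache by accident.
        return list(self._temporary_timestep_data.get(key, []))

    def finalize(self):
        # Dropping the cache here is what prevents unbounded growth.
        self._temporary_timestep_data.clear()
        self.is_finalized = True


episode = ToyEpisode()
for t in range(3):
    episode.add_temp_data("render_images", f"frame{t}")
before = episode.get_temp_data("render_images")
episode.finalize()
after = episode.get_temp_data("render_images")

print(before)  # ['frame0', 'frame1', 'frame2']
print(after)   # []
```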


sven1977 and others added 30 commits March 13, 2024 12:50

wip

6f1b505

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

b6e2714

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into comp…

4dfb2ce

…lete_metrics_and_stats_do_over

wip

e6402c6

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into comp…

33487cc

…lete_metrics_and_stats_do_over

doctest fix

a02abbd

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

d9f3e6e

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

e909a73

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

bdaa04c

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

52d9e12

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

f77ffdb

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into clea…

81c4c79

…nup_examples_folder_03 # Conflicts: # rllib/examples/multi_agent/multi_agent_pendulum.py and wip Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

1672675

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

adf9e8c

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

c9e5c2f

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

683bc4b

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

5bd220f

Signed-off-by: sven1977 <svenmika1977@gmail.com>

LINT

d931945

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

e9888de

Signed-off-by: sven1977 <svenmika1977@gmail.com>

fixes

0e97d8f

Signed-off-by: sven1977 <svenmika1977@gmail.com>

fixes

7584cce

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into comp…

5ba69af

…lete_metrics_and_stats_do_over

wip

36bfa57

Signed-off-by: sven1977 <svenmika1977@gmail.com>

fixes

5565d4f

Signed-off-by: sven1977 <svenmika1977@gmail.com>

fixes

27c793d

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into clea…

d75b31a

…nup_examples_folder_03

fixes

bf9cef0

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Apply suggestions from code review

872b49b

Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Sven Mika <sven@anyscale.io>

fixes

743dabd

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge remote-tracking branch 'origin/cleanup_examples_folder_03' into…

d50a39c

… cleanup_examples_folder_03

sven1977 assigned simonsays1980 and sven1977 and unassigned sven1977 May 1, 2024

sven1977 marked this pull request as ready for review May 1, 2024 16:56

sven1977 requested review from avnishn, ArturNiederfahrenhorst, maxpumperla, kouroshHakha, simonsays1980 and a team as code owners May 1, 2024 16:56

sven1977 added 2 commits May 1, 2024 21:13

wip

a86d05c

Signed-off-by: sven1977 <svenmika1977@gmail.com>

aslonnie requested review from angelinalg and removed request for a team May 2, 2024 01:43

sven1977 added 3 commits May 2, 2024 12:07

wip

f7398dc

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into metr…

4be4d12

…ics_do_over_04_env_rendering_example_script

wip

156b8c6

Signed-off-by: sven1977 <svenmika1977@gmail.com>

simonsays1980 approved these changes May 2, 2024

sven1977 added 5 commits May 2, 2024 14:19

LINT

ab4ddba

Signed-off-by: sven1977 <svenmika1977@gmail.com>

fixes and LINT

1d99de8

Signed-off-by: sven1977 <svenmika1977@gmail.com>

fixes and LINT

d242b14

Signed-off-by: sven1977 <svenmika1977@gmail.com>

fix

d08c0e5

Signed-off-by: sven1977 <svenmika1977@gmail.com>

fix

311ea3f

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 merged commit afba35d into ray-project:master May 2, 2024
5 checks passed

sven1977 deleted the metrics_do_over_04_env_rendering_example_script branch May 2, 2024 15:35

harborn pushed a commit to harborn/ray that referenced this pull request May 8, 2024

[RLlib] Metrics do-over 04: New env rendering/video example script (t…

9d33dae

…hrough custom callbacks using MetricsLogger). (ray-project#45073)

ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Jun 7, 2024

[RLlib] Metrics do-over 04: New env rendering/video example script (t…

742ffa7

…hrough custom callbacks using MetricsLogger). (ray-project#45073)
