[RLlib] New ConnectorV2 API #02: SingleAgentEpisode enhancements. #41075

Conversation

sven1977
Contributor

@sven1977 sven1977 commented Nov 10, 2023

This PR is the 2nd in the "enhanced/new ConnectorV2 API" series:

  • Enhances the SingleAgentEpisode class: more consistent API names and additional convenience getter APIs for obs, actions, etc.; removes the state property from Episodes (state is now just another extra_model_outputs subkey).
  • Adds the concept of a "lookback buffer" inside an ongoing episode, to prepare connectors for looking back at any data users would like (and to replace the trajectory view API).
  • Cleans up some minor things in the code.
  • Allows for nested action and observation spaces in episodes.
  • Removes SingleAgentGymEnvRunner (only used for testing; mostly the same as our now-standard SingleAgentEnvRunner). Merged and activated its test cases.

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@@ -16,7 +20,6 @@ def __init__(
actions: List[ActType] = None,
rewards: List[SupportsFloat] = None,
infos: List[Dict] = None,
states=None,
Contributor Author

State outputs are no longer needed as a separate field. They are treated just like any other extra model output (e.g. as a (possibly nested) dict under the STATE_OUT key).
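A rough sketch of the new shape (the constant's value and import path below are assumptions, not taken from this PR):

# Sketch only: model state travels as just another (possibly nested)
# extra-model-output. The actual STATE_OUT constant lives in RLlib;
# its import path may differ, so we define a stand-in here.
STATE_OUT = "state_out"

extra_model_outputs = {
    "action_logp": -0.23,
    STATE_OUT: {"h": [0.1, 0.2], "c": [0.0, 0.1]},  # e.g. LSTM h/c state
}

# Instead of a dedicated `episode.states` field, downstream code reads:
state = extra_model_outputs[STATE_OUT]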

@@ -101,8 +102,7 @@ def __init__(
self.t = self.t_started = (
t_started if t_started is not None else max(len(self.observations) - 1, 0)
)
if self.t_started < len(self.observations) - 1:
self.t = len(self.observations) - 1
self._len_pre_buffer = len(self.rewards)
Contributor Author

Added the concept of a "lookback buffer" inside an ongoing episode.
This allows custom connectors to look back at previous data up to a certain (user-defined) number of timesteps, e.g. to add "prev. reward" or "prev. 5 actions" to a model's input.
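A hedged sketch of what this could look like from a connector's perspective (the getter name get_rewards() and the negative-index semantics are assumptions based on this description):

from ray.rllib.env.single_agent_episode import SingleAgentEpisode

# Construct an ongoing episode that already holds some data; part of that
# history can serve as the lookback window for connectors.
episode = SingleAgentEpisode(
    observations=[0, 1, 2, 3],
    actions=[0, 1, 2],
    rewards=[0.0, 1.0, 2.0],
)

# A custom connector could then ask for the "prev. 2 rewards" relative to
# the current timestep (assumed getter name and index semantics):
prev_rewards = episode.get_rewards(indices=[-2, -1])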

@@ -128,7 +126,7 @@ def concat_episode(self, episode_chunk: "SingleAgentEpisode"):
from both episodes.
"""
assert episode_chunk.id_ == self.id_
assert not self.is_done
assert not self.is_done and not self.is_numpy
Contributor Author

@sven1977 sven1977 Nov 14, 2023

For simplicity, we assume that the Episode is still in "list format" (not numpy'ized yet).
We might have to change this concat_episode() API in the future, but right now it's only used inside DreamerV3's replay buffer (and in some test cases) anyway.
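A minimal sketch of the usage pattern under that assumption (the chunk construction args are guesses; only id_, is_done, is_numpy, and concat_episode() appear in the diff):

from ray.rllib.env.single_agent_episode import SingleAgentEpisode

# Two successive chunks of the same episode (construction details assumed).
chunk_a = SingleAgentEpisode(
    id_="ep_0", observations=[0, 1], actions=[0], rewards=[0.1],
)
chunk_b = SingleAgentEpisode(
    id_="ep_0", observations=[1, 2], actions=[1], rewards=[0.2], t_started=1,
)

# concat_episode() assumes list format (not yet numpy'ized) and not done:
assert not chunk_a.is_done and not chunk_a.is_numpy
chunk_a.concat_episode(chunk_b)  # chunk_a now holds data from both chunks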


# Validate.
self.validate()

def add_initial_observation(
def add_env_reset(
Contributor Author

Renamed for clarity:

  1. add_env_reset(): Adds all the data returned by an env.reset() call.
  2. add_env_step(): Adds all the data returned by an env.step() call.
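In sketch form (keyword argument names are assumed from the diff context, not verbatim):

import gymnasium as gym
from ray.rllib.env.single_agent_episode import SingleAgentEpisode

env = gym.make("CartPole-v1")
episode = SingleAgentEpisode()

# 1) Feed the env.reset() return values into the episode.
obs, info = env.reset()
episode.add_env_reset(observation=obs, infos=info)

# 2) Feed each env.step() return value into the episode.
action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)
episode.add_env_step(
    observation=obs,
    action=action,
    reward=reward,
    infos=info,
    terminated=terminated,
    truncated=truncated,
)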

# TODO (sven): Do we have to call validate here? It is our own function
# that manipulates the object.
self.validate()

def add_timestep(
def add_env_step(
Contributor Author

See above.

self.extra_model_outputs[k] = [v]
else:
self.extra_model_outputs[k].append(v)
self.extra_model_outputs[k].append(v)
Contributor Author

Simplified via defaultdict.
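The simplification being referenced, as a standalone sketch:

from collections import defaultdict

outputs = [("action_logp", -0.1), ("action_logp", -0.2)]

# Before: manual "key exists?" branching.
extra_model_outputs = {}
for k, v in outputs:
    if k not in extra_model_outputs:
        extra_model_outputs[k] = [v]
    else:
        extra_model_outputs[k].append(v)

# After: defaultdict(list) collapses both branches into a single append.
extra_model_outputs = defaultdict(list)
for k, v in outputs:
    extra_model_outputs[k].append(v)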

"""

self.observations = np.array(self.observations)
self.actions = np.array(self.actions)
self.observations = batch(self.observations)
Contributor Author

Allow for nested obs/action spaces.
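A small sketch of why np.array() is insufficient here (batch() refers to RLlib's space utility; the exact import path is an assumption):

import numpy as np
from ray.rllib.utils.spaces.space_utils import batch  # assumed path

# A list of nested-dict observations, as produced by a Dict obs space.
observations = [
    {"pos": np.array([0.0, 1.0]), "vel": np.array([0.1])},
    {"pos": np.array([1.0, 2.0]), "vel": np.array([0.2])},
]

# np.array(observations) would create an object array; batch() instead
# stacks each leaf, preserving the nested structure:
stacked = batch(observations)
# -> {"pos": array of shape (2, 2), "vel": array of shape (2, 1)}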

self.render_images = np.array(self.render_images, dtype=np.uint8)
for k, v in self.extra_model_outputs.items():
self.extra_model_outputs[k] = np.array(v)
self.extra_model_outputs[k] = batch(v)
Contributor Author

Allow for complex (nested) model outputs (especially now that states are part of these extra model outputs).

Contributor

Can we use batch() and not np.array conversion everywhere? That would allow us to unit-test batch(), make sure its behavior is predictable, and re-use it everywhere.

Contributor Author

The argument against it is that this would be overkill (we know that rewards are only a list of floats, never complex structs). But yes, batch() should work on these as well, of course. There is a proper unit test for batch(), which was added recently.

Contributor Author

Solved: I added an extra test for batch/unbatch on simple structs AND used batch() everywhere in this method (even on rewards).
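For simple structs, the expected behavior would be (sketch, same assumed import path as above):

from ray.rllib.utils.spaces.space_utils import batch  # assumed path

# On a plain list of floats, batch() should reduce to a simple stack,
# i.e. behave like np.array(rewards):
rewards = [0.5, 1.0, 1.5]
stacked = batch(rewards)
assert stacked.shape == (3,)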

for k in extra_model_output_keys
},
)
def get_observations(self, indices: Optional[Union[int, List[int], slice]] = None) -> Any:
Contributor Author

Added these very practical new APIs to get data from the episode in a user-friendly fashion.
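Sketch of the getter semantics implied by the signature above (single int, list of ints, or a slice; return shapes are assumptions):

from ray.rllib.env.single_agent_episode import SingleAgentEpisode

episode = SingleAgentEpisode(
    observations=[0, 1, 2, 3],
    actions=[0, 1, 2],
    rewards=[0.1, 0.2, 0.3],
)

last_obs = episode.get_observations(-1)         # single index
first_two = episode.get_observations([0, 1])    # list of indices
window = episode.get_observations(slice(1, 3))  # slice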

)

@staticmethod
Contributor Author

@sven1977 sven1977 Nov 14, 2023

We'll try to get rid of SampleBatch eventually (it's kind of an overloaded mess). There is no application currently that requires constructing an episode from an existing SampleBatch (only the other way around: Episode -> SampleBatch).

@sven1977 sven1977 changed the title [RLlib] Preparatory PR: Make EnvRunners use (enhanced) Connector API (#02: SingleAgentEpisode enhancements) [RLlib] New ConnectorV2 API #02: SingleAgentEpisode enhancements. (#41074) Nov 17, 2023
gym.register(
"custom-env-v0",
partial(
if (
Contributor Author

This is a bug fix: without it, passing a class to config.environment(env=[some class]) does not work (only strings do).
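For context, a sketch of the call pattern this fix enables (the env class below is hypothetical; PPOConfig().environment() is standard RLlib API):

import gymnasium as gym
from ray.rllib.algorithms.ppo import PPOConfig

class MyEnv(gym.Env):
    """Hypothetical minimal env used only to illustrate the fix."""
    def __init__(self, config=None):
        self.observation_space = gym.spaces.Discrete(2)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, *, seed=None, options=None):
        return 0, {}

    def step(self, action):
        return 0, 1.0, True, False, {}

# Before the fix, only registered string env IDs worked here; now a class does:
config = PPOConfig().environment(env=MyEnv)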

# TODO (simon): Check, if this works for the default
# stateful encoders.
initial_state={k: s[i] for k, s in states.items()},
self._episodes[i].add_env_reset(
Contributor Author

Cleaner naming of these Episode methods:

  • add_env_reset
  • add_env_step

Both add the return values of the corresponding gym.Env calls to the episode.

@@ -95,6 +96,11 @@ def __init__(self, config: "AlgorithmConfig", **kwargs):
self._ts_since_last_metrics: int = 0
self._weights_seq_no: int = 0

# TODO (sven): This is a temporary solution. STATE_OUTs
Contributor Author

Temporary fix: We need the new connectors to make this work without having to keep self._states around here. The PRs for this are lined up and rely on this one being merged first.
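Roughly, the temporary pattern described here (all names below are assumptions; the follow-up connector PRs would remove this bookkeeping):

class _EnvRunnerStateCacheSketch:
    """Hypothetical sketch of the temporary self._states workaround."""

    def __init__(self, num_envs: int):
        # One cached model state per sub-env; fed back as the next STATE_IN.
        self._states = [None] * num_envs

    def on_model_output(self, env_index: int, state_out) -> None:
        # Stash the model's STATE_OUT for this sub-env ...
        self._states[env_index] = state_out

    def next_state_in(self, env_index: int):
        # ... and hand it back on the next forward pass.
        return self._states[env_index]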

@@ -71,8 +71,7 @@ def test_init(self):
rewards = []
actions = []
infos = []
extra_model_outputs = []
states = np.random.random(10)
extra_model_outputs = {"extra_1": [], "state_out": np.random.random()}
Contributor Author

Fixed the tests: state_out is now treated as just another extra_model_output.

@kouroshHakha
Contributor

The RLlib tests are still failing.

@sven1977 sven1977 merged commit d6d2dee into ray-project:master Nov 30, 2023
8 of 15 checks passed