[RLlib] ConnectorV2: Disseminate default connector pieces, fix framestacking for MA, fix bugs in prev-a/prev-r for MA #43491
Conversation
# srcs = ["examples/connectors/connector_v2_frame_stacking.py"],
# args = ["--enable-new-api-stack", "--num-agents=2", "--stop-iter=2", "--framework=torch", "--algo=PPO"]
# )
py_test(
Fixed!
@@ -2846,17 +2843,17 @@ py_test(
    tags = ["team:rllib", "exclusive", "examples"],
    size = "large",
    srcs = ["examples/connectors/connector_v2_prev_actions_prev_rewards.py"],
-   args = ["--enable-new-api-stack", "--as-test", "--stop-reward=150.0", "--framework=torch", "--algo=PPO", "--num-env-runners=4", "--num-cpus=6"]
+   args = ["--enable-new-api-stack", "--as-test", "--stop-reward=200.0", "--framework=torch", "--algo=PPO", "--num-env-runners=4", "--num-cpus=6"]
Fixed!
@@ -344,6 +344,8 @@ def __init__(self, algo_class=None):
    self.enable_connectors = True
    self._env_to_module_connector = None
    self._module_to_env_connector = None
+   self.add_default_connectors_to_env_to_module_pipeline = True
Three new config options that let users completely disable the default connector pieces otherwise added by RLlib. This is for advanced users who know exactly what they are doing (and why they don't need these pieces).
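A hedged sketch of how these options would be set. The first two attribute names appear in the diff above; the learner-pipeline name is assumed (this comment mentions three options but the diff only shows two), and a `SimpleNamespace` stands in for the real AlgorithmConfig object:

```python
from types import SimpleNamespace

# Stand-in for an AlgorithmConfig instance (hypothetical; in RLlib this
# would be an AlgorithmConfig subclass object). The first two attribute
# names come from the diff above; the learner-pipeline name is assumed.
config = SimpleNamespace()
config.add_default_connectors_to_env_to_module_pipeline = False
config.add_default_connectors_to_module_to_env_pipeline = False
# config.add_default_connectors_to_learner_pipeline = False  # assumed name
```

With the defaults disabled, the user-provided connector callables become solely responsible for building complete pipelines.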
else:
    return val_
raise ValueError(
Clearer behavior now: return a connector piece, a pipeline, or a list of pieces (to be put into a pipeline).
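That contract can be sketched with a toy normalization helper (stand-in classes, not RLlib's actual ConnectorV2 types): whatever the user-supplied builder returned is coerced into a pipeline, and anything else raises a ValueError.

```python
class Piece:
    """Stand-in for a single ConnectorV2 piece."""


class Pipeline(list):
    """Stand-in for a ConnectorV2 pipeline (modeled as a list of pieces)."""


def to_pipeline(val_):
    """Coerce a builder's return value into a Pipeline, or raise."""
    # Already a pipeline? Use it as-is. (Check before `list`: subclass.)
    if isinstance(val_, Pipeline):
        return val_
    # A single piece? Wrap it into a new pipeline.
    if isinstance(val_, Piece):
        return Pipeline([val_])
    # A list of pieces? Put them all into a new pipeline.
    if isinstance(val_, list) and all(isinstance(p, Piece) for p in val_):
        return Pipeline(val_)
    raise ValueError(f"Cannot build a connector pipeline from {val_!r}")
```

Usage: `to_pipeline(MyPiece())`, `to_pipeline([p1, p2])`, and `to_pipeline(existing_pipeline)` all yield a Pipeline.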
I am a big fan of comments :D
batch[module_id] = {}

# Remove all zero-padding again, if applicable for the upcoming
episode_lens_plus_1 = [
Simplified.
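The unpadding step this snippet belongs to can be illustrated with a toy sketch (assumed shapes, plain lists instead of RLlib's batch structures): rows padded to the longest episode length are cut back to their true per-episode lengths.

```python
def unpad(padded_batch, episode_lens):
    """Remove zero-padding from a batch of per-episode rows.

    Each row was padded to the max episode length; cut it back to its
    episode's true length.
    """
    return [row[:length] for row, length in zip(padded_batch, episode_lens)]


# Two episodes of lengths 2 and 3, both padded to max_len=3:
padded = [[1.0, 2.0, 0.0], [5.0, 6.0, 7.0]]
unpadded = unpad(padded, [2, 3])
```

The real code additionally has to handle time-ranked state columns; this sketch only shows the length-based slicing idea.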
if SampleBatch.OBS in data:
    return data

for sa_episode in self.single_agent_episode_iterator(
Trying (quite successfully) very hard to only use these new ConnectorV2 APIs/utilities from here on:
- single_agent_episode_iterator
- add_batch_item
- add_n_batch_items
- foreach_batch_item_change_in_place
@@ -0,0 +1,286 @@
import math
Result of disseminating all default connector pieces (env-to-module, module-to-env, learner). Some of these individual pieces are now shared (e.g. both env-to-module AND learner will use the AddObservationsFromEpisodesToBatch).
@@ -13,7 +13,7 @@
@PublicAPI(stability="alpha")
-class _FrameStackingConnector(ConnectorV2):
+class _FrameStacking(ConnectorV2):
Changed naming convention from ...Connector to just ... for simplicity and shorter (but more descriptive) names.
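For context, the frame-stacking technique that a piece like _FrameStacking implements can be sketched independently of RLlib (a toy version; the real connector operates on batched observations and episode lookback buffers):

```python
from collections import deque


def frame_stack_iter(observations, num_frames=4):
    """Yield, for each incoming observation, a tuple of the last
    `num_frames` observations; the very first observation is repeated
    until the buffer is filled (a common convention at episode start).
    """
    buf = deque(maxlen=num_frames)
    for obs in observations:
        if not buf:
            # Prime the buffer with copies of the first frame.
            buf.extend([obs] * num_frames)
        else:
            buf.append(obs)
        yield tuple(buf)


stacked = list(frame_stack_iter([1, 2, 3], num_frames=3))
# stacked[0] == (1, 1, 1); stacked[2] == (1, 2, 3)
```

In the multi-agent case (the bug this PR fixes), such a buffer has to be kept per agent rather than per env, which is what made the single-agent implementation insufficient.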
    shared_data=shared_data,
    **kwargs,
)
data = connector(
Simplified. We have NOT yet used these timer stats at all in our results dicts, so thus far they were a waste of time.
LGTM. Most parts are requests for documentation :)
@@ -344,6 +344,8 @@ def __init__(self, algo_class=None):
    self.enable_connectors = True
    self._env_to_module_connector = None
    self._module_to_env_connector = None
+   self.add_default_connectors_to_env_to_module_pipeline = True
+   self.add_default_connectors_to_module_to_env_pipeline = True
    self.episode_lookback_horizon = 1
Why len_lookback_buffer=1 and not 0? Such that we always get a tuple (o_t, a_t, r_t+1, o_t+1) when calling get_<attr>()?
Good question. I think I decided this for "good measure", so at least one can look at the previous reward and action (which would not be possible if lookback buffer=0). In the end, the user will have to configure this value based on their connector "lookback" requirements. E.g. if they always have to access the last 5 observations, they set it to 5.
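The trade-off can be illustrated with a toy lookback window (a hypothetical class, not RLlib's episode API): with lookback=1, index -1 (the previous reward/action) is still reachable; with lookback=0 it is not.

```python
class EpisodeWindow:
    """Toy sketch of a lookback buffer: keeps `lookback` extra items
    before the "current" window so prev-action/prev-reward lookups work.
    """

    def __init__(self, items, lookback=1):
        self.items = list(items)
        self.lookback = lookback

    def get(self, index):
        # Index 0 is the first "visible" item; negative indices reach
        # into the lookback buffer (e.g. -1 = the previous item).
        pos = self.lookback + index
        if pos < 0:
            raise IndexError("index reaches beyond the lookback buffer")
        return self.items[pos]


rewards = EpisodeWindow([0.5, 1.0, 2.0], lookback=1)
# rewards.get(0) -> 1.0 (current); rewards.get(-1) -> 0.5 (previous)
```

As the reply above notes, a connector that needs the last 5 observations would require lookback=5; the default of 1 just guarantees that "previous" is always available.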
@@ -851,8 +854,12 @@ class directly. Note that this arg can also be specified via
def build_env_to_module_connector(self, env):
    from ray.rllib.connectors.env_to_module import (
        AddObservationsFromEpisodeToBatch,
        AddStatesFromEpisodesToBatch,
Why the obs when we add the state?
Ah got it, the state from the module. In this regard, we want to take a closer look at the new PR for extra_model_outputs in PrioritizedEpisodeReplayBuffer: this data is added as a dict. We would either need a connector to extract the state then, or we extract it inside of the buffer.
Yeah, OBS vs. the internal (e.g. RNN/LSTM) state of the model, which can be found under extra_model_outputs in the episodes.
# Append: Anything that has to do with action sampling.
# Unsquash/clip actions based on config and action space.
pipeline.append(
I am wondering whether a connector that deals with complex action spaces should go here, or whether we handle this in the configuration elsewhere?
This one does! It only normalizes/clips those components (in a possibly complex action space) that are a Box. All other components, e.g. Discrete, MultiDiscrete, are left as-is.
See the nested_action_spaces.py example script, which uses this connector (by default) as well.
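The selective clipping described here can be illustrated with a toy nested structure (gymnasium spaces replaced by plain (low, high) bounds for Box-like components and None for everything else; not RLlib's actual code):

```python
def clip_boxes(action, bounds):
    """Recursively clip only those action components with Box-like bounds.

    `bounds` mirrors `action`'s structure: a (low, high) tuple for
    Box-like leaves, None for Discrete/MultiDiscrete-like leaves
    (which must be left untouched).
    """
    if isinstance(action, dict):
        return {k: clip_boxes(v, bounds[k]) for k, v in action.items()}
    if bounds is None:
        # Discrete/MultiDiscrete-like component: leave as-is.
        return action
    low, high = bounds
    return max(low, min(high, action))


action = {"move": 3.7, "jump": 1}
bounds = {"move": (-1.0, 1.0), "jump": None}
clipped = clip_boxes(action, bounds)
# clipped == {"move": 1.0, "jump": 1}
```

The key design point matches the reply above: the recursion walks the (possibly complex) space and only the Box leaves get normalized/clipped.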
pipeline.append(AddColumnsToTrainBatch())
# Append STATE_IN/STATE_OUT (and time-rank) handler.
pipeline.append(
    AddStatesFromEpisodesToBatch(
I might write a default connector taking keys and extracting these from the infos.
Great idea! We should have an off-the-shelf one for this purpose in rllib/connectors/env_to_module. It could complexify the observation space (make it always a dict with these keys from infos plus the "_obs" key for the original observations). Then one could use this connector (maybe together with a Flatten connector, which already exists).
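A minimal sketch of that idea (hypothetical helper and names; only "_obs" comes from the discussion above): build a dict observation holding the original observation plus the selected info keys.

```python
def complexify_obs(obs, infos, keys):
    """Build a dict observation from the original obs plus selected
    info fields ("_obs" holds the original observation, per the
    discussion above). Missing keys are simply skipped in this sketch.
    """
    new_obs = {"_obs": obs}
    for key in keys:
        if key in infos:
            new_obs[key] = infos[key]
    return new_obs


result = complexify_obs([0.1, 0.2], {"speed": 3.0, "debug": "x"}, keys=["speed"])
# result == {"_obs": [0.1, 0.2], "speed": 3.0}
```

A real connector would also have to expand the observation space accordingly (the "complexify" step), so that downstream pieces such as a Flatten connector see consistent spaces.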
rllib/env/multi_agent_env.py
Outdated
@@ -525,19 +525,30 @@ def __init__(self, config: EnvContext = None):
    self._action_space_in_preferred_format = True
    self._agent_ids = set(range(num))

    # TEST
Can we remove this if it was only for debugging, or comment it otherwise?
Good catch. Thanks! Yes, this is debugging code :(
rllib/env/multi_agent_env.py
Outdated
@override(MultiAgentEnv)
def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
    self.terminateds = set()
    self.truncateds = set()
    obs, infos = {}, {}
    for i, env in enumerate(self.envs):
        obs[i], infos[i] = env.reset(seed=seed, options=options)

    # TEST
Same here?
fixed :|
rllib/env/multi_agent_env.py
Outdated
    return obs, infos

@override(MultiAgentEnv)
def step(self, action_dict):
    obs, rew, terminated, truncated, info = {}, {}, {}, {}, {}

    # TEST
Here as well ;)
@@ -1685,13 +1696,13 @@ def _get_data_by_env_steps_as_list(
    # the next step.
Can we add a docstring for the arguments?
@@ -2,8 +2,8 @@
import os

from ray.tune.registry import register_env
from ray.rllib.connectors.common import AddObservationsFromEpisodeToBatch
Can we move this example also to examples/connectors?
ConnectorV2: Disseminate default connector pieces, fix framestacking for MA, fix bugs in prev-a/prev-r for MA
Why are these changes needed?
Related issue number
Checks
- I've signed every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.