
[RLlib] ConnectorV2: Disseminate default connector pieces, fix framestacking for MA, fix bugs in prev-a/prev-r for MA #43491

Conversation

@sven1977 (Contributor) commented on Feb 28, 2024:

ConnectorV2: Disseminate default connector pieces, fix framestacking for MA, fix bugs in prev-a/prev-r for MA

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

# srcs = ["examples/connectors/connector_v2_frame_stacking.py"],
# args = ["--enable-new-api-stack", "--num-agents=2", "--stop-iter=2", "--framework=torch", "--algo=PPO"]
# )
py_test(
@sven1977 (author):

Fixed!

@@ -2846,17 +2843,17 @@ py_test(
tags = ["team:rllib", "exclusive", "examples"],
size = "large",
srcs = ["examples/connectors/connector_v2_prev_actions_prev_rewards.py"],
args = ["--enable-new-api-stack", "--as-test", "--stop-reward=150.0", "--framework=torch", "--algo=PPO", "--num-env-runners=4", "--num-cpus=6"]
args = ["--enable-new-api-stack", "--as-test", "--stop-reward=200.0", "--framework=torch", "--algo=PPO", "--num-env-runners=4", "--num-cpus=6"]
@sven1977 (author):

Fixed!

@@ -344,6 +344,8 @@ def __init__(self, algo_class=None):
self.enable_connectors = True
self._env_to_module_connector = None
self._module_to_env_connector = None
self.add_default_connectors_to_env_to_module_pipeline = True
@sven1977 (author):

Three new config options that let users completely disable the default connector pieces RLlib would otherwise add. This is for advanced users who know exactly what they are doing (and why they don't need these pieces).
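
For illustration, a minimal sketch (assuming PPO; not from this PR) of switching these off. The first two attribute names appear in the diff above, the learner-side name is an assumption, and for brevity the attributes are set directly rather than through the config's setter methods:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig()
# Advanced users building fully custom pipelines can opt out of RLlib's
# default connector pieces. The first two flags appear in this PR's diff;
# the learner-side flag name below is an assumption.
config.add_default_connectors_to_env_to_module_pipeline = False
config.add_default_connectors_to_module_to_env_pipeline = False
config.add_default_connectors_to_learner_pipeline = False  # assumed name
```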

else:
return val_
raise ValueError(
@sven1977 (author):

Clearer behavior now: return a connector piece, a pipeline, or a list of pieces (to be put into a pipeline).

@simonsays1980 (Collaborator):

I am a big fan of comments :D

batch[module_id] = {}

# Remove all zero-padding again, if applicable for the upcoming
episode_lens_plus_1 = [
@sven1977 (author):

simplified

if SampleBatch.OBS in data:
return data

for sa_episode in self.single_agent_episode_iterator(
@sven1977 (author):

Trying (quite successfully) to use only these new ConnectorV2 APIs/utilities from here on (a hedged usage sketch follows after this list):

  • single_agent_episode_iterator
  • add_batch_item
  • add_n_batch_items
  • foreach_batch_item_change_in_place
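
A hedged sketch (not code from this PR) of the pattern these utilities enable; the `__call__` signature and keyword arguments are approximations based on the snippets shown in this PR and may differ from the final API:

```python
from ray.rllib.connectors.connector_v2 import ConnectorV2


class AddPrevRewardToBatch(ConnectorV2):
    """Toy env-to-module piece adding each agent's previous reward to the batch.

    Illustrative only: the call signature and keyword args mirror the snippets
    visible in this PR, but may not match the final API exactly.
    """

    def __call__(self, *, rl_module, data, episodes, explore=None, shared_data=None, **kwargs):
        # The iterator yields SingleAgentEpisode objects in both single- and
        # multi-agent setups, so the same code covers both cases.
        for sa_episode in self.single_agent_episode_iterator(episodes):
            # Needs a lookback horizon >= 1 so that index -1 is available.
            prev_reward = sa_episode.get_rewards(-1)
            # Append one item to the "prev_rewards" column for this episode.
            self.add_batch_item(
                data,
                "prev_rewards",
                item_to_add=prev_reward,
                single_agent_episode=sa_episode,
            )
        return data
```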

@@ -0,0 +1,286 @@
import math
@sven1977 (author):

Result of disseminating all default connector pieces (env-to-module, module-to-env, learner). Some of these individual pieces are now shared, e.g. both the env-to-module AND the learner pipeline use AddObservationsFromEpisodesToBatch.

@@ -13,7 +13,7 @@


@PublicAPI(stability="alpha")
class _FrameStackingConnector(ConnectorV2):
class _FrameStacking(ConnectorV2):
@sven1977 (author):

Changed the naming convention from ...Connector to just ... for simplicity and shorter (yet still descriptive) names.

shared_data=shared_data,
**kwargs,
)
data = connector(
@sven1977 (author):

Simplified: we have NOT yet used these timer stats in our results dicts at all, so thus far they were a waste of time.

@simonsays1980 (Collaborator) left a review:

LGTM. Most comments are requests for documentation :)

@@ -344,6 +344,8 @@ def __init__(self, algo_class=None):
self.enable_connectors = True
self._env_to_module_connector = None
self._module_to_env_connector = None
self.add_default_connectors_to_env_to_module_pipeline = True
self.add_default_connectors_to_module_to_env_pipeline = True
self.episode_lookback_horizon = 1
@simonsays1980 (Collaborator):

Why len_lookback_buffer=1 and not 0? Such that we always get a tuple (o_t, a_t, r_t+1, o_t+1) when calling get_<attr>()?

@sven1977 (author):

Good question. I think I decided this for "good measure", so at least one can look at the previous reward and action (which would not be possible with a lookback buffer of 0). In the end, the user will have to configure this value based on their connectors' "lookback" requirements. E.g., if they always need access to the last 5 observations, they set it to 5.
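
For example (illustration only; `sa_episode` is a hypothetical SingleAgentEpisode handle and the exact keyword names of `get_observations()` are assumptions):

```python
# With episode_lookback_horizon=5 configured, a connector piece could ask an
# episode for the 5 most recent observations, zero-filling if the episode is
# still shorter than that (argument names are assumptions).
last_five_obs = sa_episode.get_observations(indices=slice(-5, None), fill=0.0)
```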

@@ -851,8 +854,12 @@ class directly. Note that this arg can also be specified via

def build_env_to_module_connector(self, env):
from ray.rllib.connectors.env_to_module import (
AddObservationsFromEpisodeToBatch,
AddStatesFromEpisodesToBatch,
@simonsays1980 (Collaborator):

Why the obs when we add the state?

@simonsays1980 (Collaborator):

Ah, got it, the state from the module. In this regard, we want to take a closer look at the new PR for extra_model_outputs in PrioritizedEpisodeReplayBuffer: this data is added as a dict. We would either need a connector to extract the state then, or we extract it inside the buffer.

@sven1977 (author):

Yeah, OBS vs. the (internal) model state, e.g. an RNN/LSTM state, which can be found under extra_model_outputs in the episodes.

else:
return val_
raise ValueError(
@simonsays1980 (Collaborator):

I am a big fan of comments :D


# Append: Anything that has to do with action sampling.
# Unsquash/clip actions based on config and action space.
pipeline.append(
@simonsays1980 (Collaborator):

I am wondering whether a connector that deals with complex action spaces should go here, or whether we handle this in the configuration elsewhere?

@sven1977 (author):

This one does! It only normalizes/clips those components (in a possibly complex action space) that are a Box. All other components, e.g. Discrete, MultiDiscrete, are left as-is.

@sven1977 (author):

See the nested_action_spaces.py example script, which uses this connector (by default) as well.
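
A hedged sketch of that idea (not the connector's actual implementation; the helper name and the assumption that the space arrives as a matching nested structure of leaf spaces are mine):

```python
import numpy as np
import tree  # dm_tree
from gymnasium.spaces import Box


def clip_box_components(action_struct, space_struct):
    """Clips only the Box leaves of a (possibly nested) action structure.

    `space_struct` is assumed to be a nested dict/tuple of leaf gymnasium
    spaces matching `action_struct` (e.g. the unpacked contents of a
    Dict/Tuple space); Discrete/MultiDiscrete leaves are passed through
    unchanged.
    """
    def _clip(action, space):
        if isinstance(space, Box):
            return np.clip(action, space.low, space.high)
        return action

    return tree.map_structure(_clip, action_struct, space_struct)
```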

pipeline.append(AddColumnsToTrainBatch())
# Append STATE_IN/STATE_OUT (and time-rank) handler.
pipeline.append(
AddStatesFromEpisodesToBatch(
@simonsays1980 (Collaborator):

I might write a default connector that takes keys and extracts these from the infos.

@sven1977 (author):

Great idea! We should have an off-the-shelf one for this purpose in rllib/connectors/env_to_module. It could complexify the observation space (make it always a dict with these keys from the infos plus the "_obs" key for the original observations). Then one could use this connector (maybe together with a Flatten connector, which already exists).
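
A hedged sketch of what such an off-the-shelf piece might look like (not existing RLlib code; the class name, its `info_keys` argument, and the call signature are assumptions modeled on the snippets in this PR):

```python
from ray.rllib.connectors.connector_v2 import ConnectorV2


class AddInfoKeysToObservations(ConnectorV2):
    """Builds a dict obs from selected `infos` keys plus the original obs.

    Illustrative only; not an existing RLlib connector. The resulting dict
    observation could then be handled by a subsequent Flatten connector.
    """

    def __init__(self, *args, info_keys=(), **kwargs):
        super().__init__(*args, **kwargs)
        self._info_keys = info_keys

    def __call__(self, *, rl_module, data, episodes, explore=None, shared_data=None, **kwargs):
        for sa_episode in self.single_agent_episode_iterator(episodes):
            infos = sa_episode.get_infos(-1)
            obs = sa_episode.get_observations(-1)
            # Original observation under "_obs", plus the requested info fields.
            dict_obs = {"_obs": obs, **{k: infos[k] for k in self._info_keys}}
            self.add_batch_item(
                data, "obs", item_to_add=dict_obs, single_agent_episode=sa_episode
            )
        return data
```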

@@ -525,19 +525,30 @@ def __init__(self, config: EnvContext = None):
self._action_space_in_preferred_format = True
self._agent_ids = set(range(num))

# TEST
@simonsays1980 (Collaborator):

Can we remove this if it was only for debugging, or add a comment otherwise?

@sven1977 (author):

Good catch. Thanks! Yes, this is debugging code :(

@override(MultiAgentEnv)
def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
self.terminateds = set()
self.truncateds = set()
obs, infos = {}, {}
for i, env in enumerate(self.envs):
obs[i], infos[i] = env.reset(seed=seed, options=options)

# TEST
@simonsays1980 (Collaborator):

Same here?

@sven1977 (author):

fixed :|

return obs, infos

@override(MultiAgentEnv)
def step(self, action_dict):
obs, rew, terminated, truncated, info = {}, {}, {}, {}, {}

# TEST
@simonsays1980 (Collaborator):

Here as well ;)

@@ -1685,13 +1696,13 @@ def _get_data_by_env_steps_as_list(
# the next step.
@simonsays1980 (Collaborator):

Can we add a docstring for the arguments?

@@ -2,8 +2,8 @@
import os

from ray.tune.registry import register_env
from ray.rllib.connectors.common import AddObservationsFromEpisodeToBatch
@simonsays1980 (Collaborator):

Can we move this example also to examples/connectors?

…red in episode (in episode, we should only store those actions actually coming from the dist sample process!).

BUT nested_actions example (single-agent) seems to be >10% slower now than on master.

…k, single-agent is not ??

* code simplifications
* better nested action space test via adding lopsided action bounds

…dn't matter for MA, b/c MA is NOT vectorized, only a sub-env; sorting doesn't affect this).

@sven1977 merged commit 457d6e9 into ray-project:master on Mar 3, 2024 (8 of 9 checks passed).
hebiao064 pushed a commit to hebiao064/ray that referenced this pull request Mar 12, 2024
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Jun 7, 2024