[RLlib] ConnectorV2: Disseminate default connector pieces, fix framestacking for MA, fix bugs in prev-a/prev-r for MA #43491
Conversation
# srcs = ["examples/connectors/connector_v2_frame_stacking.py"],
# args = ["--enable-new-api-stack", "--num-agents=2", "--stop-iter=2", "--framework=torch", "--algo=PPO"]
# )
py_test(
Fixed!
@@ -2846,17 +2843,17 @@ py_test(
    tags = ["team:rllib", "exclusive", "examples"],
    size = "large",
    srcs = ["examples/connectors/connector_v2_prev_actions_prev_rewards.py"],
-   args = ["--enable-new-api-stack", "--as-test", "--stop-reward=150.0", "--framework=torch", "--algo=PPO", "--num-env-runners=4", "--num-cpus=6"]
+   args = ["--enable-new-api-stack", "--as-test", "--stop-reward=200.0", "--framework=torch", "--algo=PPO", "--num-env-runners=4", "--num-cpus=6"]
Fixed!
@@ -344,6 +344,8 @@ def __init__(self, algo_class=None):
    self.enable_connectors = True
    self._env_to_module_connector = None
    self._module_to_env_connector = None
+   self.add_default_connectors_to_env_to_module_pipeline = True
Three new config options that let users completely disable the default connector pieces otherwise added by RLlib. This is for advanced users who know exactly what they are doing (and why they don't need these pieces).
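A hedged sketch of how these options would be set. The first two attribute names appear in the diff above; the learner-pipeline name is assumed (this comment mentions three options but the diff only shows two), and a `SimpleNamespace` stands in for the real AlgorithmConfig object:

```python
from types import SimpleNamespace

# Stand-in for an AlgorithmConfig instance (hypothetical; in RLlib this
# would be an AlgorithmConfig subclass object). The first two attribute
# names come from the diff above; the learner-pipeline name is assumed.
config = SimpleNamespace()
config.add_default_connectors_to_env_to_module_pipeline = False
config.add_default_connectors_to_module_to_env_pipeline = False
# config.add_default_connectors_to_learner_pipeline = False  # assumed name
```

With the defaults disabled, the user-provided connector callables become solely responsible for building complete pipelines.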
else:
    return val_
raise ValueError(
Clearer behavior now: return a connector piece, a pipeline, or a list of pieces (to be put into a pipeline).
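That contract can be sketched with a toy normalization helper (stand-in classes, not RLlib's actual ConnectorV2 types): whatever the user-supplied builder returned is coerced into a pipeline, and anything else raises a ValueError.

```python
class Piece:
    """Stand-in for a single ConnectorV2 piece."""


class Pipeline(list):
    """Stand-in for a ConnectorV2 pipeline (modeled as a list of pieces)."""


def to_pipeline(val_):
    """Coerce a builder's return value into a Pipeline, or raise."""
    # Already a pipeline? Use it as-is. (Check before `list`: subclass.)
    if isinstance(val_, Pipeline):
        return val_
    # A single piece? Wrap it into a new pipeline.
    if isinstance(val_, Piece):
        return Pipeline([val_])
    # A list of pieces? Put them all into a new pipeline.
    if isinstance(val_, list) and all(isinstance(p, Piece) for p in val_):
        return Pipeline(val_)
    raise ValueError(f"Cannot build a connector pipeline from {val_!r}")
```

Usage: `to_pipeline(MyPiece())`, `to_pipeline([p1, p2])`, and `to_pipeline(existing_pipeline)` all yield a Pipeline.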
I am a big fan of comments :D
batch[module_id] = {}

# Remove all zero-padding again, if applicable for the upcoming
episode_lens_plus_1 = [
Simplified.
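The unpadding step this snippet belongs to can be illustrated with a toy sketch (assumed shapes, plain lists instead of RLlib's batch structures): rows padded to the longest episode length are cut back to their true per-episode lengths.

```python
def unpad(padded_batch, episode_lens):
    """Remove zero-padding from a batch of per-episode rows.

    Each row was padded to the max episode length; cut it back to its
    episode's true length.
    """
    return [row[:length] for row, length in zip(padded_batch, episode_lens)]


# Two episodes of lengths 2 and 3, both padded to max_len=3:
padded = [[1.0, 2.0, 0.0], [5.0, 6.0, 7.0]]
unpadded = unpad(padded, [2, 3])
```

The real code additionally has to handle time-ranked state columns; this sketch only shows the length-based slicing idea.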
if SampleBatch.OBS in data:
    return data

for sa_episode in self.single_agent_episode_iterator(
Trying (quite successfully) very hard to only use these new ConnectorV2 APIs/utilities from here on:
- single_agent_episode_iterator
- add_batch_item
- add_n_batch_items
- foreach_batch_item_change_in_place
@@ -0,0 +1,286 @@
import math
Result of disseminating all default connector pieces (env-to-module, module-to-env, learner). Some of these individual pieces are now shared (e.g. both env-to-module AND learner will use the AddObservationsFromEpisodesToBatch).
@@ -13,7 +13,7 @@
@PublicAPI(stability="alpha")
-class _FrameStackingConnector(ConnectorV2):
+class _FrameStacking(ConnectorV2):
Changed naming convention from ...Connector to just ... for simplicity and shorter (but more descriptive) names.
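For context, the frame-stacking technique that a piece like _FrameStacking implements can be sketched independently of RLlib (a toy version; the real connector operates on batched observations and episode lookback buffers):

```python
from collections import deque


def frame_stack_iter(observations, num_frames=4):
    """Yield, for each incoming observation, a tuple of the last
    `num_frames` observations; the very first observation is repeated
    until the buffer is filled (a common convention at episode start).
    """
    buf = deque(maxlen=num_frames)
    for obs in observations:
        if not buf:
            # Prime the buffer with copies of the first frame.
            buf.extend([obs] * num_frames)
        else:
            buf.append(obs)
        yield tuple(buf)


stacked = list(frame_stack_iter([1, 2, 3], num_frames=3))
# stacked[0] == (1, 1, 1); stacked[2] == (1, 2, 3)
```

In the multi-agent case (the bug this PR fixes), such a buffer has to be kept per agent rather than per env, which is what made the single-agent implementation insufficient.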
    shared_data=shared_data,
    **kwargs,
)
data = connector(
Simplified. We have NOT yet used these timer stats at all in our results dicts, so thus far they were a waste of time.
LGTM. Most parts are requests for documentation :)
@@ -344,6 +344,8 @@ def __init__(self, algo_class=None):
    self.enable_connectors = True
    self._env_to_module_connector = None
    self._module_to_env_connector = None
+   self.add_default_connectors_to_env_to_module_pipeline = True
+   self.add_default_connectors_to_module_to_env_pipeline = True
    self.episode_lookback_horizon = 1
Why len_lookback_buffer=1 and not 0? Such that we always get a tuple (o_t, a_t, r_t+1, o_t+1) when calling get_<attr>()?
Good question. I think I decided this for "good measure", so at least one can look at the previous reward and action (which would not be possible if lookback buffer=0). In the end, the user will have to configure this value based on their connector "lookback" requirements. E.g. if they always have to access the last 5 observations, they set it to 5.
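The trade-off can be illustrated with a toy lookback window (a hypothetical class, not RLlib's episode API): with lookback=1, index -1 (the previous reward/action) is still reachable; with lookback=0 it is not.

```python
class EpisodeWindow:
    """Toy sketch of a lookback buffer: keeps `lookback` extra items
    before the "current" window so prev-action/prev-reward lookups work.
    """

    def __init__(self, items, lookback=1):
        self.items = list(items)
        self.lookback = lookback

    def get(self, index):
        # Index 0 is the first "visible" item; negative indices reach
        # into the lookback buffer (e.g. -1 = the previous item).
        pos = self.lookback + index
        if pos < 0:
            raise IndexError("index reaches beyond the lookback buffer")
        return self.items[pos]


rewards = EpisodeWindow([0.5, 1.0, 2.0], lookback=1)
# rewards.get(0) -> 1.0 (current); rewards.get(-1) -> 0.5 (previous)
```

As the reply above notes, a connector that needs the last 5 observations would require lookback=5; the default of 1 just guarantees that "previous" is always available.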
@@ -851,8 +854,12 @@ class directly. Note that this arg can also be specified via
def build_env_to_module_connector(self, env):
    from ray.rllib.connectors.env_to_module import (
        AddObservationsFromEpisodeToBatch,
        AddStatesFromEpisodesToBatch,
Why the obs when we add the state?
Ah got it, the state from the module. In this regard, we want to take a closer look at the new PR for extra_model_outputs in PrioritizedEpisodeReplayBuffer: this data is added as a dict. We would either need a connector to extract the state then, or we extract it inside of the buffer.
Yeah, OBS vs. the internal (e.g. RNN/LSTM) state of the model, which can be found under extra_model_outputs in the episodes.
# Append: Anything that has to do with action sampling.
# Unsquash/clip actions based on config and action space.
pipeline.append(
I am wondering whether a connector that deals with complex action spaces should go here, or whether we handle this in the configuration elsewhere?
This one does! It only normalizes/clips those components (in a possibly complex action space) that are a Box. All other components, e.g. Discrete, MultiDiscrete, are left as-is.
See the nested_action_spaces.py example script, which uses this connector (by default) as well.
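The selective clipping described here can be illustrated with a toy nested structure (gymnasium spaces replaced by plain (low, high) bounds for Box-like components and None for everything else; not RLlib's actual code):

```python
def clip_boxes(action, bounds):
    """Recursively clip only those action components with Box-like bounds.

    `bounds` mirrors `action`'s structure: a (low, high) tuple for
    Box-like leaves, None for Discrete/MultiDiscrete-like leaves
    (which must be left untouched).
    """
    if isinstance(action, dict):
        return {k: clip_boxes(v, bounds[k]) for k, v in action.items()}
    if bounds is None:
        # Discrete/MultiDiscrete-like component: leave as-is.
        return action
    low, high = bounds
    return max(low, min(high, action))


action = {"move": 3.7, "jump": 1}
bounds = {"move": (-1.0, 1.0), "jump": None}
clipped = clip_boxes(action, bounds)
# clipped == {"move": 1.0, "jump": 1}
```

The key design point matches the reply above: the recursion walks the (possibly complex) space and only the Box leaves get normalized/clipped.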
pipeline.append(AddColumnsToTrainBatch())
# Append STATE_IN/STATE_OUT (and time-rank) handler.
pipeline.append(
    AddStatesFromEpisodesToBatch(
I might write a default connector taking keys and extracting these from the infos.
Great idea! We should have an off-the-shelf one for this purpose in rllib/connectors/env_to_module. It could complexify the observation space (make it always a dict with these keys from infos plus the "_obs" key for the original observations). Then one could use this connector (maybe together with a Flatten connector, which already exists).
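A minimal sketch of that idea (hypothetical helper and names; only "_obs" comes from the discussion above): build a dict observation holding the original observation plus the selected info keys.

```python
def complexify_obs(obs, infos, keys):
    """Build a dict observation from the original obs plus selected
    info fields ("_obs" holds the original observation, per the
    discussion above). Missing keys are simply skipped in this sketch.
    """
    new_obs = {"_obs": obs}
    for key in keys:
        if key in infos:
            new_obs[key] = infos[key]
    return new_obs


result = complexify_obs([0.1, 0.2], {"speed": 3.0, "debug": "x"}, keys=["speed"])
# result == {"_obs": [0.1, 0.2], "speed": 3.0}
```

A real connector would also have to expand the observation space accordingly (the "complexify" step), so that downstream pieces such as a Flatten connector see consistent spaces.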
rllib/env/multi_agent_env.py
Outdated
@@ -525,19 +525,30 @@ def __init__(self, config: EnvContext = None):
    self._action_space_in_preferred_format = True
    self._agent_ids = set(range(num))

    # TEST
Can we remove this if it was only for debugging, or comment it otherwise?
Good catch. Thanks! Yes, this is debugging code :(
rllib/env/multi_agent_env.py
Outdated
@override(MultiAgentEnv)
def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
    self.terminateds = set()
    self.truncateds = set()
    obs, infos = {}, {}
    for i, env in enumerate(self.envs):
        obs[i], infos[i] = env.reset(seed=seed, options=options)

    # TEST
Same here?
fixed :|
rllib/env/multi_agent_env.py
Outdated
    return obs, infos

@override(MultiAgentEnv)
def step(self, action_dict):
    obs, rew, terminated, truncated, info = {}, {}, {}, {}, {}

    # TEST
Here as well ;)
@@ -1685,13 +1696,13 @@ def _get_data_by_env_steps_as_list(
    # the next step.
Can we add a docstring for the arguments?
@@ -2,8 +2,8 @@
import os

from ray.tune.registry import register_env
from ray.rllib.connectors.common import AddObservationsFromEpisodeToBatch
Can we move this example also to examples/connectors?
ConnectorV2: Disseminate default connector pieces, fix framestacking for MA, fix bugs in prev-a/prev-r for MA
Why are these changes needed?
Related issue number
Checks
- I've signed every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.