-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] ConnectorV2: Fix bug in AddColumnsFromEpisodesToTrainBatch
class (multi-agent related).
#43680
[RLlib] ConnectorV2: Fix bug in AddColumnsFromEpisodesToTrainBatch
class (multi-agent related).
#43680
Conversation
…atch_connector' into fix_bug_in_add_column_to_train_batch_connector
…ctor Signed-off-by: Sven Mika <svenmika1977@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The problem with custom columns sometimes overridden should be fixed with this.
agents_that_stepped_only=False, | ||
): | ||
for column in sa_episode.extra_model_outputs.keys(): | ||
if column not in skip_columns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! That should leave all custom columns untouched now.
@@ -1706,12 +1706,3 @@ def _set_optimizer_lr(optimizer: Optimizer, lr: float) -> None: | |||
@abc.abstractmethod | |||
def _get_clip_function() -> Callable: | |||
"""Returns the gradient clipping function to use, given the framework.""" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha yes, hps
has now gone to the dark side :)
ConnectorV2: Fix bug in
AddColumnsFromEpisodesToTrainBatch
class (multi-agent related).The bug would appear only in multi-agent AND if one RLModule does NOT produce action_dist_inputs (b/c e.g. it's a random one). It was caused by using a (random) single agent episode as a reference episode to figure out, which extra_model_output columns to loop through. If this reference single agent episode was from the random RLModule, the column
action_dist_inputs
(and others) would be missing.Why are these changes needed?
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.