[RLlib; Docs overhaul] Docstring cleanup: Policies, policy_templates. #19759
Conversation
similarly, some suggestions for your consideration.
thanks.
rllib/policy/dynamic_tf_policy.py
Outdated
@@ -88,57 +72,53 @@ def __init__(
obs_include_prev_action_reward=DEPRECATED_VALUE):
"""Initializes a DynamicTFPolicy instance.

Initialization oxf this class occurs in two phases and defines the
typo: "oxf" → "of"
done
rllib/policy/dynamic_tf_policy.py
Outdated
action_space: Action space of the policy.
config: Policy-specific configuration data.
loss_fn: Function that returns a loss tensor for the policy graph.
stats_fn: Optional function that returns a dict of TF fetches
what does "TF fetches" mean? I thought we can export any numeric stats here?
Not for TFPolicy. The stats fn here has to return static graph tensors, just like e.g. the loss function.
added explanation.
interesting, that's right.
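To make the point above concrete, here is a minimal, hypothetical sketch of the `stats_fn` contract being discussed. In real RLlib code the returned values are symbolic `tf.Tensor` nodes built in the policy's static graph (fetched by `session.run` alongside the loss, hence "TF fetches"); plain Python values stand in here so the sketch runs without TensorFlow, and the `policy`/`train_batch` dicts are simplified stand-ins for the real objects.

```python
# Hypothetical sketch of a TFPolicy stats_fn. In real code, every value in
# the returned dict must be a tf.Tensor from the policy's static graph --
# no eager/numpy computation at call time -- because the dict is passed to
# session.run() as fetches together with the loss.

def stats_fn(policy, train_batch):
    return {
        "mean_reward": train_batch["rewards_mean"],  # a graph tensor in TF
        "cur_lr": policy["lr"],                      # e.g. a lr variable
    }

# Plain-Python stand-ins for a Policy and a SampleBatch:
policy = {"lr": 5e-4}
train_batch = {"rewards_mean": 1.25}
fetches = stats_fn(policy, train_batch)
```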
rllib/policy/dynamic_tf_policy.py
Outdated
loss_fn: Function that returns a loss tensor for the policy graph.
stats_fn: Optional function that returns a dict of TF fetches
given the policy and batch input tensors.
grad_stats_fn: Optional function that returns a dict of TF
can we explain the difference between stats_fn and grad_stats_fn please?
+1
done
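A hedged sketch of the distinction just asked about: `stats_fn` only sees the policy and the train batch, while `grad_stats_fn` is additionally passed the computed gradients, so gradient statistics (e.g. a global norm) belong there. Plain Python floats stand in for TF tensors so this runs standalone; the exact signatures in RLlib carry more arguments than shown.

```python
# Hypothetical sketch contrasting the two hooks (plain Python values in
# place of tf.Tensor graph nodes).

def stats_fn(policy, train_batch):
    # Called with only the policy and batch: batch/loss-level stats.
    return {"policy_loss": train_batch["loss"]}

def grad_stats_fn(policy, train_batch, grads):
    # Additionally receives the per-variable gradients, so it can report
    # gradient-level stats such as the global gradient norm.
    global_norm = sum(g * g for g in grads) ** 0.5
    return {"grad_gnorm": global_norm}

stats = stats_fn({}, {"loss": 0.5})
grad_stats = grad_stats_fn({}, {"loss": 0.5}, [3.0, 4.0])
```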
generate an action distribution object from, and
internal-state outputs (or an empty list if not applicable).
action_distribution_fn: A callable returning distribution inputs
(parameters), a dist-class to generate an action distribution
hmm, personally if I am a new user, I would be quite confused about the relationship between action_sampler_fn and action_distribution_fn.
I think I understand action sampling fn, but what does action distribution fn do?
Very valid point! I'll try to clarify and also mention that only one of these should be defined.
done
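A rough sketch of the contrast discussed above, under the simplification that only one of the two hooks is ever defined: with `action_sampler_fn` you produce concrete actions yourself, while with `action_distribution_fn` you return only the distribution inputs plus a distribution class and RLlib does the sampling. Signatures and return shapes are simplified here, and the `Categorical` class is a hypothetical stand-in for an RLlib action-distribution class.

```python
import random

class Categorical:
    # Minimal stand-in for an action-distribution class. Its "sample" is a
    # greedy argmax so this sketch stays deterministic.
    def __init__(self, inputs):
        self.probs = inputs

    def sample(self):
        return max(range(len(self.probs)), key=lambda i: self.probs[i])

def action_sampler_fn(policy, model, obs):
    # Samples actions itself; RLlib uses them as-is, so any exploration
    # logic must live inside this function.
    actions = [random.randrange(2)]
    logps = [0.0]  # placeholder log-likelihoods
    return actions, logps

def action_distribution_fn(policy, model, obs):
    # Returns only the distribution *inputs* plus a dist class (and state
    # outputs); RLlib builds the distribution and samples from it.
    dist_inputs = [0.1, 0.9]
    return dist_inputs, Categorical, []

inputs, dist_cls, _ = action_distribution_fn(None, None, None)
action = dist_cls(inputs).sample()
sampled, _logps = action_sampler_fn(None, None, None)
```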
nice, this is much better.
rllib/policy/tf_policy.py
Outdated
seq_lens (Optional[TensorType]): Placeholder for RNN sequence
lengths, of shape [NUM_SEQUENCES].
model: The optional ModelV2 to use for calculating actions and
losses.
can we mention what model will get used if not specified?
+1
"""Returns whether the loss term(s) have been initialized.""" | ||
return len(self._losses) > 0 | ||
|
||
def _initialize_loss(self, losses: List[TensorType], |
no functional changes here?
Nope, just moving stuff around to sort methods better by importance. Private/deprecated/helpers move more to the end.
Hey @gjoliver, thanks for the review. Addressed all requests; could you take another look?
I think this looks great!! thanks.
Docstring cleanup: Policies, policy_templates.
Why are these changes needed?
Related issue number
Checks
- I've run scripts/format.sh to lint the changes in this PR.