[RLlib; Docs overhaul] Docstring cleanup: Policies, policy_templates. #19759
Conversation
similarly, some suggestions for your consideration.
thanks.
rllib/policy/dynamic_tf_policy.py
Outdated
@@ -88,57 +72,53 @@ def __init__(
obs_include_prev_action_reward=DEPRECATED_VALUE):
"""Initializes a DynamicTFPolicy instance.

Initialization oxf this class occurs in two phases and defines the
typo: "oxf" → "of"
done
rllib/policy/dynamic_tf_policy.py
Outdated
action_space: Action space of the policy.
config: Policy-specific configuration data.
loss_fn: Function that returns a loss tensor for the policy graph.
stats_fn: Optional function that returns a dict of TF fetches
what does "TF fetches" mean? I thought we can export any numeric stats here?
Not for TFPolicy. The stats fn here has to return static graph tensors, just like e.g. the loss function.
added explanation.
interesting, that's right.
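To make the point above concrete, here is a minimal, hypothetical sketch of the `stats_fn` contract being discussed. In real RLlib code the returned values are symbolic `tf.Tensor` nodes built in the policy's static graph (fetched by `session.run` alongside the loss, hence "TF fetches"); plain Python values stand in here so the sketch runs without TensorFlow, and the `policy`/`train_batch` dicts are simplified stand-ins for the real objects.

```python
# Hypothetical sketch of a TFPolicy stats_fn. In real code, every value in
# the returned dict must be a tf.Tensor from the policy's static graph --
# no eager/numpy computation at call time -- because the dict is passed to
# session.run() as fetches together with the loss.

def stats_fn(policy, train_batch):
    return {
        "mean_reward": train_batch["rewards_mean"],  # a graph tensor in TF
        "cur_lr": policy["lr"],                      # e.g. a lr variable
    }

# Plain-Python stand-ins for a Policy and a SampleBatch:
policy = {"lr": 5e-4}
train_batch = {"rewards_mean": 1.25}
fetches = stats_fn(policy, train_batch)
```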
rllib/policy/dynamic_tf_policy.py
Outdated
loss_fn: Function that returns a loss tensor for the policy graph.
stats_fn: Optional function that returns a dict of TF fetches
given the policy and batch input tensors.
grad_stats_fn: Optional function that returns a dict of TF
can we explain the difference between stats_fn and grad_stats_fn please?
+1
done
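A hedged sketch of the distinction just asked about: `stats_fn` only sees the policy and the train batch, while `grad_stats_fn` is additionally passed the computed gradients, so gradient statistics (e.g. a global norm) belong there. Plain Python floats stand in for TF tensors so this runs standalone; the exact signatures in RLlib carry more arguments than shown.

```python
# Hypothetical sketch contrasting the two hooks (plain Python values in
# place of tf.Tensor graph nodes).

def stats_fn(policy, train_batch):
    # Called with only the policy and batch: batch/loss-level stats.
    return {"policy_loss": train_batch["loss"]}

def grad_stats_fn(policy, train_batch, grads):
    # Additionally receives the per-variable gradients, so it can report
    # gradient-level stats such as the global gradient norm.
    global_norm = sum(g * g for g in grads) ** 0.5
    return {"grad_gnorm": global_norm}

stats = stats_fn({}, {"loss": 0.5})
grad_stats = grad_stats_fn({}, {"loss": 0.5}, [3.0, 4.0])
```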
generate an action distribution object from, and
internal-state outputs (or an empty list if not applicable).
action_distribution_fn: A callable returning distribution inputs
(parameters), a dist-class to generate an action distribution
hmm, personally if I am a new user, I would be quite confused about the relationship between action_sampler_fn and action_distribution_fn.
I think I understand action sampling fn, but what does action distribution fn do?
Very valid point! I'll try to clarify and also mention that only one of these should be defined.
done
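A rough sketch of the contrast discussed above, under the simplification that only one of the two hooks is ever defined: with `action_sampler_fn` you produce concrete actions yourself, while with `action_distribution_fn` you return only the distribution inputs plus a distribution class and RLlib does the sampling. Signatures and return shapes are simplified here, and the `Categorical` class is a hypothetical stand-in for an RLlib action-distribution class.

```python
import random

class Categorical:
    # Minimal stand-in for an action-distribution class. Its "sample" is a
    # greedy argmax so this sketch stays deterministic.
    def __init__(self, inputs):
        self.probs = inputs

    def sample(self):
        return max(range(len(self.probs)), key=lambda i: self.probs[i])

def action_sampler_fn(policy, model, obs):
    # Samples actions itself; RLlib uses them as-is, so any exploration
    # logic must live inside this function.
    actions = [random.randrange(2)]
    logps = [0.0]  # placeholder log-likelihoods
    return actions, logps

def action_distribution_fn(policy, model, obs):
    # Returns only the distribution *inputs* plus a dist class (and state
    # outputs); RLlib builds the distribution and samples from it.
    dist_inputs = [0.1, 0.9]
    return dist_inputs, Categorical, []

inputs, dist_cls, _ = action_distribution_fn(None, None, None)
action = dist_cls(inputs).sample()
sampled, _logps = action_sampler_fn(None, None, None)
```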
nice, this is much better.
rllib/policy/tf_policy.py
Outdated
seq_lens (Optional[TensorType]): Placeholder for RNN sequence
lengths, of shape [NUM_SEQUENCES].
model: The optional ModelV2 to use for calculating actions and
losses.
can we mention what model will get used if not specified?
+1
"""Returns whether the loss term(s) have been initialized.""" | ||
return len(self._losses) > 0 | ||
|
||
def _initialize_loss(self, losses: List[TensorType], |
no functional changes here?
Nope, just moving stuff around to sort methods better by importance. Private/deprecated/helpers move more to the end.
Hey @gjoliver, thanks for the review. Addressed all requests; could you take another look?
I think this looks great!! thanks.
Docstring cleanup: Policies, policy_templates.
Why are these changes needed?
Related issue number
Checks
- I've run scripts/format.sh to lint the changes in this PR.