[RLlib; Docs overhaul] Docstring cleanup: Policies, policy_templates. #19759

Merged: 3 commits, Oct 27, 2021

Contributor

@sven1977 sven1977 commented Oct 26, 2021

Docstring cleanup: Policies, policy_templates.

  • Cleanup docstrings.
  • Remove type hints in Args list (already in the signature).
  • Remove type hint from "Returns" (already in signature).
  • Add docstrings where missing.
  • Re-sort some class methods (by importance, private/public, deprecated, etc.).
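
A minimal sketch of the docstring convention the bullets above describe: types appear only in the signature, while the `Args:` and `Returns:` entries carry descriptions without repeating them. The function `clip_reward` and its parameters are hypothetical and not part of RLlib; they only illustrate the style.

```python
import inspect


def clip_reward(reward: float, limit: float = 1.0) -> float:
    """Clips a reward to the range [-limit, limit].

    Args:
        reward: The raw reward to clip.
        limit: Maximum absolute value of the returned reward.

    Returns:
        The clipped reward.
    """
    return max(-limit, min(limit, reward))


# The type information is not lost: it can still be read from the signature.
signature = inspect.signature(clip_reward)
print(signature)
```

Keeping the types in one place means the docstring cannot drift out of sync with the signature when an annotation changes.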

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Member

@gjoliver gjoliver left a comment

similarly, some suggestions for your consideration.
thanks.

@@ -88,57 +72,53 @@ def __init__(
obs_include_prev_action_reward=DEPRECATED_VALUE):
"""Initializes a DynamicTFPolicy instance.

Initialization oxf this class occurs in two phases and defines the

Member

typo: "oxf" should be "of"

Contributor Author

done

action_space: Action space of the policy.
config: Policy-specific configuration data.
loss_fn: Function that returns a loss tensor for the policy graph.
stats_fn: Optional function that returns a dict of TF fetches

Member

what does "TF fetches" mean? I thought we can export any numeric stats here?

Contributor Author

Not for TFPolicy. The stats fn here has to return static graph tensors, just like e.g. the loss function.

Contributor Author

added explanation.

Member

interesting, that's right.

loss_fn: Function that returns a loss tensor for the policy graph.
stats_fn: Optional function that returns a dict of TF fetches
given the policy and batch input tensors.
grad_stats_fn: Optional function that returns a dict of TF

Member

can we explain the difference between stats_fn and grad_stats_fn please?

Contributor Author

+1

Contributor Author

done

generate an action distribution object from, and
internal-state outputs (or an empty list if not applicable).
action_distribution_fn: A callable returning distribution inputs
(parameters), a dist-class to generate an action distribution

Member

hmm, personally if I am a new user, I would be quite confused about the relationship between action_sampler_fn and action_distribution_fn.
I think I understand action sampling fn, but what does action distribution fn do?

Contributor Author

Very valid point! I'll try to clarify and also mention that only one of these should be defined.

Contributor Author

done

Member

nice, this is much better.
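
To illustrate the "only one of these should be defined" rule from this thread, here is a minimal sketch of a mutual-exclusivity check. The helper `check_action_fns` and its return values are hypothetical; this is not RLlib's actual validation code, only one way such a constraint could be enforced.

```python
from typing import Callable, Optional


def check_action_fns(
    action_sampler_fn: Optional[Callable] = None,
    action_distribution_fn: Optional[Callable] = None,
) -> str:
    """Returns which custom action path is active (hypothetical helper).

    Args:
        action_sampler_fn: Custom callable that produces actions directly.
        action_distribution_fn: Custom callable that produces distribution
            inputs, from which actions are then sampled.

    Returns:
        One of "sampler", "distribution", or "default".
    """
    # Supplying both callables is ambiguous, so reject it up front.
    if action_sampler_fn is not None and action_distribution_fn is not None:
        raise ValueError(
            "Define only one of `action_sampler_fn` or "
            "`action_distribution_fn`, not both.")
    if action_sampler_fn is not None:
        return "sampler"
    if action_distribution_fn is not None:
        return "distribution"
    # Neither was given: fall back to whatever the default behavior is.
    return "default"
```

Failing early on the ambiguous case keeps the error close to the misconfiguration instead of surfacing later during action computation.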

seq_lens (Optional[TensorType]): Placeholder for RNN sequence
lengths, of shape [NUM_SEQUENCES].
model: The optional ModelV2 to use for calculating actions and
losses.

Member

can we mention what model will get used if not specified?

Contributor Author

+1

"""Returns whether the loss term(s) have been initialized."""
return len(self._losses) > 0

def _initialize_loss(self, losses: List[TensorType],

Member

no functional changes here?

Contributor Author

Nope, just moving stuff around to sort methods better by importance. Private/deprecated/helpers move more to the end.

@sven1977
Contributor Author

Hey @gjoliver thanks for the review. Addressed all requests, could you take another look?

Member

@gjoliver gjoliver left a comment

I think this looks great!! thanks.

@sven1977 sven1977 merged commit f2cb2ed into ray-project:master Oct 27, 2021