[RLlib] Metrics do-over 01: Introducing and testing new MetricsLogger and Stats APIs. #44442
Conversation
LGTM. Great and very appreciated PR. Some nits here and there and a couple of questions. What is missing imo is to show the basic structure of the `ResultDict` for `EnvRunner`, `Learner`, etc., and of the overall `ResultDict` containing all of them. This is important when putting custom metrics into order - specifically, if a user wants to log them to a specific chart in TensorBoard/WandB.
    else:
        for batch_or_episode in sampled_data:
            if max_agent_steps:
                agent_or_env_steps += (
Do we need this, if we return metrics? And when don't we want to return metrics?
Yeah, unfortunately, we do in this case, b/c we have to determine right here (before even logging the actual steps to the Algorithm's metrics) when to stop the while loop.
We might rearrange this entire utility, but for now, it works just fine.
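The point made above (the step counter must be maintained locally, before any metrics logging happens, because it alone decides when sampling stops) can be sketched as a toy loop. The function and method names below are illustrative stand-ins, not RLlib's actual code:

```python
def sample_until(env_runner, max_agent_steps=None, max_env_steps=None):
    """Toy sketch: keep sampling until a step budget is exhausted.

    The local counter is needed *before* any metrics are logged to the
    Algorithm's MetricsLogger, because it alone terminates the while loop.
    """
    agent_or_env_steps = 0
    all_data = []
    target = max_agent_steps if max_agent_steps else max_env_steps
    while agent_or_env_steps < target:
        sampled_data = env_runner.sample()
        for batch_or_episode in sampled_data:
            if max_agent_steps:
                agent_or_env_steps += batch_or_episode.agent_steps()
            else:
                agent_or_env_steps += batch_or_episode.env_steps()
        all_data.extend(sampled_data)
    return all_data
```

Rearranging the utility (as suggested above) would mean moving this counting into the logger itself; for now the counter stays local.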
RLlib Metrics, custom user metrics, and `ResultsDict` do-over for the new API stack:

This PR introduces a new unified API (`MetricsLogger`) for both RLlib's core codebase and its users to log (custom) metrics from within any(!) component. Thus, whether you are inside `Algorithm` (callback, overriding `training_step`, custom eval function, etc.), `EnvRunner` (callback), or `Learner` (custom loss), you can now use the exact same API in all these components for logging your custom metrics.

The new `MetricsLogger`, which is held by all the above components under the `self.metrics` property, exposes a simple API to log stats values.

There are two situations during which all values logged thus far will be "reduced" or "merged":

1. At the end of a component's cycle. For example, if you call `EnvRunner.sample()`, at the end of this call, the `EnvRunner` will call the `reduce()` method on its `MetricsLogger` object and return the results. Note that this does not necessarily mean that all historic data is reduced at this time. If, for example, you have a stat under the "abc" key with `window=1000` and the `EnvRunner` only logged 50 new values during the `sample()` call, the previously logged 950 values will still remain in the cache under that key.

2. After n parallel components (e.g. n EnvRunners) have returned their reduced results, the controlling component (e.g. the `Algorithm` object controlling the n remote EnvRunners) will have to merge the n received result dicts. This can be achieved with the `MetricsLogger` of the controlling component.
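The log → reduce → merge cycle described above can be illustrated with a small self-contained mock. This is a sketch of the pattern only; `ToyMetricsLogger` and its method names are illustrative stand-ins, not RLlib's actual `MetricsLogger` API:

```python
from collections import defaultdict, deque

class ToyMetricsLogger:
    """Illustrative mock of the log/reduce/merge pattern (not RLlib's API)."""

    def __init__(self, window=1000):
        # Per-key cache holding only the most recent `window` logged values.
        self.stats = defaultdict(lambda: deque(maxlen=window))

    def log_value(self, key, value):
        # Any component (Algorithm, EnvRunner, Learner) logs through this.
        self.stats[key].append(value)

    def reduce(self):
        # Reduce (here: mean over the window) WITHOUT clearing the cache,
        # so previously logged values stay until they fall out of the window.
        return {k: sum(v) / len(v) for k, v in self.stats.items() if v}

    def merge(self, *result_dicts):
        # The controlling component merges the n result dicts returned by
        # n parallel workers by logging each reduced value into its own cache.
        for rd in result_dicts:
            for key, value in rd.items():
                self.log_value(key, value)
```

For example, each worker-side logger would `log_value("abc", ...)` during sampling and `reduce()` at the end of its cycle; the controlling logger then `merge()`s the n returned dicts and `reduce()`s again for the final result.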
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.