
[rllib] move evaluation to trainer.step() such that the result is properly logged #12708

Merged: sven1977 merged 2 commits into ray-project:master on Jan 25, 2021

Conversation

Maltimore
Contributor

Move evaluation to trainer.step() such that the result is properly logged even when training with trainer.train() instead of tune.run()

Why are these changes needed?

When training with trainer.train() instead of tune.run(), the evaluation results (evaluation metrics) are not written to disk (e.g., to progress.csv).

This moves the evaluation code to trainer.step(), which ensures that the evaluation metrics are included in progress.csv and other output files.
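The core idea can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins, not RLlib's actual API: the point is that a logger only sees metrics that are part of the dict returned by step(), so running evaluation inside step() and merging its metrics into that dict is what makes them reach output files such as progress.csv.

```python
import csv
import io


class Trainer:
    """Hypothetical minimal trainer illustrating the logging pattern."""

    def __init__(self, evaluation_interval=1):
        self.evaluation_interval = evaluation_interval
        self._iteration = 0

    def _train(self):
        # Stand-in for one training iteration's metrics.
        return {"episode_reward_mean": 10.0}

    def _evaluate(self):
        # Stand-in for evaluation metrics.
        return {"evaluation": {"episode_reward_mean": 12.0}}

    def step(self):
        # Run evaluation inside step() and merge its metrics into the
        # result dict, so anything that consumes step()'s return value
        # (e.g. a progress.csv writer) sees them.
        result = self._train()
        self._iteration += 1
        if self.evaluation_interval and \
                self._iteration % self.evaluation_interval == 0:
            result.update(self._evaluate())
        return result


def log_to_csv(results):
    """Stand-in for the progress.csv writer: serializes result dicts."""
    buf = io.StringIO()
    rows = [{k: str(v) for k, v in r.items()} for r in results]
    writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


trainer = Trainer()
results = [trainer.step() for _ in range(2)]
print(log_to_csv(results))  # an "evaluation" column now appears
```

If evaluation instead ran in a separate method that step() never calls, its metrics would never enter the returned dict and would be absent from the logged output, which is the behavior this PR fixes.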

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR. --> yes, but it fails with a flake8 error unrelated to this change
  • I've included any doc changes needed for https://docs.ray.io/en/master/. --> no doc changes needed
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

I only tested this PR in my own experiments.


@sven1977 left a comment


Looks fine to me.
@Maltimore Could you fix the broken eval test case and run ci/travis/format.sh (LINT) before pushing?
Thanks!

@sven1977 sven1977 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Dec 21, 2020
@Maltimore
Contributor Author

Maltimore commented Jan 7, 2021

I ran ci/travis/format.sh again and it works now (previously it failed with a flake8 error).

Can you tell me how to run the unit tests for rllib, or where I can find information on running them?
Edit: I figured out how to run the eval test now.

Commits:
  • …gged even when training with trainer.train() instead of tune.run()
  • lint
  • bugfix: need to increment self._iteration in if condition
  • [rllib] evaluation: simplify logic in if-condition
@Maltimore
Contributor Author

@sven1977
could you take another look? I simplified the logic in the if condition a bit. I hope I didn't mess it up; it would be good if you could double-check the condition as well. The evaluation test case now passes.
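The commits mention incrementing self._iteration inside the if condition and then simplifying that logic. As a hedged sketch (the exact condition in RLlib may differ), the interval check being discussed looks roughly like this; the bug class the "bugfix" commit refers to is that if the counter is not incremented at the right point relative to the check, evaluation fires one iteration off schedule:

```python
def should_evaluate(iteration, evaluation_interval):
    """Return True when evaluation should run for this 1-based iteration.

    An evaluation_interval of 0 or None disables evaluation entirely;
    otherwise evaluation runs every `evaluation_interval` iterations.
    """
    if not evaluation_interval:
        return False
    return iteration % evaluation_interval == 0


# With interval=3, evaluation runs on iterations 3, 6, 9, ...
print([i for i in range(1, 10) if should_evaluate(i, 3)])  # [3, 6, 9]
```

Collapsing the disabled case and the modulo test into one small predicate like this is one way to "simplify the logic in the if condition" while keeping the increment-then-check ordering explicit at the call site.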

@sven1977 sven1977 added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Jan 25, 2021

@sven1977 left a comment


Thanks for this fix @Maltimore !
Looks good to me.

@sven1977 sven1977 merged commit b4702de into ray-project:master Jan 25, 2021
fishbone pushed a commit to fishbone/ray that referenced this pull request Feb 16, 2021
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021