
[rllib] move evaluation to trainer.step() such that the result is properly logged #12708

Merged: sven1977 merged 2 commits into ray-project:master on Jan 25, 2021

Conversation

Maltimore
Contributor

Move evaluation to trainer.step() such that the result is properly logged even when training with trainer.train() instead of tune.run()

Why are these changes needed?

When training with trainer.train() instead of tune.run(), the evaluation results (evaluation metrics) are not written to disk (e.g., to progress.csv).

This moves the evaluation code to trainer.step(), which ensures that the evaluation metrics are included in progress.csv and other output files.
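The core idea can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins, not RLlib's actual API: the point is that a logger only sees metrics that are part of the dict returned by step(), so running evaluation inside step() and merging its metrics into that dict is what makes them reach output files such as progress.csv.

```python
import csv
import io


class Trainer:
    """Hypothetical minimal trainer illustrating the logging pattern."""

    def __init__(self, evaluation_interval=1):
        self.evaluation_interval = evaluation_interval
        self._iteration = 0

    def _train(self):
        # Stand-in for one training iteration's metrics.
        return {"episode_reward_mean": 10.0}

    def _evaluate(self):
        # Stand-in for evaluation metrics.
        return {"evaluation": {"episode_reward_mean": 12.0}}

    def step(self):
        # Run evaluation inside step() and merge its metrics into the
        # result dict, so anything that consumes step()'s return value
        # (e.g. a progress.csv writer) sees them.
        result = self._train()
        self._iteration += 1
        if self.evaluation_interval and \
                self._iteration % self.evaluation_interval == 0:
            result.update(self._evaluate())
        return result


def log_to_csv(results):
    """Stand-in for the progress.csv writer: serializes result dicts."""
    buf = io.StringIO()
    rows = [{k: str(v) for k, v in r.items()} for r in results]
    writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


trainer = Trainer()
results = [trainer.step() for _ in range(2)]
print(log_to_csv(results))  # an "evaluation" column now appears
```

If evaluation instead ran in a separate method that step() never calls, its metrics would never enter the returned dict and would be absent from the logged output, which is the behavior this PR fixes.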

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR. --> yes, but it fails with a flake8 error unrelated to this change
  • I've included any doc changes needed for https://docs.ray.io/en/master/. --> no doc changes needed
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

I only tested this PR in my own experiments.


@sven1977 left a comment


Looks fine to me.
@Maltimore Could you fix the broken eval test case and run ci/travis/format.sh (LINT) before pushing?
Thanks!

@sven1977 sven1977 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Dec 21, 2020
@Maltimore
Contributor Author

Maltimore commented Jan 7, 2021

I ran ci/travis/format.sh again and it works now (previously it failed with a flake8 error).

Can you tell me how to run the unit tests for rllib, or where I can find information on running them?
Edit: I figured out how to run the eval test now.

Commits:
  • …gged even when training with trainer.train() instead of tune.run()
  • lint
  • bugfix: need to increment self._iteration in if condition
  • [rllib] evaluation: simplify logic in if-condition
@Maltimore
Contributor Author

@sven1977
could you take another look? I simplified the logic in the if condition a bit. I hope I didn't mess it up; it would be good if you could double-check the condition as well. The evaluation test case now passes.
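The commits mention incrementing self._iteration inside the if condition and then simplifying that logic. As a hedged sketch (the exact condition in RLlib may differ), the interval check being discussed looks roughly like this; the bug class the "bugfix" commit refers to is that if the counter is not incremented at the right point relative to the check, evaluation fires one iteration off schedule:

```python
def should_evaluate(iteration, evaluation_interval):
    """Return True when evaluation should run for this 1-based iteration.

    An evaluation_interval of 0 or None disables evaluation entirely;
    otherwise evaluation runs every `evaluation_interval` iterations.
    """
    if not evaluation_interval:
        return False
    return iteration % evaluation_interval == 0


# With interval=3, evaluation runs on iterations 3, 6, 9, ...
print([i for i in range(1, 10) if should_evaluate(i, 3)])  # [3, 6, 9]
```

Collapsing the disabled case and the modulo test into one small predicate like this is one way to "simplify the logic in the if condition" while keeping the increment-then-check ordering explicit at the call site.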

@sven1977 sven1977 added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Jan 25, 2021

@sven1977 left a comment


Thanks for this fix @Maltimore !
Looks good to me.

@sven1977 sven1977 merged commit b4702de into ray-project:master Jan 25, 2021
fishbone pushed a commit to fishbone/ray that referenced this pull request Feb 16, 2021
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021