Fix a typo to allow evaluating algos deterministically #1617
Oh, I thought we had a test case for this code path. I guess not. Honestly, looking for `mean` in the first place is kinda a hack. It only really makes sense for Gaussian policies. I've wondered for some time if we should add an argument to |
Thanks for catching this!
We should probably add a corresponding test for this bug somewhere in this file: https://github.com/rlworkgroup/garage/blob/master/tests/garage/sampler/test_utils.py |
@Mergifyio rebase |
@maciejwolczyk thanks so much for the PR! Can you add a test which would have detected this bug (as suggested by @avnishn )? After that, I think it's ready to merge. |
I'm glad I could help and sorry for the delay! Is this test okay? I've run the test on the old code with the typo and it does fail, and of course it should pass after the fix. |
this test looks good to me! please just make the CI pass and we will merge it :)
thank you!
@Mergifyio rebase |
@Mergifyio rebase |
@Mergifyio rebase |
@maciejwolczyk it looks like the CI is failing due to a formatting issue. Can you correct it and re-upload? Note that this should be handled for you if you set up pre-commit, as detailed in CONTRIBUTING.md |
That's a bit weird, earlier I checked the commits with the pre-commit hooks enabled and the CI also passed when it was run for the first time... I guess maybe the failure now is because the pre-commit script changed recently? I've pushed a commit which passes the pre-commit scripts. It was actually detecting a formatting error in lines that I haven't touched, so I guess it just checks the whole file. If you run Anyway, I hope it will pass now. |
It's failing now, but on a later stage (normal tests) and the failing test is |
Hi, just sent #1710 to fix this failing test. Once this is merged please try rebasing the latest master branch and check CI again. |
@Mergifyio rebase |
@maciejwolczyk is not allowed to run commands |
@Mergifyio rebase |
@Mergifyio backport release-2020.06 |
@Mergifyio backport release-2019.10 |
* Fix deterministic policy evaluation
* Add tests for deterministic policy eval
* Fix formatting in dummy policy init
When running some experiments with SAC, I discovered that my algorithm does not act deterministically during evaluation (i.e. the action is sampled from the policy distribution instead of taking the mean/mode of the distribution). The code for obtaining evaluation samples uses the `rollout` function from `sampler.utils` with the argument `deterministic=True`.

The `rollout` function is then supposed to look into the `agent_info` dictionary and use the `mean` value stored there. Unfortunately, the current code looks into `agent_infos` (with `s` at the end), which is a list containing `agent_info` dictionaries and as such does not contain the `mean` key. This means that the stochastic, sampled action is used instead. My pull request solves this issue by fixing the typo.

Technical sidenote: maybe an exception should be raised if `deterministic=True` and there is no `mean` key in the dict?
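
To make the description concrete, here is a minimal, self-contained sketch of the bug and the fix. It does not reproduce garage's actual `rollout` code: `DummyGaussianPolicy`, `choose_action_buggy`, `choose_action_fixed`, and the `strict` flag are hypothetical names used only for illustration, and the shape of the buggy check is an assumption based on the description above. The `strict` flag sketches the exception suggested in the sidenote.

```python
import random


class DummyGaussianPolicy:
    """Hypothetical stand-in for a stochastic policy (not garage's API)."""

    def get_action(self, observation):
        mean = 0.5
        # Sample around the mean and report the mean in agent_info.
        return random.gauss(mean, 1.0), {'mean': mean}


def choose_action_buggy(policy, observation, agent_infos, deterministic):
    """Assumed shape of the bug: the check inspects the list, not the dict."""
    action, agent_info = policy.get_action(observation)
    # agent_infos is a list of dicts, so the string 'mean' is compared
    # against each dict and the membership test is never true.
    if deterministic and 'mean' in agent_infos:
        action = agent_info['mean']
    agent_infos.append(agent_info)
    return action


def choose_action_fixed(policy, observation, agent_infos, deterministic,
                        strict=False):
    """Sketch of the fix: look up 'mean' in the per-step agent_info dict."""
    action, agent_info = policy.get_action(observation)
    if deterministic:
        if 'mean' in agent_info:
            action = agent_info['mean']
        elif strict:
            # Hypothetical behavior from the sidenote: fail loudly
            # instead of silently returning a sampled action.
            raise KeyError("deterministic=True but agent_info has no 'mean'")
    agent_infos.append(agent_info)
    return action
```

With this sketch, `choose_action_fixed(DummyGaussianPolicy(), None, [], deterministic=True)` returns the mean, while the buggy variant keeps returning sampled actions no matter what `deterministic` is set to.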