Fix a typo to allow evaluating algos deterministically #1617

maciejwolczyk · 2020-06-25T17:29:50Z

While running experiments with SAC, I discovered that my algorithm does not act deterministically during evaluation (i.e., the action is sampled from the policy distribution instead of being the mean/mode of the distribution). The code that collects evaluation samples calls the rollout function from sampler.utils with the argument deterministic=True.

The rollout function is then supposed to look in the agent_info dictionary and use the mean value stored there. Unfortunately, the code currently looks in agent_infos (with an s at the end), which is a list of agent_info dictionaries and therefore never contains the mean key. As a result, the stochastic, sampled action is used instead. This pull request fixes the typo.
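To make the failure mode concrete, here is a minimal sketch of the check described above. It is not the garage source: `select_action` and `select_action_buggy` are hypothetical helpers, and only the names `agent_info`, `agent_infos`, `mean`, and `deterministic` come from the description.

```python
import numpy as np


def select_action(sampled_action, agent_info, deterministic=False):
    """Pick the action for one step (hypothetical helper, fixed check)."""
    # agent_info is the per-step dict returned by the policy.
    if deterministic and 'mean' in agent_info:
        return agent_info['mean']
    return sampled_action


def select_action_buggy(sampled_action, agent_info, agent_infos,
                        deterministic=False):
    """Same logic, but the membership test uses the list (the typo)."""
    # agent_infos is a list of dicts, so this asks whether the string
    # 'mean' is an element of the list, which is never true.
    if deterministic and 'mean' in agent_infos:
        return agent_info['mean']
    return sampled_action


agent_info = {'mean': np.array([0.0]), 'log_std': np.array([-1.0])}
agent_infos = [agent_info]  # per-step dicts accumulated so far
sampled = np.array([0.37])  # stand-in for a stochastic sample

fixed = select_action(sampled, agent_info, deterministic=True)
buggy = select_action_buggy(sampled, agent_info, agent_infos,
                            deterministic=True)
```

With the list-based check, the deterministic branch is silently skipped and the sampled value is returned, which matches the behaviour reported above.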

Technical side note: maybe an exception should be raised if deterministic=True and there is no mean key in the dict?
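One possible shape for that check, as a sketch only: `deterministic_action` is a hypothetical helper, and the choice of `KeyError` is an assumption rather than anything decided in this PR.

```python
def deterministic_action(agent_info):
    """Return the distribution mean, failing loudly when it is absent."""
    if 'mean' not in agent_info:
        # Assumed behaviour: refuse to fall back to a sampled action.
        raise KeyError(
            "deterministic=True was requested, but agent_info has no 'mean' "
            'key; available keys: {}'.format(sorted(agent_info)))
    return agent_info['mean']
```

Raising here would have surfaced the typo immediately, at the cost of breaking policies that never report a mean.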

krzentner · 2020-06-25T21:18:00Z

Oh, I thought we had a test case for this code path. I guess not.

Honestly, looking for mean in the first place is kind of a hack. It only really makes sense for Gaussian policies. I've wondered for some time whether we should add an argument to get_action/get_actions that makes the actions deterministic.
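A rough sketch of what such an argument could look like, assuming each policy decides for itself what "deterministic" means. The two toy classes below are hypothetical and are not garage policies; the `deterministic` keyword on `get_action` is the idea floated above, not an existing API.

```python
import numpy as np


class ToyGaussianPolicy:
    """Hypothetical Gaussian policy: deterministic means 'return the mean'."""

    def __init__(self, mean, std, seed=0):
        self._mean = np.asarray(mean, dtype=float)
        self._std = np.asarray(std, dtype=float)
        self._rng = np.random.default_rng(seed)

    def get_action(self, observation, deterministic=False):
        if deterministic:
            action = self._mean.copy()
        else:
            action = self._rng.normal(self._mean, self._std)
        return action, {'mean': self._mean}


class ToyCategoricalPolicy:
    """Hypothetical discrete policy: deterministic means 'return the mode'."""

    def __init__(self, probs, seed=0):
        self._probs = np.asarray(probs, dtype=float)
        self._rng = np.random.default_rng(seed)

    def get_action(self, observation, deterministic=False):
        if deterministic:
            action = int(np.argmax(self._probs))
        else:
            action = int(self._rng.choice(len(self._probs), p=self._probs))
        return action, {'prob': self._probs}
```

With this design the rollout code no longer needs to know about a mean key, and non-Gaussian policies get a sensible deterministic mode.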

avnishn

Thanks for catching this!

avnishn · 2020-06-29T19:00:36Z

We should probably add a corresponding test for this bug somewhere in this file:

https://github.com/rlworkgroup/garage/blob/master/tests/garage/sampler/test_utils.py

ryanjulian · 2020-06-30T22:22:57Z

@Mergifyio rebase

mergify · 2020-06-30T22:23:34Z

Command rebase: success

Branch has been successfully rebased

ryanjulian · 2020-06-30T22:23:39Z

@maciejwolczyk thanks so much for the PR!

Can you add a test which would have detected this bug (as suggested by @avnishn)?

After that, I think it's ready to merge.

maciejwolczyk · 2020-07-02T16:14:31Z

I'm glad I could help and sorry for the delay!

Is this test okay? I've run the test on the old code with the typo and it does fail, and of course it should pass after the fix.
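For readers without the diff at hand, a test of this kind might look roughly like the sketch below. It is not the test added in this PR: `DummyPolicy` and the toy `rollout` are hypothetical stand-ins, written so that the sampled action (1.0) and the mean (0.0) are easy to tell apart.

```python
class DummyPolicy:
    """Hypothetical fixture: samples 1.0 but reports a mean of 0.0."""

    def get_action(self, observation):
        return 1.0, {'mean': 0.0}


def rollout(policy, observations, deterministic=False):
    """Toy rollout over a fixed list of observations (no environment)."""
    actions, agent_infos = [], []
    for obs in observations:
        action, agent_info = policy.get_action(obs)
        # Check the per-step dict, not the accumulated list.
        if deterministic and 'mean' in agent_info:
            action = agent_info['mean']
        actions.append(action)
        agent_infos.append(agent_info)
    return actions


def test_deterministic_rollout_uses_mean():
    actions = rollout(DummyPolicy(), [0, 1, 2], deterministic=True)
    assert actions == [0.0, 0.0, 0.0]


def test_stochastic_rollout_uses_sample():
    actions = rollout(DummyPolicy(), [0, 1, 2])
    assert actions == [1.0, 1.0, 1.0]
```

If the membership test in `rollout` were written against `agent_infos`, the first test would fail because every action would be 1.0.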

tests/fixtures/policies/dummy_policy.py

ryanjulian

this test looks good to me! please just make the CI pass and we will merge it :)

thank you!

ryanjulian · 2020-07-02T18:44:10Z

@Mergifyio rebase

mergify · 2020-07-02T18:44:56Z

Command rebase: success

Branch has been successfully rebased

ryanjulian · 2020-07-03T19:04:33Z

@Mergifyio rebase

mergify · 2020-07-03T19:05:26Z

Command rebase: success

ryanjulian · 2020-07-05T17:16:58Z

@Mergifyio rebase

mergify · 2020-07-05T17:17:43Z

Command rebase: success

ryanjulian · 2020-07-05T17:50:15Z

@maciejwolczyk it looks like the CI is failing due to a formatting issue. Can you correct it and re-upload?

Note that this should be handled for you if you set up pre-commit, as detailed in CONTRIBUTING.md

maciejwolczyk · 2020-07-05T19:04:56Z

That's a bit weird: earlier I checked the commits with the pre-commit hooks enabled, and the CI also passed when it was run for the first time... I guess the failure now may be because the pre-commit script changed recently?

I've pushed a commit which passes the pre-commit scripts. It was actually detecting a formatting error in lines that I haven't touched, so I guess it just checks the whole file. If you run yapf --recursive . in the repository dir then it finds some more formatting inconsistencies, but I think a separate pull request would be more appropriate for fixing that.

Anyway, I hope it will pass now.

maciejwolczyk · 2020-07-06T11:06:30Z

It's failing now, but on a later stage (normal tests) and the failing test is test_time_limit_env for the BulletEnv. I'm not sure if this issue is connected with the current pull request, I've checked that some of the CI for different PRs fail on this test as well.

AiRuiChen · 2020-07-06T16:43:11Z

> It's failing now, but on a later stage (normal tests) and the failing test is test_time_limit_env for the BulletEnv. I'm not sure if this issue is connected with the current pull request, I've checked that some of the CI for different PRs fail on this test as well.

Hi, I just sent #1710 to fix this failing test. Once it is merged, please try rebasing onto the latest master branch and check CI again.

maciejwolczyk · 2020-07-06T20:04:27Z

@Mergifyio rebase

mergify · 2020-07-06T20:04:59Z

@maciejwolczyk is not allowed to run commands

ryanjulian · 2020-07-06T21:04:29Z

@Mergifyio rebase

mergify · 2020-07-06T21:05:20Z

Command rebase: success

ryanjulian · 2020-07-07T00:13:16Z

@Mergifyio backport release-2020.06

ryanjulian · 2020-07-07T00:13:27Z

@Mergifyio backport release-2019.10

mergify · 2020-07-07T00:13:55Z

Command backport release-2020.06: failure

No backport have been created

Backport to branch release-2020.06 failed

Cherry-pick of f8377f5 has failed:

On branch mergify/bp/release-2020.06/pr-1617
Your branch is up to date with 'origin/release-2020.06'.

You are currently cherry-picking commit f8377f5f.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:

	modified:   src/garage/sampler/utils.py
	modified:   tests/garage/sampler/test_utils.py

Unmerged paths:
  (use "git add <file>..." to mark resolution)

	both modified:   tests/fixtures/policies/dummy_policy.py

mergify · 2020-07-07T00:14:02Z

Command backport release-2019.10: failure

No backport have been created

Backport to branch release-2019.10 failed

Cherry-pick of f8377f5 has failed:

On branch mergify/bp/release-2019.10/pr-1617
Your branch is up to date with 'origin/release-2019.10'.

You are currently cherry-picking commit f8377f5f.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:

	modified:   tests/garage/sampler/test_utils.py

Unmerged paths:
  (use "git add <file>..." to mark resolution)

	both modified:   src/garage/sampler/utils.py
	both modified:   tests/fixtures/policies/dummy_policy.py


maciejwolczyk requested a review from a team as a code owner June 25, 2020 17:29

maciejwolczyk requested a review from ahtsan June 25, 2020 17:29

ahtsan requested review from krzentner and ryanjulian June 25, 2020 18:10

maciejwolczyk force-pushed the master branch from 5869e6a to 42748ab Compare June 25, 2020 18:46

krzentner approved these changes Jun 25, 2020

krzentner added the backport-to-2019.10 Backport this PR to release-2019.10 label Jun 25, 2020

ryanjulian added the backport-to-2020.06 Backport this PR to release-2020.06 label Jun 27, 2020

ryanjulian force-pushed the master branch from 2e825f8 to d5220d3 Compare June 28, 2020 01:00

avnishn approved these changes Jun 29, 2020

ryanjulian force-pushed the master branch from 42748ab to 4a41aaa Compare June 30, 2020 22:23

ryanjulian added this to In Progress in v2019.10 backports via automation Jul 2, 2020

ryanjulian added this to TODO in v2020.06 backports via automation Jul 2, 2020

ryanjulian reviewed Jul 2, 2020

tests/fixtures/policies/dummy_policy.py (review comment resolved)

ryanjulian approved these changes Jul 2, 2020

v2020.06 backports automation moved this from TODO to In Progress Jul 2, 2020

ryanjulian added the ready-to-merge label Jul 2, 2020

ahtsan force-pushed the master branch from ad8b905 to 64e9eda Compare July 2, 2020 18:44

ahtsan force-pushed the master branch from 64e9eda to 964caa4 Compare July 3, 2020 19:05

mergify bot requested a review from a team July 3, 2020 19:06

mergify bot requested a review from a team July 3, 2020 19:12

ahtsan force-pushed the master branch from 964caa4 to dbe99bc Compare July 5, 2020 17:17

ryanjulian mentioned this pull request in Release v2020.06.2 #1709 (closed) Jul 5, 2020

maciejwolczyk added 3 commits July 6, 2020 21:05

Fix deterministic policy evaluation

bf72338

Add tests for deterministic policy eval

793b2a3

Fix formatting in dummy policy init

0a5f55b

ahtsan force-pushed the master branch from 964f0ae to 0a5f55b Compare July 6, 2020 21:05

ryanjulian merged commit f8377f5 into rlworkgroup:master Jul 6, 2020

v2019.10 backports automation moved this from In Progress to Done Jul 6, 2020

v2020.06 backports automation moved this from In Progress to Done Jul 6, 2020

ryanjulian pushed a commit that referenced this pull request Jul 7, 2020

Fix a typo to allow evaluating algos deterministically (#1617)

ec18bd5

* Fix deterministic policy evaluation
* Add tests for deterministic policy eval
* Fix formatting in dummy policy init

ryanjulian pushed a commit that referenced this pull request Jul 7, 2020

Fix a typo to allow evaluating algos deterministically (#1617)

cae3355

* Fix deterministic policy evaluation
* Add tests for deterministic policy eval
* Fix formatting in dummy policy init

yeukfu added a commit that referenced this pull request Aug 5, 2020

Backport #1617

bee80ca

mergify bot pushed a commit that referenced this pull request Aug 5, 2020

Backport #1617 to release-2019.10 (#1715)

8494a6b

* Backport #1617
* Fix docstring
* Fix test_off_policy_vec_sampler_integration

Co-authored-by: ruofu <ruofuwan@usc.edu>
