
Sampler should not flatten observations and actions #967

Merged
ahtsan merged 3 commits into master from fix_batch_sampler on Nov 2, 2019

Conversation

@ahtsan (Contributor) commented Oct 29, 2019

To make the API consistent across all samplers, samplers should not flatten observations or actions; flattening is done by the algorithms.
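
A minimal sketch of the intended division of labor (illustrative code, not the exact garage implementation; assumes a gym-style env and a garage-style agent API):

    import numpy as np

    def rollout(env, agent, max_path_length):
        """Sample one path, keeping raw (unflattened) observations/actions."""
        observations, actions = [], []
        obs = env.reset()
        agent.reset()
        for _ in range(max_path_length):
            action, _ = agent.get_action(obs)
            next_obs, _, done, _ = env.step(action)
            observations.append(obs)   # stored raw, e.g. an image stays (H, W, C)
            actions.append(action)     # stored raw as well
            obs = next_obs
            if done:
                break
        return dict(observations=np.asarray(observations),
                    actions=np.asarray(actions))

    # An algorithm that needs flat inputs flattens at consumption time:
    # flat_obs = env.observation_space.flatten_n(path['observations'])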

@ahtsan requested a review from a team as a code owner on October 29, 2019 05:04
@codecov (bot) commented Oct 29, 2019

Codecov Report

Merging #967 into master will increase coverage by 0.09%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #967      +/-   ##
==========================================
+ Coverage   84.49%   84.58%   +0.09%     
==========================================
  Files         156      156              
  Lines        7467     7468       +1     
  Branches      938      938              
==========================================
+ Hits         6309     6317       +8     
+ Misses        966      964       -2     
+ Partials      192      187       -5
Impacted Files                                            Coverage Δ
src/garage/sampler/utils.py                               82.35% <100%> (+10.35%) ⬆️
src/garage/np/policies/base.py                            85% <100%> (ø) ⬆️
...rc/garage/sampler/off_policy_vectorized_sampler.py     100% <0%> (+1.17%) ⬆️
.../exploration_strategies/epsilon_greedy_strategy.py     100% <0%> (+3.7%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 393e9c2...8789f0b. Read the comment docs.

@@ -111,6 +115,6 @@ def truncate_paths(paths, max_samples):
             truncated_last_path[k] = tensor_utils.truncate_tensor_dict(
                 v, truncated_len)
         else:
-            raise NotImplementedError
+            raise NotImplementedError()
Member (review comment on the diff above):

Can you raise a ValueError instead, and include an error message which specifies the valid keys versus the ones found?
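
A minimal sketch of that suggestion, inside the truncate_paths loop (the set of valid keys shown here is illustrative):

    VALID_KEYS = {'observations', 'actions', 'rewards',
                  'env_infos', 'agent_infos'}
    if k not in VALID_KEYS:
        # Name the offending key and list the valid ones in the message.
        raise ValueError('Unexpected key {!r} in path; valid keys are {}'
                         .format(k, sorted(VALID_KEYS)))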

@naeioi (Member) commented Oct 29, 2019

Not directly related to this PR, but there is also an inconsistency in how the primitives handle flattening. For example, continuous_mlp_policy_with_model.py flattens observations:

    flat_obs = self.observation_space.flatten_n(observations)

while categorical_cnn_policy.py does not:

    probs = self._f_prob(observations)

To make things even worse, there is a flatten_input argument in batch_polopt.py:

    if self.flatten_input:

We should consolidate all of these into a single switch for flattening.
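
A hypothetical sketch of such a switch, if it lived in the algorithm's sample processing (flatten_input is the real batch_polopt.py argument; the method body is illustrative, not garage's actual code):

    def process_samples(self, itr, paths):
        for path in paths:
            if self.flatten_input:
                # The single place where observations get flattened;
                # primitives always receive whatever the algorithm hands them.
                path['observations'] = (
                    self.env_spec.observation_space.flatten_n(
                        path['observations']))
        # ...rest of sample processing...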

@ryanjulian (Member) commented

@naeioi thanks for pointing this out.

I think that's a bug in ContinuousMLPPolicy then -- for now we have decided not to flatten in the primitives.

@ahtsan what do you think?

@ahtsan (Contributor, Author) commented Oct 30, 2019

@naeioi @ryanjulian Yes, I think flattening is currently duplicated in both the primitives and the algorithms -- we should only do it in one place (which I think we chose to be the algorithm). That means we should remove the flattening from all primitives and add flattening support to all the other algorithms (batch_polopt already has it).

Regarding categorical_cnn_policy.py, we don't want to flatten the observations, since we want to keep spatial information for CNN primitives. In that case, I think we could achieve the same goal by passing flatten_input=False to the algorithm when using CNN primitives. Does that sound right?
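
A hedged usage sketch of that idea (only flatten_input is taken from batch_polopt.py; the constructor calls are abbreviated and the exact argument lists may differ):

    from garage.tf.algos import TRPO
    from garage.tf.policies import CategoricalCNNPolicy

    # env and baseline constructed elsewhere.
    policy = CategoricalCNNPolicy(env_spec=env.spec)  # plus conv arguments
    algo = TRPO(env_spec=env.spec,
                policy=policy,
                baseline=baseline,
                flatten_input=False)  # CNN consumes unflattened observations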

@ryanjulian (Member) commented

@ahtsan That sounds right to me. For now let's just make it consistent, and we can consider more automatic designs later.

@ryanjulian (Member) commented

I added some tests to try to merge this for the release, but it turns out something is still quite broken (or my tests are just wrong).

In any case, this won't make it and we will have to backport the fix I think.

@ahtsan @krzentner should be interested.

@ryanjulian added the backport-to-2019.10 (Backport this PR to release-2019.10) label on Nov 1, 2019
@ahtsan (Contributor, Author) commented Nov 1, 2019

> I added some tests to try to merge this for the release, but it turns out something is still quite broken (or my tests are just wrong).
>
> In any case, this won't make it and we will have to backport the fix I think.
>
> @ahtsan @krzentner should be interested.

I think the issue is that the dummy policy doesn't have dist_info_keys from which to extract agent_info, while the test assumes that env_info is non-empty (which is not true in reality) and that agent_info is non-empty. We should fix the dummy policies and make them inherit from the Policy base class, to make sure they comply with its API. We might also want to introduce env.env_info_keys so we can extract env_info the same way we use dist_info_keys.

I am working on a fix.
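
A hypothetical sketch of that direction (class and key names are illustrative, not the final fix; the exact shape of the Policy base API is simplified here):

    from garage.np.policies.base import Policy

    class DummyPolicy(Policy):
        """Test stand-in that complies with the real Policy API."""

        @property
        def dist_info_keys(self):
            # Non-empty, so the sampler can extract agent_info.
            return ['dummy']

        def get_action(self, observation):
            # action_space is assumed to come from the base class / env_spec.
            action = self.action_space.sample()
            return action, dict(dummy=0.0)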

@ryanjulian (Member) commented

If you look at the broken test, I think the shape of path['observations'] is wrong for a non-flat observation space.

@ahtsan (Contributor, Author) commented Nov 1, 2019

> If you look at the broken test, I think the shape of path['observations'] is wrong for a non-flat observation space.

Do you mean this test doesn't pass?

    def test_does_not_flatten(self):
        path = utils.rollout(self.env, self.policy, max_path_length=5)
        assert path['observations'][0].shape == (4, 4)
        assert path['actions'][0].shape == (2, 2)

@ryanjulian (Member) commented

Yes

@ryanjulian removed the backport-to-2019.10 (Backport this PR to release-2019.10) label on Nov 1, 2019
@ahtsan merged commit 93b1a48 into master on Nov 2, 2019
@ahtsan deleted the fix_batch_sampler branch on November 2, 2019 04:44