
Replace CategoricalConvPolicy #906

Merged: 2 commits into master from replace_categorical_conv_policy on Oct 16, 2019

Conversation

@lywong92 (Member) commented Oct 2, 2019

  • Remove all occurrences of CategoricalConvPolicy
  • Rename CategoricalConvPolicyWithModel to CategoricalConvPolicy
  • Create and remove integration test

Benchmark script is located in origin/benchmark_categorical_cnn_policy.

Results:
[Plot: MemorizeDigits-v0_benchmark_ppo]
[Plot: MemorizeDigits-v0_benchmark_ppo_mean]

Also tried running both versions in an Atari environment (PongNoFrameskip) with PPO, but realized that this combination of environment and algorithm is not ideal for our testing:
[Plot: PongNoFrameskip results]

As discussed, results from MemorizeDigits are sufficient to show that the layer implementation can be replaced with the model implementation.

@lywong92 requested a review from a team as a code owner on October 2, 2019 20:54
codecov bot commented Oct 2, 2019

Codecov Report

Merging #906 into master will decrease coverage by 0.39%.
The diff coverage is 100%.


@@            Coverage Diff            @@
##           master     #906     +/-   ##
=========================================
- Coverage   69.67%   69.28%   -0.4%     
=========================================
  Files         171      170      -1     
  Lines        9485     9434     -51     
  Branches     1250     1249      -1     
=========================================
- Hits         6609     6536     -73     
- Misses       2660     2681     +21     
- Partials      216      217      +1
Impacted Files Coverage Δ
src/garage/tf/policies/categorical_cnn_policy.py 96% <100%> (ø)
src/garage/tf/core/network.py 0% <0%> (-23.22%) ⬇️
src/garage/envs/grid_world_env.py 86.56% <0%> (-4.48%) ⬇️
src/garage/experiment/experiment_wrapper.py 84.84% <0%> (-0.76%) ⬇️
src/garage/misc/special.py 33.33% <0%> (+3.03%) ⬆️
.../exploration_strategies/epsilon_greedy_strategy.py 100% <0%> (+3.57%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4673a26...be4c045.

@ahtsan (Contributor) left a comment

+1 for adding an example for CategoricalCNNPolicy. Please replace 'conv' with 'CNN', since that's the preferred convention, before merging. Well done!

"""Run task."""
with LocalTFRunner(snapshot_config=snapshot_config) as runner:
    env = TfEnv(normalize(gym.make('CubeCrash-v0')))
    policy = CategoricalConvPolicy(env_spec=env.spec,
Inline review comment (Contributor) on the snippet above:

Might be better to use CNN for the naming convention, so it would be CategoricalCNNPolicy. Please also edit the other occurrences.
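
For reference, a rough sketch of what such an example script could look like once the rename lands. Only the lines quoted above come from the actual diff; the convolutional keyword names, the LinearFeatureBaseline choice, and the PPO settings here are assumptions on my part, not the final example:

"""Sketch: train the renamed CategoricalCNNPolicy with PPO on CubeCrash-v0."""
import gym

from garage.envs import normalize
from garage.experiment import LocalTFRunner, run_experiment
from garage.np.baselines import LinearFeatureBaseline
from garage.tf.algos import PPO
from garage.tf.envs import TfEnv
from garage.tf.policies import CategoricalCNNPolicy


def run_task(snapshot_config, *_):
    """Run task."""
    with LocalTFRunner(snapshot_config=snapshot_config) as runner:
        env = TfEnv(normalize(gym.make('CubeCrash-v0')))

        # NOTE: the convolutional keyword names below are assumptions;
        # check the CategoricalCNNPolicy signature in your garage version.
        policy = CategoricalCNNPolicy(env_spec=env.spec,
                                      conv_filters=(32, 64),
                                      conv_filter_sizes=(8, 4),
                                      conv_strides=(4, 2),
                                      conv_pad='VALID',
                                      hidden_sizes=(32, 32))

        # A simple baseline, chosen only to keep the sketch short.
        baseline = LinearFeatureBaseline(env_spec=env.spec)

        algo = PPO(env_spec=env.spec,
                   policy=policy,
                   baseline=baseline,
                   max_path_length=100,
                   discount=0.99)

        runner.setup(algo, env)
        runner.train(n_epochs=1000, batch_size=4000)


run_experiment(run_task, snapshot_mode='last', seed=1)

The same sketch should work with the other environments discussed below (e.g. MemorizeDigits-v0) by changing the env id.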

@ryanjulian (Member) commented Oct 2, 2019

Can you run benchmarks using the following environments:

  • MemorizeDigits-v0
  • CubeCrash-v0
  • CarRacing-v0
  • Acrobot-v1^
  • MountainCar-v0^
  • CartPole-v1^
  • LunarLander-v2^

^ - Using the wrappers PixelObservationWrapper and FrameStack (n=4)

I omitted the Sparse and ScreenBecomesBlack variants of CubeCrash because those are meant to test off-policy algorithms and RNNs respectively.

First list found using this script:

> import gym
> from gym.spaces import Box
> targets = [s for s in gym.envs.registry.all()
>            if s._entry_point != 'gym.envs.atari:AtariEnv'
>            and type(s.make().observation_space) == Box
>            and len(s.make().observation_space.shape) == 3]
> print(targets)
[EnvSpec(CubeCrashScreenBecomesBlack-v0),
 EnvSpec(CubeCrash-v0),
 EnvSpec(CarRacing-v0),
 EnvSpec(MemorizeDigits-v0),
 EnvSpec(CubeCrashSparse-v0)]

Second list using this additional query:

> from gym.spaces import Discrete
> targets = [s for s in gym.envs.registry.all()
>            if s._entry_point != 'gym.envs.atari:AtariEnv'
>            and type(s.make().action_space) == Discrete
>            and 'rgb_array' in s.make().metadata['render.modes']]
> print(targets)
[EnvSpec(MountainCar-v0),
 EnvSpec(CubeCrashScreenBecomesBlack-v0),
 EnvSpec(CartPole-v1),
 EnvSpec(CartPole-v0),
 EnvSpec(CubeCrash-v0),
 EnvSpec(Acrobot-v1),
 EnvSpec(LunarLander-v2),
 EnvSpec(MemorizeDigits-v0),
 EnvSpec(CubeCrashSparse-v0)]
  • Along with your PR, can you add two examples (to examples/): one using one of the first set of environments, and another example with an environment using the wrappers? In your examples, you can also include the list of the other environments for which that example will work.
  • Please include your new benchmark script in benchmarks/, of course omitting the part of the benchmark which tests the now-deleted primitive.

@ahtsan (Contributor) commented Oct 2, 2019

> Can you run benchmarks using the following environments: [...]

CarRacing won't work since it has a continuous action space?

@ryanjulian (Member) commented:

@ahtsan the gym documentation mentions that it can easily be controlled as discrete using 100% on/off actions.

@krzentner (Contributor) commented:

I think even a small number of the Atari environments should be enough? It would be very surprising if we saw a large regression on some but not most of them.

@ryanjulian (Member) commented:

@krzentner the issue is that PPO/on-policy algos can't train Atari in any reasonable number of steps, because of exploration (and this primitive is only applicable to on-policy algos).

@krzentner (Contributor) commented:

> @krzentner the issue is that PPO/on-policy algos can't train Atari in any reasonable number of steps, because of exploration (and this primitive is only applicable to on-policy algos).

Ah, I see, my bad. The PPO paper does use Atari as a benchmark, but takes ~10M timesteps to train even the easiest envs.

The list you've given above is a good idea then (perhaps minus CarRacing, if that will take extra work).

@ryanjulian (Member) commented:

@lywong92 did you have a chance to run the additional benchmarks?

@ahtsan (Contributor) commented Oct 7, 2019

> @lywong92 did you have a chance to run the additional benchmarks?

Extra effort is needed to get discrete actions for the CarRacing environment.

@ryanjulian (Member) commented:

okay -- how much extra effort? if it's a lot, we can just test with the smaller set.

@krzentner (Contributor) commented:

If we had a ContinuousToDiscrete (or BangBangControl) env wrapper this would be trivial. Perhaps we should add one? It might actually be useful in other environments (especially with the right clipping options).
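
To make the idea concrete, here is a minimal sketch of such a wrapper (hypothetical; not existing garage API). It exposes a Discrete action space whose indices map to "bang-bang" continuous actions, i.e. every combination of each dimension pinned to its low or high bound:

import itertools

import gym
import numpy as np


class ContinuousToDiscrete(gym.ActionWrapper):
    """Hypothetical wrapper: expose a Box action space as a Discrete one.

    Each discrete index maps to one 'bang-bang' action in which every
    dimension is pinned to either its low or its high bound. This is only
    a sketch of the idea, not existing garage API.
    """

    def __init__(self, env):
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.Box)
        low, high = env.action_space.low, env.action_space.high
        # Enumerate every low/high combination across the action dimensions.
        self._actions = np.array(
            list(itertools.product(*zip(low, high))), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(len(self._actions))

    def action(self, act):
        """Map a discrete index to its continuous bang-bang action."""
        return self._actions[act]

For CarRacing-v0 (steer, gas, brake) that gives 2**3 = 8 discrete actions. As noted in the next comment, making something like this interoperate cleanly with our policies turned out to be harder than it looks.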

@krzentner (Contributor) commented:

After some discussion, implementing that wrapper correctly in a way that would interoperate with our policies seems difficult, so let's just skip CarRacing for now.

@lywong92 (Member, Author) commented Oct 8, 2019

I ran CubeCrash-v0:

[Plots: CubeCrash-v0 benchmark results]

I had some trouble using the pixel wrapper in gym. I think that's solved, but now I'm getting an error saying that the env can't be pickled. I wrapped the Acrobot env with PixelObservationWrapper, Grayscale, then StackFrames.

[Screenshot: pickling error]

Do you happen to know if there's a way to resolve this?
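
For reference, the wrapping chain described above looks roughly like this (a sketch; the import paths and constructor arguments are assumptions and may differ between gym/garage versions):

import gym
from gym.wrappers.pixel_observation import PixelObservationWrapper

from garage.envs.wrappers import Grayscale, StackFrames
from garage.tf.envs import TfEnv

env = gym.make('Acrobot-v1')
env.reset()  # some gym versions need a reset before pixels can be rendered
# NOTE: depending on the gym version, PixelObservationWrapper returns dict
# observations ({'pixels': ...}) that may need unwrapping before Grayscale.
env = PixelObservationWrapper(env)
env = Grayscale(env)                 # H x W x 3 -> H x W
env = StackFrames(env, n_frames=4)   # keep the last 4 frames
env = TfEnv(env)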

@ahtsan (Contributor) commented Oct 8, 2019

> I had some trouble using the pixel wrapper in gym. [...] Do you happen to know if there's a way to resolve this?

Update: the viewer object in gym.Env is not pickleable. We hot-fixed it by setting it to None before pickling happens, since it will always be re-created when it's None (the drawback is things might slow down a bit). We are moving forward now.
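
A minimal sketch of the idea behind that hot fix (the helper name is hypothetical, and it assumes the viewer lives on the unwrapped env as it does for gym's classic_control environments; the actual fix may live elsewhere):

def strip_viewer(env):
    """Drop the non-pickleable viewer so the env can be pickled.

    gym's classic_control environments re-create the viewer lazily in
    render() whenever it is None, so clearing it only costs re-creating
    the window later (the small slowdown mentioned above). A more complete
    fix might also close the window before discarding it.
    """
    if getattr(env.unwrapped, 'viewer', None) is not None:
        env.unwrapped.viewer = None
    return env

Something like this would be applied to each environment right before OnPolicyVectorizedSampler pickles it.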

@ryanjulian (Member) commented:

re: the viewer -- this can be handled in GarageEnv. we already have special cases for closing the viewers (because gym doesn't do it properly)

@ryanjulian (Member) commented:

when you post your results, can you use the confidence interval plots introduced by Yong, Utkarsh, and Anson? They make results much easier to interpret.

@ahtsan (Contributor) commented Oct 8, 2019

> re: the viewer -- this can be handled in GarageEnv. we already have special cases for closing the viewers (because gym doesn't do it properly)

The issue we are having is that the viewer is not pickleable and raises an error when OnPolicyVectorizedSampler tries to pickle the env while setting up sampling. I think GarageEnv handles the viewer when the environment is closed, which is a separate issue.

@ryanjulian (Member) commented:

@ahtsan i'm suggesting you can handle this in a similar way to how we handle closing environments.

@lywong92 (Member, Author) commented Oct 9, 2019

I did a trial run of Acrobot-v1 with 100 iterations, and the average return never improved. I resized the image to a smaller dimension to make it run faster for the last run. I'm trying again without resizing and with more iterations.

I compared the original environment with the one after all the wrapping:
[Screenshot: comparison of the env before and after wrapping]

I'm wondering if the missing line would be a problem.

@ryanjulian (Member) commented Oct 9, 2019

@lywong92 what you've done should be enough, though 100 iterations is a really low number. resizing should make it go faster and not really affect performance. i wouldn't expect to see signs of life until ~1000 or more (assuming the batch size is 4000). perhaps try running this with PPO rather than TRPO, which might be kinder to the CNNs.

let's not get hung up trying to get the Pixel wrappers to work right now. the goal is to ensure that this primitive isn't any worse than before.

since it looks like CubeCrash and MemorizeDigits are the environments which work today, and they look good, i think this is ready to merge.

can you make sure to add your benchmark script before merging?

@ahtsan (Contributor) commented Oct 9, 2019

Another 10 runs for CubeCrash
[Plot: CubeCrash-v0_benchmark_ppo_mean]

@lywong92 (Member, Author) commented Oct 9, 2019

> @lywong92 what you've done should be enough [...] can you make sure to add your benchmark script before merging?

Okay, will do!

@lywong92 force-pushed the replace_categorical_conv_policy branch from 7c20abf to 298f61e on October 9, 2019 23:56
@lywong92 (Member, Author) commented:

Sorry about the lag. I rebased and am seeing a possible bug when I run examples. Currently working on it and will merge as soon as I get that figured out.

@ahtsan (Contributor) commented Oct 10, 2019

This is the bug -- #794

Commits:
- Remove all occurrences of CategoricalConvPolicy
- Rename CategoricalConvPolicyWithModel to CategoricalConvPolicy
- Create and remove integration test