Replace CategoricalConvPolicy #906
Conversation
Codecov Report
@@ Coverage Diff @@
## master #906 +/- ##
=========================================
- Coverage 69.67% 69.28% -0.4%
=========================================
Files 171 170 -1
Lines 9485 9434 -51
Branches 1250 1249 -1
=========================================
- Hits 6609 6536 -73
- Misses 2660 2681 +21
- Partials 216 217 +1
+1 for adding an example for CategoricalCNNPolicy. Please replace 'conv' with 'CNN' as it's the preferred convention before merging. Well done!
examples/tf/trpo_cubecrash.py
Outdated
"""Run task.""" | ||
with LocalTFRunner(snapshot_config=snapshot_config) as runner: | ||
env = TfEnv(normalize(gym.make('CubeCrash-v0'))) | ||
policy = CategoricalConvPolicy(env_spec=env.spec, |
Might be better to use CNN for convention, so it would be CategoricalCNNPolicy. Please also edit other occurrences.
Can you run benchmarks using the following environments?

Using the wrappers PixelObservationWrapper and FrameStack (n=4). I omitted the Sparse and ScreenBecomesBlack variants of CubeCrash because those are meant to test off-policy algorithms and RNNs respectively.

First list, found using this script:

> import gym
> from gym.spaces import Box
> targets = [s for s in gym.envs.registry.all() if s._entry_point != 'gym.envs.atari:AtariEnv' and type(s.make().observation_space) == Box and len(s.make().observation_space.shape) == 3]
> print(targets)

[EnvSpec(CubeCrashScreenBecomesBlack-v0),
 EnvSpec(CubeCrash-v0),
 EnvSpec(CarRacing-v0),
 EnvSpec(MemorizeDigits-v0),
 EnvSpec(CubeCrashSparse-v0)]

Second list, using this additional query:

> from gym.spaces import Discrete
> targets = [s for s in gym.envs.registry.all() if s._entry_point != 'gym.envs.atari:AtariEnv' and type(s.make().action_space) == Discrete and 'rgb_array' in s.make().metadata['render.modes']]
> print(targets)

[EnvSpec(MountainCar-v0),
 EnvSpec(CubeCrashScreenBecomesBlack-v0),
 EnvSpec(CartPole-v1),
 EnvSpec(CartPole-v0),
 EnvSpec(CubeCrash-v0),
 EnvSpec(Acrobot-v1),
 EnvSpec(LunarLander-v2),
 EnvSpec(MemorizeDigits-v0),
 EnvSpec(CubeCrashSparse-v0)]
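The registry queries above are hard to read inline. The same filtering logic can be sketched self-contained over stand-in metadata tuples — the entry points, shapes, and render modes below are illustrative stand-ins, not read from a live gym registry:

```python
# Stand-in registry metadata: (id, entry_point, obs_shape, action_type, render_modes).
# These tuples are illustrative, not taken from a real gym install.
ENVS = [
    ('CubeCrash-v0', 'gym.envs.unittest:CubeCrash', (40, 32, 3), 'Discrete', ['human', 'rgb_array']),
    ('CarRacing-v0', 'gym.envs.box2d:CarRacing', (96, 96, 3), 'Box', ['human', 'rgb_array']),
    ('Pong-v0', 'gym.envs.atari:AtariEnv', (210, 160, 3), 'Discrete', ['human', 'rgb_array']),
    ('CartPole-v0', 'gym.envs.classic_control:CartPoleEnv', (4,), 'Discrete', ['human', 'rgb_array']),
]

def pixel_envs(envs):
    """Non-Atari envs whose observations are 3-channel images (H x W x C)."""
    return [e[0] for e in envs
            if e[1] != 'gym.envs.atari:AtariEnv' and len(e[2]) == 3]

def discrete_renderable_envs(envs):
    """Non-Atari envs with discrete actions that can render rgb_array frames."""
    return [e[0] for e in envs
            if e[1] != 'gym.envs.atari:AtariEnv'
            and e[3] == 'Discrete'
            and 'rgb_array' in e[4]]
```

The two functions mirror the two registry comprehensions: the first keys on image-shaped observation spaces, the second on discrete action spaces plus rgb_array rendering.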
@ahtsan the gym documentation mentions that CarRacing can easily be controlled as discrete using 100% on/off actions.
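One way to read that comment: map a small discrete action set onto CarRacing's continuous [steering, gas, brake] triple, using only full on/off component values. A hypothetical sketch — the action set and its triples are assumptions for illustration, not taken from the gym docs:

```python
# Hypothetical discrete action set for CarRacing-v0, whose native action
# space is a Box of [steering (-1..1), gas (0..1), brake (0..1)].
# Each discrete action uses only 100% on/off component values.
DISCRETE_ACTIONS = {
    0: (0.0, 0.0, 0.0),   # no-op
    1: (-1.0, 0.0, 0.0),  # steer hard left
    2: (1.0, 0.0, 0.0),   # steer hard right
    3: (0.0, 1.0, 0.0),   # full gas
    4: (0.0, 0.0, 1.0),   # full brake
}

def to_continuous(action_index):
    """Translate a discrete action index to a CarRacing control triple."""
    return DISCRETE_ACTIONS[action_index]
```

A wrapper built on this mapping would expose a Discrete(5) action space while stepping the underlying Box env with the translated triple.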
I think even a small number of the Atari environments should be enough? It would be very surprising if we saw a large regression on some but not most of them.

@krzentner the issue is that PPO/on-policy algos can't train Atari in any reasonable number of steps, because of exploration (and this primitive is only applicable to on-policy algos).

Ah, I see, my bad. The PPO paper does use Atari as a benchmark, but takes ~10M timesteps to train even the easiest envs. The list you've given above is a good idea then (perhaps minus CarRacing, if that will take extra work).

@lywong92 did you have a chance to run the additional benchmarks?

Extra effort is needed to get discrete actions for the CarRacing environment.

okay -- how much extra effort? if it's a lot, we can just test with the smaller set.

If we had a

After some discussion, implementing that wrapper correctly in a way that would interoperate with our policies seems difficult, so let's just skip
I ran CubeCrash-v0: I had some trouble using the pixel wrapper in gym. I think that's solved, but now I'm getting an error saying that the env can't be pickled. I wrapped the Acrobot env with PixelObservationWrapper, Grayscale, then StackFrames. Do you happen to know if there's a way to resolve this?
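For context, the StackFrames step in that wrapper chain is just a rolling window over the last n observations. A minimal deque-based sketch of the idea — not gym's or garage's actual wrapper, and `frame` here is any placeholder object rather than a real image:

```python
from collections import deque

class FrameStack:
    """Keep the last n frames and expose them as one stacked observation.

    A minimal sketch of frame stacking, not gym's or garage's actual
    wrapper implementation.
    """

    def __init__(self, n):
        self.n = n
        self.frames = deque(maxlen=n)

    def reset(self, first_frame):
        # Standard trick: fill the window by repeating the first frame.
        for _ in range(self.n):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, frame):
        # deque(maxlen=n) drops the oldest frame automatically.
        self.frames.append(frame)
        return list(self.frames)
```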
Update: the
re: the viewer -- this can be handled in GarageEnv. we already have special cases for closing the viewers (because gym doesn't do it properly)

when you post your results, can you use the confidence interval plots introduced by Yong, Utkarsh, and Anson? They make results much easier to interpret.
The issue we are having is that the viewer is not pickleable and raises an error when

@ahtsan i'm suggesting you can handle this in a similar way to how we handle closing environments.
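That special-casing could amount to dropping the viewer handle during serialization via `__getstate__`. A hypothetical, self-contained illustration — a `threading.Lock` stands in for the unpicklable viewer, and this is not GarageEnv's actual code:

```python
import pickle
import threading

class EnvWithViewer:
    """Illustrates dropping an unpicklable attribute before pickling.

    Hypothetical stand-in for an env wrapper; a threading.Lock plays the
    role of the unpicklable rendering viewer.
    """

    def __init__(self):
        self.state = 0
        self.viewer = threading.Lock()  # stand-in for the viewer handle

    def __getstate__(self):
        # Copy the instance dict but omit the viewer, so the env can be
        # pickled; the viewer would be recreated lazily on next render.
        d = self.__dict__.copy()
        d['viewer'] = None
        return d

    def __setstate__(self, d):
        self.__dict__.update(d)  # viewer stays None until needed again
```

Without `__getstate__`, `pickle.dumps` on this object would raise a TypeError, which matches the error described above.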
@lywong92 what you've done should be enough, though 100 iterations is a really low number. resizing should make it go faster and not really affect performance. i wouldn't expect to see signs of life until ~1000 or more (assuming the batch size is 4000). perhaps try running this with PPO rather than TRPO, which might be kinder to the CNNs. let's not get hung up trying to get the Pixel wrappers to work right now. the goal is to ensure that this primitive isn't any worse than before. since it looks like CubeCrash and MemorizeDigits are the environments which work today, and they look good, i think this is ready to merge. can you make sure to add your benchmark script before merging?
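On the resizing point: downsampling observations before they reach the CNN shrinks the network input and speeds up training. A minimal nearest-neighbor sketch over plain nested lists — a real pipeline would use a gym wrapper or an image library instead:

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resample of an H x W image given as nested lists.

    Picks, for each output pixel, the source pixel whose index scales to
    it; shrinking an observation this way reduces CNN input size.
    """
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]
```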
Okay, will do!
Force-pushed from 7c20abf to 298f61e
Sorry about the lag. I rebased and am seeing a possible bug when I run examples. Currently working on it and will merge as soon as I get that figured out.

This is the bug -- #794
- Remove all occurrences of CategoricalConvPolicy
- Rename CategoricalConvPolicyWithModel to CategoricalConvPolicy
- Create and remove integration test
Force-pushed from 298f61e to be4c045
Benchmark script is located in origin/benchmark_categorical_cnn_policy.
Results:
Also tried running both versions in an Atari environment (PongNoFrameskip) with PPO, but realized that this combination of environment and algorithm is not ideal for our testing:
As discussed, results from MemorizeDigits are sufficient to show that the layer implementation can be replaced with the model implementation.