
Gym env #12

Open
ShuvenduRoy opened this issue May 30, 2018 · 12 comments
@ShuvenduRoy

Why is 'NoFrameskip' required? And what exactly does 'NoFrameskip' specify?

assert 'NoFrameskip' in env.spec.id

@ShuvenduRoy

I tried 'Pong-v0' and it did not do well; the maximum reward I got was -2. What might have caused the problem?

@garkavem

These wrappers come from the OpenAI baselines project. Frameskip is traditionally used in Atari games to speed up learning: you do not need to see every frame, every 4th will do. Previously, frameskip was built into the environments themselves: every gym Atari environment without "NoFrameskip" in its name already returns only every 4th frame. For some reason, the baselines project decided to take environments without built-in frameskip and apply the MaxAndSkipEnv wrapper instead.
So if you remove the line
assert 'NoFrameskip' in env.spec.id
and feed 'Pong-v0' to the MaxAndSkipEnv wrapper, you will get only every 16th frame. That is probably not enough to play Pong.
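For readers unfamiliar with the wrapper being discussed, here is a minimal, self-contained sketch of the MaxAndSkipEnv idea. This is not the baselines code itself (which lives in openai/baselines); it is written against a duck-typed env with `step(action) -> (obs, reward, done, info)`, and all names here are illustrative:

```python
import numpy as np

class MaxAndSkipEnv:
    """Repeat each action for `skip` frames, sum the rewards, and return a
    pixel-wise max over the last two frames (Atari sprites flicker between
    frames, so the max removes that flicker)."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        last_two = []
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            last_two = (last_two + [obs])[-2:]  # keep the last two frames
            total_reward += reward
            if done:
                break
        # Observation returned to the agent: max over the final two raw frames.
        return np.max(np.stack(last_two), axis=0), total_reward, done, info
```

Applying this on top of an environment that already has a built-in skip of 4 is exactly what produces the every-16th-frame behavior described above.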

@ShuvenduRoy

As this environment is not currently available in gym, what should I change to reduce this 16-frame gap to 4?

@garkavem

Just remove
env = MaxAndSkipEnv(env, skip=4)
from "make_atari" in the wrappers file.
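The arithmetic behind this suggestion can be made explicit. The helper below is ours, purely illustrative: the effective gap between frames the agent sees is the product of the environment's built-in skip and the wrapper's skip.

```python
def effective_skip(builtin_skip: int, wrapper_skip: int) -> int:
    """Effective gap between frames the agent observes."""
    return builtin_skip * wrapper_skip

# 'Pong-v0' (built-in skip ~4) plus MaxAndSkipEnv(skip=4) -> every 16th frame:
print(effective_skip(4, 4))  # 16
# Dropping the wrapper restores an effective skip of 4:
print(effective_skip(4, 1))  # 4
# Equivalently, use 'PongNoFrameskip-v4' (no built-in skip) with the wrapper:
print(effective_skip(1, 4))  # 4
```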

@ShuvenduRoy

ShuvenduRoy commented May 31, 2018

OK, that makes sense.

But I am wondering about the shape of the state. According to the original paper:

The details of the architecture are explained in the Methods.
The input to the neural network consists of an 84 x 84 x 4 image produced by the preprocessing map φ, followed by three convolutional layers

But when I check the shape of the state, it is (1, 84, 84). Where does this preprocessing deviate from the original paper?
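As a sanity check on the quoted architecture, the spatial sizes through the paper's three convolutional layers (8x8 stride 4, 4x4 stride 2, 3x3 stride 1, no padding, per Mnih et al. 2015) can be verified with simple arithmetic. The helper name is ours:

```python
def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output spatial size of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

s = conv_out(84, 8, 4)  # 84x84 input -> 20x20 after conv1
s = conv_out(s, 4, 2)   # -> 9x9 after conv2
s = conv_out(s, 3, 1)   # -> 7x7 after conv3, flattened into the dense layer
```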

@garkavem

Call wrap_deepmind with the argument frame_stack=True. This issue was raised in #9.
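Conceptually, frame_stack=True keeps a rolling buffer of the last k processed frames and stacks them into one observation. A minimal, self-contained sketch of that idea (not the baselines code itself, which additionally uses a lazy-frames memory optimization):

```python
from collections import deque
import numpy as np

class FrameStack:
    """Keep the last k frames and stack them along a new leading axis,
    turning (84, 84) observations into a (k, 84, 84) state."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_obs):
        # On reset the buffer is filled with copies of the first frame.
        for _ in range(self.k):
            self.frames.append(first_obs)
        return np.stack(self.frames)

    def append(self, obs):
        # Each step pushes the newest frame and drops the oldest.
        self.frames.append(obs)
        return np.stack(self.frames)

state = FrameStack(k=4).reset(np.zeros((84, 84), dtype=np.uint8))
print(state.shape)  # (4, 84, 84)
```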

@ShuvenduRoy

It looks like something odd is happening, then. I also wonder how this even worked without sequence information. Any idea what is going on here?

Are we forcing the model to learn from the current frame only, which might not work in more complex cases?

@garkavem

Well, Pong is just an extremely simple game. I suspect that a player with perfect reaction time (like an RL agent) can simply always move toward the ball, and that is sufficient to win. For many Atari games, though, stacking frames is necessary.

@ShuvenduRoy

OK!

I am not quite getting the logic from the code, though: where do these 4 frames come from? Since the environment skips 4 frames, what is the situation now? Does the (4, 84, 84) state contain information from 16 frames, or is the 4-frame skip folded into it?

@garkavem

You have information about the current frame and the frames from 4, 8, and 12 steps ago (with a skip of 4, the stack holds 4 consecutive observed frames). The skipped frames in between are not available to the agent.
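This can be spelled out with a tiny helper (our own illustration, not library code): with a frame skip of `skip` and a stack of `k` observed frames, the state at time t contains the raw frames at these timesteps.

```python
def stacked_frame_times(t: int, skip: int = 4, k: int = 4) -> list[int]:
    """Timesteps of the raw frames present in the agent's stacked state."""
    return [t - i * skip for i in range(k)]

print(stacked_frame_times(100))  # [100, 96, 92, 88]
```

So the oldest frame in the default configuration is 12 steps old, and the 12 raw frames in between the stacked ones are never seen by the agent.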

@ShuvenduRoy

Oh, I got it. Thanks :-)

@ShuvenduRoy

With all this information and the modifications, I trained the model, but I could not quite reproduce the original results. Here are the code and the result.

Any idea what is causing the problem?
