This repository has been archived by the owner on Jan 27, 2023. It is now read-only.

Implement recurrent policy #8

Merged: kngwyu merged 20 commits into master on Apr 9, 2019

kngwyu (Owner) commented Apr 5, 2019

What's done:

  • Implement GRU Policy
  • Check that a2c_cart_pole and ppo_cart_pole work

TODO:

  • Implement LSTM Policy
  • Check that a2c_atari and ppo_atari work
    • I could not get a2c_atari to work, which has also been reported in another repo
  • Check that it works for HalfCheetah
  • Check that it works for flickering Atari
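A recurrent policy carries a hidden state across time steps and has to reset it when an episode ends. Below is a minimal pure-Python sketch of a single GRU step with that reset, assuming one environment, no bias terms, and PyTorch-style gating (the reset gate is applied to the recurrent projection). `GRUCell` and `step` are illustrative names, not the code from this PR.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    # Plain matrix-vector product on nested lists
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class GRUCell:
    """Illustrative GRU cell (no biases), not the implementation from this PR."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        s = 1.0 / math.sqrt(hidden_dim)

        def init(rows, cols):
            return [[rng.uniform(-s, s) for _ in range(cols)] for _ in range(rows)]

        # Input (W) and recurrent (U) weights for update z, reset r, candidate n
        self.W = {g: init(hidden_dim, input_dim) for g in "zrn"}
        self.U = {g: init(hidden_dim, hidden_dim) for g in "zrn"}
        self.hidden_dim = hidden_dim

    def step(self, x, h, done: bool):
        # Start from a zero hidden state if the previous step ended an episode
        if done:
            h = [0.0] * self.hidden_dim
        wz, uz = matvec(self.W["z"], x), matvec(self.U["z"], h)
        wr, ur = matvec(self.W["r"], x), matvec(self.U["r"], h)
        wn, un = matvec(self.W["n"], x), matvec(self.U["n"], h)
        z = [sigmoid(a + b) for a, b in zip(wz, uz)]          # update gate
        r = [sigmoid(a + b) for a, b in zip(wr, ur)]          # reset gate
        n = [math.tanh(a + ri * b) for a, ri, b in zip(wn, r, un)]  # candidate
        # Interpolate between the candidate and the previous hidden state
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


cell = GRUCell(input_dim=4, hidden_dim=8)
x = [1.0, 0.5, -0.5, 0.0]
h = [0.3] * 8
h_next = cell.step(x, h, done=False)
```

In a batched A2C/PPO rollout the same reset is usually written as multiplying each environment's hidden state by a `1 - done` mask. A policy head (for example a linear layer that produces action logits) would then read from the returned hidden state.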

kngwyu (Owner, Author) commented Apr 9, 2019

[image]

kngwyu (Owner, Author) commented Apr 9, 2019

[image]
Wow, it really overfits.
Note: This is the result on HalfCheetah-BulletEnv, so it is not comparable with results on the MuJoCo version.

kngwyu (Owner, Author) commented Apr 9, 2019

[image]

kngwyu merged commit 0718288 into master on Apr 9, 2019
kngwyu (Owner, Author) commented Apr 9, 2019

I noticed that lr_cooler and clip_cooler don't work correctly, which may have lowered the scores in ppo_atari.
I'm going to benchmark it again after fixing them.
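lr_cooler and clip_cooler appear to anneal the learning rate and the PPO clipping parameter over training. Here is a minimal sketch of such a schedule, assuming linear decay from an initial value to a minimum over `max_steps` environment steps. `LinearCooler` and `value` are hypothetical names, not the library's API.

```python
class LinearCooler:
    """Hypothetical linear annealing schedule (not the library's actual class)."""

    def __init__(self, initial: float, minimum: float, max_steps: int) -> None:
        self.initial = initial
        self.minimum = minimum
        self.max_steps = max_steps

    def value(self, step: int) -> float:
        # Fraction of training completed, clamped to [0, 1]
        frac = min(max(step / self.max_steps, 0.0), 1.0)
        # Move linearly from `initial` toward `minimum`
        return self.initial + frac * (self.minimum - self.initial)


max_steps = 10_000_000  # hypothetical training length in environment steps
lr = LinearCooler(initial=2.5e-4, minimum=0.0, max_steps=max_steps)
clip = LinearCooler(initial=0.1, minimum=0.0, max_steps=max_steps)
halfway_clip = clip.value(5_000_000)
```

Querying the schedule with an absolute step count, rather than mutating it on every call, makes it harder to get the units wrong. The count must use the same unit as `max_steps`. If environment steps are mixed up with update counts, the schedule can end up barely moving.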

kngwyu (Owner, Author) commented Apr 11, 2019

[image]

[image]

It looks like decreasing clip_eps is really influential for RNNs.
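In PPO, clip_eps bounds the probability ratio between the new and old policies to [1 - clip_eps, 1 + clip_eps]. Shrinking it over training therefore limits how far each update can move the policy. Below is a minimal pure-Python sketch of the standard clipped surrogate loss. `ppo_clip_loss` is an illustrative name, and the list-based inputs are an assumption for self-containment.

```python
import math


def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps):
    """PPO clipped surrogate objective, negated so it can be minimized."""
    total = 0.0
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (elementwise minimum) of clipped and unclipped objectives
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(advantages)


# The new policy doubles an action's probability and the advantage is positive:
# a smaller clip_eps caps how much the update is rewarded for that change.
loss_small = ppo_clip_loss([math.log(2.0)], [0.0], [1.0], clip_eps=0.1)
loss_large = ppo_clip_loss([math.log(2.0)], [0.0], [1.0], clip_eps=0.2)
```

With a ratio of 2 and a positive advantage, the objective is capped at 1 + clip_eps. The loss is therefore about -1.1 for clip_eps = 0.1 and about -1.2 for clip_eps = 0.2.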

kngwyu (Owner, Author) commented Apr 11, 2019

lr_decay also looks influential, especially for the CNN policy.
[image]

kngwyu (Owner, Author) commented Apr 11, 2019

[image]

The result on flickering Breakout.
The difference from the picture above is whether ppo_clip is decreased or not (though in the picture above, ppo_clip is decreased by only (0.1 / max_steps) * (max_steps // (8 * 128))).
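To put that formula in numbers, here is a sketch assuming that 8 * 128 is the number of environment steps per update (for example 8 parallel workers times 128-step rollouts, which is an assumption) and a hypothetical max_steps of 10 million. A decrement sized for one call per environment step (0.1 / max_steps) is applied only once per update, so the total decrease comes to about 9.8e-5. clip_eps thus barely moves from 0.1 instead of annealing toward 0.

```python
def total_clip_decrease(max_steps: int, steps_per_update: int = 8 * 128,
                        clip_range: float = 0.1) -> float:
    # Decrement per call, sized as if the cooler were called every environment step
    per_call = clip_range / max_steps
    # Number of calls if it is actually called once per update
    n_updates = max_steps // steps_per_update
    return per_call * n_updates


max_steps = 10_000_000  # hypothetical training length
decrease = total_clip_decrease(max_steps)
print(f"updates: {max_steps // (8 * 128)}, total decrease: {decrease:.3e}")
```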

kngwyu (Owner, Author) commented Apr 12, 2019

The LSTM results here use an incorrect bias initialization.
I'll upload new results using #17 once the experiments are finished.

kngwyu (Owner, Author) commented Apr 12, 2019

I found another bug (#18).
So I think none of the results above are reliable, even though they look like they work correctly....

kngwyu (Owner, Author) commented Apr 13, 2019

I uploaded some results with the bug fix in #18.
