
Why clip Rewards? #1

Closed
MatheusMRFM opened this issue Sep 22, 2017 · 2 comments

Comments

@MatheusMRFM

I was wondering why clipping the rewards improves performance. The rewards for the Breakout environment (using OpenAI Gym) are already limited to [-1, 1], aren't they? Could it be that the performance difference comes from the gradient normalization alone?

I also noticed that you use tf.clip_by_average_norm instead of tf.clip_by_global_norm. Have you tried the latter? In the other A3C implementations I have seen, the latter is far more common, which made me wonder whether there is a specific reason to use clip_by_average_norm.
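Just to make sure I'm talking about the same two ops, here is a rough, self-contained sketch of how I understand the difference (TF 1.x API; the toy loss and the clip thresholds are made up, not taken from your code):

```python
import tensorflow as tf

# Toy variable and loss, only to make the snippet runnable (hypothetical values).
w = tf.Variable([3.0, 4.0])
loss = tf.reduce_sum(tf.square(w))
grads = tf.gradients(loss, [w])

# clip_by_average_norm rescales each gradient tensor on its own,
# based on that tensor's average (per-element) L2 norm.
clipped_avg = [tf.clip_by_average_norm(g, 0.1) for g in grads]

# clip_by_global_norm rescales the whole gradient list by one shared factor,
# so the relative direction across all parameters is preserved.
clipped_glob, global_norm = tf.clip_by_global_norm(grads, 40.0)
```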

Anyways, congratulations! Great work!

@hiwonjoon
Owner

Hi,

  1. Actually, the reward is not clipped by the environment. If you watch the play closely, the blocks on the top rows give a higher score. The main reason for clipping the reward is that neural networks are not good at fitting data that does not have zero mean. The bias term should be able to handle this (perhaps with a higher learning rate?), but it is better to just normalize the reward. I have seen many cases where simply scaling the reward is the key to getting training to work (see the sketch after this list).

  2. I haven't tried the other version, but I guess it won't make a big difference. I just chose it because it was the function I found in the TensorFlow documentation.
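To make point 1 concrete, the kind of reward scaling I mean is just a one-liner around the environment step. A minimal sketch using the 2017-era Gym API (the env id and the clip range are only an example, not the exact code in this repo):

```python
import gym
import numpy as np

env = gym.make("Breakout-v0")
obs = env.reset()
done = False
while not done:
    obs, raw_reward, done, info = env.step(env.action_space.sample())
    # Top-row bricks are worth more than 1 point, so without this step the
    # returns the critic has to fit drift away from a small, roughly
    # zero-centered range.
    reward = np.clip(raw_reward, -1.0, 1.0)
```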

Thanks!

@MatheusMRFM
Author

Oh my... I didn't know that the blocks at the top give higher scores! That is truly a game changer for me! I've been having a lot of trouble with Breakout using my code, and I have already compared it with about 7 or 8 other A3C implementations, but somehow I missed the reward clipping in all of them. I had already heard that clipping rewards is important, but I really thought Gym already did that behind the scenes.

I will try this in my code and see what happens.

I am truly grateful for the help!
