I was wondering why clipping the rewards improves the performance.....the rewards for the Breakout environment (using OpenAI Gym) are already limited to [-1, 1]. Could it be that the performance difference is due to the gradient normalization only?
I also noticed that you use tf.clip_by_average_norm instead of tf.clip_by_global_norm. Have you tried the latter? It's just that in other A3C implementations I have seen, the latter is far more common, and that made me wonder if there is any specific reason to use clip_by_average_norm.
Anyways, congratulations! Great work!
Actually, the reward is not clipped by the environment. If you watch the play closely, the blocks on the top rows give a higher score. The main reason for clipping rewards is that neural networks are not good at fitting data that doesn't have zero mean. The bias term should be able to handle this (perhaps with a higher learning rate?), but it is better to just normalize the reward. I have seen many cases where simply scaling the reward is the key to successful training.
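To make the point concrete, here is a small NumPy sketch of the two common ways to bound Atari rewards: DQN-style sign clipping and plain range clipping. The reward values are illustrative, not taken from an actual Breakout run:

```python
import numpy as np

def clip_reward_sign(r):
    # DQN-style: replace the reward with its sign, so a 7-point
    # top-row brick counts the same as a 1-point bottom-row brick.
    return float(np.sign(r))

def clip_reward_range(r, lo=-1.0, hi=1.0):
    # Alternative: clamp into [lo, hi]; identical to sign clipping
    # for Breakout's non-negative integer rewards.
    return float(np.clip(r, lo, hi))

# Illustrative raw Breakout rewards (top-row bricks are worth more):
raw = [0.0, 1.0, 4.0, 7.0]
print([clip_reward_sign(r) for r in raw])   # [0.0, 1.0, 1.0, 1.0]
print([clip_reward_range(r) for r in raw])  # [0.0, 1.0, 1.0, 1.0]
```

This is also why the question above assumed rewards were already in [-1, 1]: after clipping they are, but the raw environment scores are not.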
I haven't tried the other version, but I guess it won't make a big difference. I just chose it because that was the function I found in the TensorFlow documentation.
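For reference, the difference between the two clipping rules can be sketched in NumPy (these follow the formulas in the TensorFlow docs; the gradient values are made up for illustration):

```python
import numpy as np

def clip_by_average_norm(t, clip_norm):
    # Per-tensor rule (what tf.clip_by_average_norm does): scale t
    # so that l2norm(t) / num_elements is at most clip_norm.
    l2_avg = np.linalg.norm(t) / t.size
    if l2_avg > clip_norm:
        return t * (clip_norm / l2_avg)
    return t

def clip_by_global_norm(tensors, clip_norm):
    # Joint rule (what tf.clip_by_global_norm does): scale ALL
    # tensors by the same factor so that the L2 norm of the
    # concatenated gradient vector is at most clip_norm.
    global_norm = np.sqrt(sum(np.sum(t ** 2) for t in tensors))
    scale = clip_norm / max(global_norm, clip_norm)
    return [t * scale for t in tensors], global_norm

grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]
clipped, gnorm = clip_by_global_norm(grads, 1.0)
# gnorm == 13.0, so every gradient is scaled by 1/13
```

The practical difference: the global-norm version preserves the relative magnitudes across all layers' gradients, while the average-norm version rescales each tensor independently and normalizes by element count, so the two can behave quite differently for layers of different sizes.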
Oh my......I didn't know that the blocks on the top give higher scores! That is truly a game changer for me! I've been having a LOT of trouble with Breakout using my code, and I have already compared it with about 7 or 8 other implementations of A3C. But somehow I missed the reward clipping in all of them.....I had already heard that clipping rewards is important, but I really thought that Gym already did that behind the scenes.