Hi - thank you very much for a great set of tutorials.
I noticed this line where the reward is rescaled:
https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/master/contents/10_A3C/A3C_continuous_action.py#L134
Removing this line seems to break the Pendulum example, so the algorithm is clearly sensitive to this rescaling. I understand why this was done, but are you aware of any other way of dealing with this issue to make the algorithm more general?
I came across this paper, which sounds interesting: https://arxiv.org/pdf/1602.07714.pdf
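For context, the line in question rescales the raw Pendulum reward (roughly in the range [-16, 0]) into a small band around zero before it is used to build the value targets. One environment-agnostic alternative to a hand-picked constant is to normalize rewards with running statistics. A minimal sketch of that idea (the class name and interface are my own for illustration, not part of the repo):

```python
import numpy as np

class RunningRewardNormalizer:
    """Illustrative sketch: online reward normalization via Welford's algorithm.

    Tracks a running mean and variance of the rewards seen so far and
    rescales each incoming reward to roughly zero mean and unit variance,
    so no environment-specific constant such as (r + 8) / 8 is needed.
    """
    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the running mean
        self.eps = eps

    def normalize(self, r):
        # update running statistics with the new reward
        self.count += 1
        delta = r - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (r - self.mean)
        std = np.sqrt(self.m2 / max(self.count - 1, 1)) + self.eps
        return (r - self.mean) / std
```

With something like this, the rescaling line could become `buffer_r.append(normalizer.normalize(r))`, at the cost of the target scale drifting while the statistics warm up early in training.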
Hi @Anjum48,
In my experience, normalized rewards do matter in a general RL setup. I have also heard of the paper you linked at the bottom; it generalizes the handling of reward scale, but I haven't tried that algorithm yet.
The issue with the reward range is really an issue with neural nets themselves: a network has difficulty learning predicted values that span a large range. I don't have a better solution at the moment; for me, the best way to address this seems to be to just follow that paper. haha.
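For anyone reading along: the linked paper is van Hasselt et al., "Learning values across many orders of magnitude" (Pop-Art). The core idea is to adaptively normalize the value targets while rescaling the last layer of the value head so its un-normalized outputs are preserved. A rough NumPy sketch of that update, assuming a linear value head on top of some features (illustrative only, not the repo's TensorFlow code):

```python
import numpy as np

class PopArtValueHead:
    """Sketch of the Pop-Art idea from van Hasselt et al. (arXiv:1602.07714).

    The head predicts a normalized value; mu and sigma track the first and
    second moments of the targets. Whenever the statistics are updated, the
    last linear layer (w, b) is rescaled so the un-normalized output for any
    feature vector is unchanged ("Preserving Outputs Precisely, while
    Adaptively Rescaling Targets").
    """
    def __init__(self, feature_dim, beta=3e-4):
        self.w = np.zeros(feature_dim)  # last-layer weights of the value head
        self.b = 0.0
        self.mu = 0.0                   # running mean of the targets
        self.nu = 1.0                   # running second moment of the targets
        self.beta = beta                # EMA coefficient (a guess, not tuned)

    def sigma(self):
        return np.sqrt(max(self.nu - self.mu ** 2, 1e-4))

    def update_stats(self, target):
        old_mu, old_sigma = self.mu, self.sigma()
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma()
        # rescale the last layer so the un-normalized prediction is preserved
        self.w *= old_sigma / new_sigma
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma

    def value(self, features):
        # un-normalized value estimate, used e.g. for the advantage
        return self.sigma() * (self.w @ features + self.b) + self.mu

    def normalized_target(self, target):
        # train the normalized head against this (e.g. with an MSE loss)
        return (target - self.mu) / self.sigma()
```

The point is only that mu and sigma absorb the reward scale, so the network always regresses onto targets of roughly unit magnitude and the raw environment reward can be used directly.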