
A3C_continuous_action - reward normalisation #2

Closed
Anjum48 opened this issue Jun 3, 2017 · 1 comment

Comments


Anjum48 commented Jun 3, 2017

Hi - thank you very much for a great set of tutorials.

I noticed this line where the reward is rescaled:
https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/master/contents/10_A3C/A3C_continuous_action.py#L134

Removing this line seems to break the Pendulum example, so the algorithm is clearly sensitive to this rescaling. I understand why it was done, but are you aware of any other way of dealing with this issue that would make the algorithm more general?
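For reference, a minimal sketch of the kind of rescaling that line performs (the constants here are illustrative and may not match the repo exactly; Pendulum-v0's raw reward lies roughly in [-16.27, 0]):

```python
def rescale_pendulum_reward(r):
    """Hypothetical helper mirroring the hard-coded rescaling in the repo.

    Pendulum-v0's raw reward lies roughly in [-16.27, 0]; shifting by 8 and
    dividing by 8 maps it into roughly [-1, 1], a range the critic can fit
    more easily. The exact constants in the linked file may differ.
    """
    return (r + 8) / 8
```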

I came across this paper, which sounds interesting: https://arxiv.org/pdf/1602.07714.pdf

MorvanZhou (Owner) commented

Hi @Anjum48,
In my personal experience, normalizing the reward does matter for a general RL setup. I have also heard of the paper you linked at the bottom; it generalizes the handling of reward scale, but I haven't tried that algorithm yet.
The issue with the reward range is really an issue with neural networks themselves: a network has difficulty learning predicted values that span a large range. I don't have a better solution at the moment; for now, the best way to address this seems to be to follow that paper. haha.
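A minimal sketch of one more general alternative, assuming a simple running-statistics heuristic rather than the Pop-Art method from the paper (the class and names below are illustrative, not from the repo):

```python
import numpy as np

class RunningRewardScaler:
    """Hypothetical helper (not from the repo or the linked paper): scales
    rewards by a running estimate of their standard deviation, so no
    environment-specific constants such as (r + 8) / 8 are hard-coded."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations (Welford's algorithm)
        self.eps = eps

    def update(self, r):
        self.count += 1
        delta = r - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (r - self.mean)

    def scale(self, r):
        var = self.m2 / max(self.count, 1)
        return r / (np.sqrt(var) + self.eps)   # divide by running std; keep the sign

# Usage sketch: update with each raw reward, then store the scaled value.
scaler = RunningRewardScaler()
for raw_r in [-16.0, -8.5, -0.3]:              # stand-in Pendulum-like rewards
    scaler.update(raw_r)
    scaled_r = scaler.scale(raw_r)
```

Dividing by a running standard deviation removes the environment-specific constants, although the statistics drift as the policy improves, which is part of what the linked paper handles more carefully.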
