
A3C_continuous_action - reward normalisation #2

Closed
Anjum48 opened this issue Jun 3, 2017 · 1 comment

Comments


Anjum48 commented Jun 3, 2017

Hi - thank you very much for a great set of tutorials.

I noticed this line where the reward is rescaled:
https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/master/contents/10_A3C/A3C_continuous_action.py#L134

Removing this line seems to break the Pendulum example, so the algorithm is clearly sensitive to this rescaling. I understand why it was done, but are you aware of any other way of dealing with this issue that would make the algorithm more general?
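For reference, a minimal sketch of the kind of rescaling that line performs (the constants here are illustrative and may not match the repo exactly; Pendulum-v0's raw reward lies roughly in [-16.27, 0]):

```python
def rescale_pendulum_reward(r):
    """Hypothetical helper mirroring the hard-coded rescaling in the repo.

    Pendulum-v0's raw reward lies roughly in [-16.27, 0]; shifting by 8 and
    dividing by 8 maps it into roughly [-1, 1], a range the critic can fit
    more easily. The exact constants in the linked file may differ.
    """
    return (r + 8) / 8
```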

I came across this paper, which sounds interesting: https://arxiv.org/pdf/1602.07714.pdf

MorvanZhou (Owner) commented

Hi @Anjum48,
In my personal experience, normalizing the reward does matter for a general RL setup. I have also heard of the paper you linked at the bottom; it generalizes the handling of reward scale, but I haven't tried that algorithm yet.
The issue with the reward range is really an issue with neural networks themselves: a network has difficulty learning predicted values that span a large range. I don't have a better solution at the moment; for now, the best way to address this seems to be to follow that paper. haha.
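A minimal sketch of one more general alternative, assuming a simple running-statistics heuristic rather than the Pop-Art method from the paper (the class and names below are illustrative, not from the repo):

```python
import numpy as np

class RunningRewardScaler:
    """Hypothetical helper (not from the repo or the linked paper): scales
    rewards by a running estimate of their standard deviation, so no
    environment-specific constants such as (r + 8) / 8 are hard-coded."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations (Welford's algorithm)
        self.eps = eps

    def update(self, r):
        self.count += 1
        delta = r - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (r - self.mean)

    def scale(self, r):
        var = self.m2 / max(self.count, 1)
        return r / (np.sqrt(var) + self.eps)   # divide by running std; keep the sign

# Usage sketch: update with each raw reward, then store the scaled value.
scaler = RunningRewardScaler()
for raw_r in [-16.0, -8.5, -0.3]:              # stand-in Pendulum-like rewards
    scaler.update(raw_r)
    scaled_r = scaler.scale(raw_r)
```

Dividing by a running standard deviation removes the environment-specific constants, although the statistics drift as the policy improves, which is part of what the linked paper handles more carefully.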
