I don't think this code can solve the Pendulum problem. Another question: why is the running reward computed as 'running_reward * 0.9 + score * 0.1'?
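For reference, the expression quoted above has the form of an exponential moving average of episode scores: each update keeps 90% of the previous value and mixes in 10% of the newest score, which smooths noisy per-episode returns for logging. The sketch below only illustrates that arithmetic. The function name, the `decay` parameter, and the sample scores are hypothetical and are not taken from the repository.

```python
def update_running_reward(running_reward, score, decay=0.9):
    """Exponential moving average: keep `decay` of the old value, add the rest from `score`."""
    return running_reward * decay + score * (1.0 - decay)


# Hypothetical starting value and episode scores, chosen only for illustration.
running_reward = -1000.0
for score in [-900.0, -800.0, -700.0]:
    running_reward = update_running_reward(running_reward, score)
    print(round(running_reward, 1))
```

Because the smoothed value lags behind the raw scores, it is a monitoring signal rather than the reward the agent is trained on.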
I changed the activation function from relu to tanh, but there was no improvement.
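I ran into this problem too. I asked the author of elegantrl, who said that applying tanh first and then sampling the action through torch.distributions affects the entropy, so the training cannot converge. However, I don't like the way elegantrl writes its PPO, so I'm still looking for someone else's code.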
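One commonly used alternative to the ordering described above is to sample from the unbounded Gaussian first, apply tanh afterwards, and correct the log-probability with the change-of-variables term. The sketch below shows that idea in plain Python for a single scalar action. It is an assumption about what a fix could look like, not the elegantrl author's code or a confirmed fix for this repository, and the function name and parameters are hypothetical.

```python
import math
import random


def sample_squashed_action(mean, std, rng):
    """Sample u ~ N(mean, std), squash with tanh, and return (action, log_prob)."""
    u = rng.gauss(mean, std)
    action = math.tanh(u)
    # Log-density of u under the Gaussian.
    log_prob_u = (
        -0.5 * ((u - mean) / std) ** 2
        - math.log(std)
        - 0.5 * math.log(2.0 * math.pi)
    )
    # Change of variables: d tanh(u)/du = 1 - tanh(u)^2.
    # The small constant guards against log(0) when |action| is close to 1.
    log_prob = log_prob_u - math.log(1.0 - action ** 2 + 1e-6)
    return action, log_prob


rng = random.Random(0)
action, log_prob = sample_squashed_action(mean=0.0, std=1.0, rng=rng)
print(action, log_prob)
```

The action returned here lies in [-1, 1], so it would still need to be scaled to the environment's action range before being passed to the environment.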
Have you found working code yet? Could you share a link? Much appreciated!