Suboptimal policy #12

Open
xli4217 opened this issue Nov 15, 2018 · 1 comment

Comments

@xli4217

xli4217 commented Nov 15, 2018

I'm trying SQL (Soft Q-Learning) on a simple manipulator reaching task. The agent quickly learns to get to the vicinity of the goal, but then the learning curve plateaus and the agent never quite gets to the goal. Some of my hyperparameters are:

  • policy learning rate 0.0005
  • Q learning rate 0.001
  • reward scale 20
  • alpha 1.0

Is there something I can do to improve this? Thanks.
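A worked check on how two of these settings interact. It assumes that `alpha` is the entropy temperature and that the reward scale $c$ multiplies every environment reward. Neither is stated above, and the details depend on the implementation. In maximum entropy RL, scaling rewards by $c$ at temperature $\alpha$ gives the same optimal policy as unscaled rewards at temperature $\alpha / c$, because the soft Q-function just scales by $c$. Here $Q_{\alpha/c}$ denotes the soft Q-function of the unscaled reward at temperature $\alpha/c$:

```latex
Q_c(s,a) = c\,Q_{\alpha/c}(s,a), \qquad
\pi^*(a \mid s) \propto \exp\!\Big(\tfrac{Q_c(s,a)}{\alpha}\Big)
  = \exp\!\Big(\tfrac{Q_{\alpha/c}(s,a)}{\alpha/c}\Big), \qquad
\alpha_{\text{eff}} = \frac{\alpha}{c} = \frac{1.0}{20} = 0.05
```

Under these assumptions, raising the reward scale and lowering `alpha` act on the same effective temperature.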

@haarnoja
Owner

SQL learns maximum entropy policies, so the optimal policy is stochastic. That's why the agent gets close to the goal but never quite reaches it. You can try, for example, annealing the temperature to zero, or shaping the reward function by making the reward much larger in the vicinity of the goal.
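Here is a minimal sketch of both suggestions in Python. The names `annealed_alpha` and `shaped_reward` and every default value (`alpha_min`, `anneal_steps`, `bonus`, `radius`) are hypothetical and are not taken from the SQL code. How the temperature is fed into the learner depends on the implementation.

The schedule decays toward a small positive floor rather than exactly zero, because the soft policy, proportional to exp(Q/alpha), is undefined at alpha = 0. The shaped reward adds a Gaussian bump on top of a dense negative-distance term, so reward differences are largest near the goal. The Gaussian form is one choice among many.

```python
import math
from typing import Sequence


def annealed_alpha(step: int,
                   alpha_start: float = 1.0,
                   alpha_min: float = 1e-3,
                   anneal_steps: int = 100_000) -> float:
    """Linearly decay the entropy temperature from alpha_start to alpha_min.

    Hypothetical helper. alpha_min stays positive because the soft policy
    pi(a|s) proportional to exp(Q(s, a) / alpha) is undefined at alpha = 0.
    """
    # Fraction of the schedule completed, clamped to [0, 1].
    frac = min(max(step / anneal_steps, 0.0), 1.0)
    return alpha_start + frac * (alpha_min - alpha_start)


def shaped_reward(ee_pos: Sequence[float],
                  goal_pos: Sequence[float],
                  bonus: float = 10.0,
                  radius: float = 0.05) -> float:
    """Hypothetical reaching reward: dense -distance plus a Gaussian bonus.

    The bonus equals `bonus` at the goal, falls to bonus / e at distance
    `radius`, and is ~0 a few radii away.
    """
    d = math.dist(ee_pos, goal_pos)  # Euclidean distance (Python 3.8+)
    return -d + bonus * math.exp(-(d / radius) ** 2)


if __name__ == "__main__":
    for step in (0, 50_000, 100_000, 200_000):
        print(step, annealed_alpha(step))
    print(shaped_reward([0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))  # 10.0
    print(shaped_reward([0.1, 0.0, 0.0], [0.0, 0.0, 0.0]))
```

In a training loop, one could call `annealed_alpha(step)` once per iteration and compute rewards with `shaped_reward(end_effector_pos, goal_pos)`. The bonus size and radius would need tuning against the temperature. A bonus that is small relative to alpha tends to change the policy very little.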
