Questions about training A3C #67

Closed
Tord-Zhang opened this issue Aug 16, 2017 · 1 comment

@Tord-Zhang

I am trying to use A3C for the task of action detection. When training A3C, I found that the gradient of the actor network is too large, mainly because in
self.policyTarget[action] = -(R - V) / probability[action]
the probability[action] may be too small. After about three thousand training updates (backprops), the network tends to always choose the same action.
Could you give me some advice on how to fix this?
Thank you so much.
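
For illustration, here is a minimal Python sketch of why a small probability[action] makes this target explode. The function name, the min_prob clamp and the numbers are hypothetical, not this repository's code; the target is just the gradient of the loss -(R - V) * log pi(a|s) taken directly with respect to the policy output pi(a|s).

import numpy as np

def policy_target(probability, action, R, V, min_prob=1e-3):
    # Hypothetical sketch: d/dpi(a) of -(R - V) * log pi(a) is -(R - V) / pi(a),
    # so a tiny pi(a) blows the gradient up. Clamping pi(a) from below is one
    # crude safeguard; entropy regularisation (see the reply below) addresses
    # the underlying policy collapse instead.
    target = np.zeros_like(probability)
    p = max(probability[action], min_prob)
    target[action] = -(R - V) / p
    return target

probs = np.array([0.998, 1e-4, 1.9e-3])            # nearly deterministic policy
print(policy_target(probs, action=1, R=1.0, V=0.2))
# Without the clamp the entry for action 1 would be -(1.0 - 0.2) / 1e-4 = -8000.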

@Kaixhin
Owner

Kaixhin commented Aug 16, 2017

If you read the original A3C paper, you will see that they introduce entropy regularisation to encourage a more uniform distribution over actions (its strength can be increased from the default in this code using the -entropyBeta <value> option).
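
As a rough sketch of what the entropy bonus does to the policy target (Python for illustration only, assuming the same setup as the snippet above; the actual implementation in this repository differs in detail):

import numpy as np

def policy_target_with_entropy(probability, action, R, V, beta=0.01):
    # Loss: -(R - V) * log pi(a|s) - beta * H(pi), with H(pi) = -sum_i pi_i * log pi_i.
    target = np.zeros_like(probability)
    # Advantage term: only the taken action receives this part of the gradient.
    target[action] += -(R - V) / probability[action]
    # Entropy term: dH/dpi_i = -(log pi_i + 1), so the loss gradient gains
    # +beta * (log pi_i + 1) for every action, pushing pi back towards uniform.
    target += beta * (np.log(probability) + 1.0)
    return target

probs = np.array([0.998, 1e-4, 1.9e-3])
print(policy_target_with_entropy(probs, action=1, R=1.0, V=0.2, beta=0.01))
# Increasing beta (what -entropyBeta controls) penalises near-deterministic
# policies more strongly, keeping the action probabilities away from zero.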

Note that RL problems are complex, and you will need to adjust many hyperparameters, such as the learning rate or the discount factor (gamma), to fit your problem. In many DeepMind papers they mention doing hundreds of experiments to find the appropriate set of hyperparameters, so unless you are trying to replicate results from a paper with known hyperparameters, expect that you might have to do the same.

Kaixhin closed this as completed Aug 16, 2017