I am trying to use A3C for an action-detection task. While training, I found that the gradient of the actor network becomes too large, mainly because of the line

self.policyTarget[action] = -(R - V) / probability[action]

where probability[action] can be very small. After about three thousand training steps (backprops), the network tends to keep choosing the same action. Could you give me some advice on how to fix this?
Thank you so much.
If you read the original A3C paper, you will see that they introduce entropy regularisation to encourage a more uniform distribution over actions (the strength can be increased from the default in this code using the -entropyBeta <value> option).
Note that RL problems are complex, and you will need to adjust many hyperparameters, such as the learning rate or the discount factor (gamma), to fit your problem. In many DeepMind papers they mention doing hundreds of experiments to find the appropriate set of hyperparameters, so unless you are trying to replicate results from a paper with known hyperparameters, expect that you might have to do the same.
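As a rough illustration of the two fixes discussed above, here is a minimal NumPy sketch (not the repo's actual code; the function name, eps, and entropy_beta are illustrative assumptions) that clamps the action probability away from zero before dividing and adds an entropy term to the policy-gradient target:

```python
import numpy as np

def policy_target(probability, action, R, V, entropy_beta=0.01, eps=1e-8):
    """Sketch of a stabilised actor gradient target.

    probability : softmax output of the policy network
    action      : index of the action that was taken
    R, V        : discounted return and value estimate (advantage = R - V)
    """
    target = np.zeros_like(probability)
    advantage = R - V
    # (1) Clamp so a near-zero probability cannot blow up the gradient.
    p = max(probability[action], eps)
    target[action] = -advantage / p
    # (2) Entropy regularisation: the loss term -beta * H(pi) has gradient
    # beta * (log p_i + 1) w.r.t. each probability p_i, which pushes the
    # policy toward a more uniform distribution.
    target += entropy_beta * (np.log(np.maximum(probability, eps)) + 1.0)
    return target

probs = np.array([0.7, 0.2, 0.1])
t = policy_target(probs, action=1, R=1.0, V=0.5)
```

With the clamp, even an action whose probability has collapsed to zero yields a large but finite gradient instead of a division-by-zero, and the entropy term counteracts the collapse onto a single action that you are observing.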