Questions about training A3C #67

Closed
Tord-Zhang opened this issue Aug 16, 2017 · 1 comment

@Tord-Zhang

I am trying to use A3C for the task of action detection. When training A3C, I found that the gradient of the actor network is too large, mainly because in
self.policyTarget[action] = -(R - V) / probability[action]
the probability[action] may be too small. After about three thousand training updates (backprops), the network tends to always choose the same action.
Could you give me some advice on how to fix this?
Thank you so much.
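
For illustration, here is a minimal Python sketch of why a small probability[action] makes this target explode. The function name, the min_prob clamp and the numbers are hypothetical, not this repository's code; the target is just the gradient of the loss -(R - V) * log pi(a|s) taken directly with respect to the policy output pi(a|s).

import numpy as np

def policy_target(probability, action, R, V, min_prob=1e-3):
    # Hypothetical sketch: d/dpi(a) of -(R - V) * log pi(a) is -(R - V) / pi(a),
    # so a tiny pi(a) blows the gradient up. Clamping pi(a) from below is one
    # crude safeguard; entropy regularisation (see the reply below) addresses
    # the underlying policy collapse instead.
    target = np.zeros_like(probability)
    p = max(probability[action], min_prob)
    target[action] = -(R - V) / p
    return target

probs = np.array([0.998, 1e-4, 1.9e-3])            # nearly deterministic policy
print(policy_target(probs, action=1, R=1.0, V=0.2))
# Without the clamp the entry for action 1 would be -(1.0 - 0.2) / 1e-4 = -8000.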

@Kaixhin
Owner

Kaixhin commented Aug 16, 2017

If you read the original A3C paper, you will see that they introduce entropy regularisation to encourage a more uniform distribution over actions (its strength can be increased from the default in this code using the -entropyBeta <value> option).
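
As a rough sketch of what the entropy bonus does to the policy target (Python for illustration only, assuming the same setup as the snippet above; the actual implementation in this repository differs in detail):

import numpy as np

def policy_target_with_entropy(probability, action, R, V, beta=0.01):
    # Loss: -(R - V) * log pi(a|s) - beta * H(pi), with H(pi) = -sum_i pi_i * log pi_i.
    target = np.zeros_like(probability)
    # Advantage term: only the taken action receives this part of the gradient.
    target[action] += -(R - V) / probability[action]
    # Entropy term: dH/dpi_i = -(log pi_i + 1), so the loss gradient gains
    # +beta * (log pi_i + 1) for every action, pushing pi back towards uniform.
    target += beta * (np.log(probability) + 1.0)
    return target

probs = np.array([0.998, 1e-4, 1.9e-3])
print(policy_target_with_entropy(probs, action=1, R=1.0, V=0.2, beta=0.01))
# Increasing beta (what -entropyBeta controls) penalises near-deterministic
# policies more strongly, keeping the action probabilities away from zero.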

Note that RL problems are complex, and you will need to adjust many hyperparameters, such as the learning rate or the discount factor (gamma), to fit your problem. In many DeepMind papers they mention doing hundreds of experiments to find the appropriate set of hyperparameters, so unless you are trying to replicate results from a paper with known hyperparameters, expect that you might have to do the same.

Kaixhin closed this as completed Aug 16, 2017