
Problem with more than one action - A3C #55

Closed
akhilsanand opened this issue Jun 9, 2018 · 7 comments

Comments

@akhilsanand

Hello,

Thank you very much for the A3C implementation. I was trying to adapt your biped-walker A3C implementation to a custom muscle model and got it working for a single action output. But when I tried it with more than one action, it gets stuck in a local minimum with the smallest reward. I tried starting the training with a high learning rate and also tried different entropy_beta values to encourage more exploration, but alas nothing helped. Could you help me with some advice on this?

Irrespective of what I tried, the training gets stuck in the local minimum with the smallest reward possible.

Regards,
Akhil

@akhilsanand
Author

Hello,
Interestingly, the action bound in your code affects learning very much. In my custom environment only 2 actions are required, and they lie within the range [0, 1]. But when I try [-1, 1] as the action bound in your code, it doesn't get stuck in a local minimum anymore, unlike with [0, 1]. Could you please explain this phenomenon?

Regards,
Akhil
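A common way to reconcile the two bounds (an assumption on my part, not something from the repo) is to keep the network's output in [-1, 1] and linearly rescale it into the environment's [0, 1] action range. A minimal sketch:

```python
import numpy as np

def rescale_action(a, src=(-1.0, 1.0), dst=(0.0, 1.0)):
    """Linearly map an action from the network's bound `src`
    into the environment's bound `dst`."""
    s_lo, s_hi = src
    d_lo, d_hi = dst
    return d_lo + (a - s_lo) * (d_hi - d_lo) / (s_hi - s_lo)

# The middle of [-1, 1] maps to the middle of [0, 1].
print(rescale_action(0.0))   # 0.5
print(rescale_action(-1.0))  # 0.0
print(rescale_action(1.0))   # 1.0
```

This lets the policy keep whatever bound trains well while the environment still receives actions in its expected range.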

@MorvanZhou
Owner

Hi Akhil,

It may be related to the activation function you selected. For example, to map actions to (-1, 1) you would choose tanh as the mapping function, and sigmoid for (0, 1). These two mappings have different derivatives, which may affect your training.
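The derivative difference can be made concrete with a small NumPy sketch (my own illustration, not code from the repo): tanh's gradient peaks at 1.0 while sigmoid's peaks at 0.25, so a sigmoid mean passes at most a quarter of the gradient signal at its steepest point.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    # derivative of sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

def dtanh(x):
    # derivative of tanh: 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

# Both derivatives are maximal at x = 0.
print("max d/dx tanh   :", dtanh(0.0))     # 1.0
print("max d/dx sigmoid:", dsigmoid(0.0))  # 0.25
```

Away from zero both activations saturate, but sigmoid's smaller peak gradient can slow learning of the action mean.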

@akhilsanand
Author

Hello Zhou,

Thank you very much for the reply. I have tried the sigmoid activation function with a [0, 1] action bound, but it still gets stuck in the local minimum. But with the sigmoid activation function and [-1, 1] as the action bound, it again starts learning really well. Do you have any idea why?

Regards,
Akhil

@MorvanZhou
Owner

Then I think it is likely that backpropagation through tanh works better than through sigmoid. This might be one of the reasons.

@akhilsanand
Author

Hello Zhou,

But I am getting good results with the sigmoid activation function and an action bound of [-1, 1]. This makes me confused about how the action bound is really having an effect even though I am using a sigmoid activation function.

@MorvanZhou
Owner

The action bound is applied with tf.clip_by_value(output, lower_bound, upper_bound). Because the action is sampled from a normal distribution, the sampled output can still be less than 0 even when the mean comes from a sigmoid. Therefore an action bound of (-1, 1) still affects the final result.
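This effect can be demonstrated with a small NumPy sketch (using np.clip in place of tf.clip_by_value; the mean and standard deviation below are made-up illustrative values): with a sigmoid mean of 0.5 and a moderate standard deviation, a noticeable fraction of samples lands below 0, and clipping to [0, 1] collapses all of them onto exactly 0.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = 1.0 / (1.0 + np.exp(-0.0))  # sigmoid(0) = 0.5, inside (0, 1)
sigma = 0.6                      # hypothetical policy std dev
samples = rng.normal(mu, sigma, size=10_000)

# Even with a mean inside (0, 1), some samples fall below 0.
print("fraction below 0:", (samples < 0).mean())

# Clipping to [0, 1] pins every such sample to exactly 0, while
# clipping to [-1, 1] keeps them as distinct negative actions,
# preserving more variation in the executed actions.
clipped_01 = np.clip(samples, 0.0, 1.0)
clipped_11 = np.clip(samples, -1.0, 1.0)
print("mass at exactly 0 with [0, 1] bound :", (clipped_01 == 0.0).mean())
print("mass at exactly 0 with [-1, 1] bound:", (clipped_11 == 0.0).mean())
```

The pile-up of clipped actions at the boundary would plausibly hurt exploration with the [0, 1] bound, consistent with the behaviour reported above.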

@akhilsanand
Author

Thanks, Zhou.
