
Problem with more than one action - A3C #55

Closed
akhilsanand opened this issue Jun 9, 2018 · 7 comments

Comments

@akhilsanand

Hello,

Thank you very much for the A3C implementation. I was trying to adapt your biped-walker A3C implementation to a custom muscle model and got it working for a single action output. But when I tried it with more than one action, it gets stuck in a local minimum with the smallest reward. I tried starting the training with a high learning rate and also tried different entropy_beta values to encourage more exploration, but alas nothing helped. Could you help me with some advice on this?

Irrespective of what I tried, the training gets stuck in the local minimum with the smallest reward possible.

Regards,
Akhil

@akhilsanand
Author

Hello,
Interestingly, the action bound in your code affects learning very much. In my custom environment only 2 actions are required, and they lie within the range [0, 1]. But when I try [-1, 1] as the action bound in your code, it doesn't get stuck in a local minimum anymore, unlike with [0, 1]. Could you please explain this phenomenon?

Regards,
Akhil
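A common way to reconcile the two bounds (an assumption on my part, not something from the repo) is to keep the network's output in [-1, 1] and linearly rescale it into the environment's [0, 1] action range. A minimal sketch:

```python
import numpy as np

def rescale_action(a, src=(-1.0, 1.0), dst=(0.0, 1.0)):
    """Linearly map an action from the network's bound `src`
    into the environment's bound `dst`."""
    s_lo, s_hi = src
    d_lo, d_hi = dst
    return d_lo + (a - s_lo) * (d_hi - d_lo) / (s_hi - s_lo)

# The middle of [-1, 1] maps to the middle of [0, 1].
print(rescale_action(0.0))   # 0.5
print(rescale_action(-1.0))  # 0.0
print(rescale_action(1.0))   # 1.0
```

This lets the policy keep whatever bound trains well while the environment still receives actions in its expected range.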

@MorvanZhou
Owner

Hi Akhil,

It may be related to the activation function you selected. For example, to map actions to (-1, 1) you would choose tanh as the mapping function, and sigmoid for (0, 1). These two mappings have different derivatives, which may affect your training.
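The derivative difference can be made concrete with a small NumPy sketch (my own illustration, not code from the repo): tanh's gradient peaks at 1.0 while sigmoid's peaks at 0.25, so a sigmoid mean passes at most a quarter of the gradient signal at its steepest point.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    # derivative of sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

def dtanh(x):
    # derivative of tanh: 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

# Both derivatives are maximal at x = 0.
print("max d/dx tanh   :", dtanh(0.0))     # 1.0
print("max d/dx sigmoid:", dsigmoid(0.0))  # 0.25
```

Away from zero both activations saturate, but sigmoid's smaller peak gradient can slow learning of the action mean.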

@akhilsanand
Author

Hello Zhou,

Thank you very much for the reply. I have tried the sigmoid activation function with a [0, 1] action bound, but it still gets stuck in the local minimum. But with the sigmoid activation function and [-1, 1] as the action bound, it again starts learning really well. Do you have any idea why?

Regards,
Akhil

@MorvanZhou
Owner

Then I think it is likely that backpropagation through tanh works better than through sigmoid. This might be one of the reasons.

@akhilsanand
Author

Hello Zhou,

But I am getting good results with the sigmoid activation function and an action bound of [-1, 1]. This makes me confused about how the action bound is really having an effect even though I am using a sigmoid activation function.

@MorvanZhou
Owner

The action bound is applied with tf.clip_by_value(output, lower_bound, upper_bound). Because the action is sampled from a normal distribution, the sampled output can still be less than 0 even when the mean comes from a sigmoid. Therefore an action bound of (-1, 1) still affects the final result.
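This effect can be demonstrated with a small NumPy sketch (using np.clip in place of tf.clip_by_value; the mean and standard deviation below are made-up illustrative values): with a sigmoid mean of 0.5 and a moderate standard deviation, a noticeable fraction of samples lands below 0, and clipping to [0, 1] collapses all of them onto exactly 0.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = 1.0 / (1.0 + np.exp(-0.0))  # sigmoid(0) = 0.5, inside (0, 1)
sigma = 0.6                      # hypothetical policy std dev
samples = rng.normal(mu, sigma, size=10_000)

# Even with a mean inside (0, 1), some samples fall below 0.
print("fraction below 0:", (samples < 0).mean())

# Clipping to [0, 1] pins every such sample to exactly 0, while
# clipping to [-1, 1] keeps them as distinct negative actions,
# preserving more variation in the executed actions.
clipped_01 = np.clip(samples, 0.0, 1.0)
clipped_11 = np.clip(samples, -1.0, 1.0)
print("mass at exactly 0 with [0, 1] bound :", (clipped_01 == 0.0).mean())
print("mass at exactly 0 with [-1, 1] bound:", (clipped_11 == 0.0).mean())
```

The pile-up of clipped actions at the boundary would plausibly hurt exploration with the [0, 1] bound, consistent with the behaviour reported above.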

@akhilsanand
Author

Thanks, Zhou.
