Doubts #11
Could you please let me know why there is a negative sign? Since we have already defined the KL divergence in the step before, I think we do not need a negative sign here. Please let me know how you see this.

ACER/train.py, Line 81 in f22b07c
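For context, the line in question applies ACER's trust-region projection, which adjusts the policy gradient g using k, the gradient of the KL divergence between the average policy and the current policy. Below is a minimal PyTorch sketch of that closed-form update (tensor names and shapes are illustrative, not the repository's code):

```python
import torch

def trust_region(g, k, delta=1.0):
    # Closed-form solution from the ACER paper:
    #   z* = g - max(0, (k.g - delta) / ||k||^2) * k
    # where k = grad of KL(avg_policy || policy). Negating the KL term
    # negates k, which is exactly the sign question raised in this issue.
    kg = (k * g).sum()
    k_norm_sq = (k * k).sum().clamp(min=1e-10)
    scale = ((kg - delta) / k_norm_sq).clamp(min=0)
    return g - scale * k

# Illustrative vectors standing in for per-sample gradients
g = torch.randn(6)
k = torch.randn(6)
z_star = trust_region(g, k)
```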
The trust region function should be a translation of the one from ChainerRL, apart from the fact that the trust region involves some parameters, so if you think it should be the other way around, then you should raise an issue there to let them know too.
I think that since we have already defined the KL divergence before, we do not really need the negative sign. I am not sure why there is a negative sign. Please let me know your views. Do you think it should be the other way round?
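To make the sign question concrete, here is a small hedged check (hypothetical distributions, not the repository's tensors): differentiating -KL instead of KL simply negates k, which reverses the direction of the trust-region correction.

```python
import torch

# Hypothetical categorical distributions over four actions
policy = torch.tensor([0.1, 0.2, 0.3, 0.4], requires_grad=True)
avg_policy = torch.tensor([0.25, 0.25, 0.25, 0.25])

# KL(avg_policy || policy), the divergence used for the trust region
kl = (avg_policy * (avg_policy.log() - policy.log())).sum()

# k with and without the negative sign: they differ only by sign
k = torch.autograd.grad(kl, policy, retain_graph=True)[0]
k_neg = torch.autograd.grad(-kl, policy)[0]
assert torch.allclose(k_neg, -k)
```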
I'm not sure either; I just ported the code from Chainer for this part.
Let me know the difference between your implementation and the OpenAI baselines.
I've not compared the two at all, and this is very low priority for me at the moment.
@Kaixhin, thank you for your input. I will look into ChainerRL and the OpenAI baselines to get more insight into the implementation. I will just send a PR if needed. Thanks.
I am closing this issue because the current implementation seems to be correct. However, I think we need to detach z_star_p for better results. Please refer to #13.
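As a rough sketch of that suggestion (the variable names follow the thread; the loss here is only illustrative), detaching z_star_p treats the adjusted gradient as a constant during backpropagation, so no gradient flows back through the trust-region computation itself:

```python
import torch

policy_stats = torch.randn(4, requires_grad=True)
z_star_p = 3 * policy_stats  # stand-in for the trust-region-adjusted term

# With .detach(), backprop treats z_star_p as a constant, so gradients
# come only from the direct dependence of the loss on policy_stats
loss = (z_star_p.detach() * policy_stats).sum()
loss.backward()
```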