Try replacing Distributions with tf.Distributions #52
Comments
To add some details: Distributions are used by policies and other modules to add distribution functionality, such as computing the KL divergence between two distributions given the parameters of a distribution (which are often the output tensors of a neural network). tf.distributions likely implements the same functionality, so we should try to replace our code with the TF counterpart.
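As a sketch of the functionality being discussed, the closed-form KL divergence between two diagonal Gaussians (what a `kl_divergence` call on two `tf.distributions` objects would compute) can be written in plain Python. The function name and signature here are illustrative, not rllab's actual `DiagonalGaussian` API:

```python
import math

def diag_gaussian_kl(mu1, sigma1, mu2, sigma2):
    """KL( N(mu1, diag(sigma1^2)) || N(mu2, diag(sigma2^2)) ),
    summed over independent dimensions (closed form)."""
    kl = 0.0
    for m1, s1, m2, s2 in zip(mu1, sigma1, mu2, sigma2):
        # Per-dimension KL between two univariate Gaussians.
        kl += math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5
    return kl

# Example: shifting the mean by 1 with unit variance gives KL = 0.5.
print(diag_gaussian_kl([0.0], [1.0], [1.0], [1.0]))  # 0.5
```

Replacing rllab's hand-rolled version with the TF equivalent would move this math (and its gradients) into the graph.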
If possible, remove rllab Distributions entirely.
Sampled KL divergence: It is intractable to evaluate the KL divergence directly on our policies, so we sample it instead. This leads to the complicated structure in the KL parts of BatchPolopt. The policy … Of course, because of how TensorFlow works, … Later in BatchPolopt, during the optimization, we would like to calculate the KL divergence between the old policy … Finally, equipped with …
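The sampling approach described above can be sketched without TensorFlow: estimate KL(old || new) as the average log-likelihood ratio over actions drawn from the old policy. This is a minimal stdlib sketch with illustrative names, not the BatchPolopt code:

```python
import math
import random

def logpdf_normal(x, mu, sigma):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def sampled_kl(mu_old, sigma_old, mu_new, sigma_new, n=100000, seed=0):
    """Monte Carlo estimate of KL(old || new):
    average of log p_old(x) - log p_new(x) over samples x ~ old."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu_old, sigma_old)
        total += logpdf_normal(x, mu_old, sigma_old) - logpdf_normal(x, mu_new, sigma_new)
    return total / n
```

For N(0,1) vs. N(1,1) the analytic KL is 0.5, and the sampled estimate converges to it as `n` grows; in BatchPolopt the same average is taken over the batch of sampled trajectories.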
Hi @ryanjulian, I wrote some test code to verify my idea but found some problems.
@ryanjulian It is runnable, but still test code. I also still keep the legacy DiagonalGaussian, because the sampler uses the distribution to calculate entropy. So it may need a copy of the sampler in the TensorFlow sandbox? If you think this solution is OK, I will replace the rest.
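The entropy dependence mentioned here is the closed-form differential entropy of a diagonal Gaussian, which both the legacy DiagonalGaussian and `tf.distributions.Normal.entropy()` compute. A hedged stdlib sketch (illustrative name, not the actual sampler API):

```python
import math

def diag_gaussian_entropy(sigmas):
    """Differential entropy of a diagonal Gaussian:
    sum over dimensions of 0.5 * log(2 * pi * e * sigma^2).
    Note it depends only on the standard deviations, not the means."""
    return sum(0.5 * math.log(2 * math.pi * math.e * s ** 2) for s in sigmas)
```

Because TF already exposes this, the sampler could call the TF distribution's `entropy()` instead of the legacy class, which is why a TF-sandbox copy of the sampler comes up.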
@silrep2 can you show me what you mean with a pull request?