TensorFlow Implementation of SVPG (Stein Variational Policy Gradient)
Implemented with TensorFlow r1.3.
https://arxiv.org/abs/1704.02399
From the paper: "Policy gradient methods have been successfully applied to many complex reinforcement learning problems. However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration. In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian inference problem."
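In this framework a set of n policy parameter vectors is treated as particles approximating a distribution proportional to exp(J(θ)/α) q0(θ). The following is a minimal, pure-Python sketch of one SVPG-style particle update as we read it from the paper, not code from this repository: each particle θ_i moves by step_size · φ(θ_i), where φ(θ_i) = (1/n) Σ_j [ k(θ_j, θ_i) ∇J(θ_j)/α + ∇_{θ_j} k(θ_j, θ_i) ]. Assumptions: an RBF kernel with a median-heuristic bandwidth (a choice borrowed from SVGD), a flat prior q0 (so its gradient term drops out), and a hypothetical toy return J(θ) = -||θ - (1, -1)||² whose exact gradient stands in for a policy-gradient estimate. The names `svpg_step`, `median_bandwidth` and `toy_grad_J` are illustrative only.

```python
import math
import statistics


def rbf_kernel(x, y, h):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / h)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / h)


def median_bandwidth(particles):
    """Assumed median heuristic: h = median_pairwise_distance^2 / log(n)."""
    n = len(particles)
    if n < 2:
        return 1.0
    dists = [math.dist(particles[i], particles[j])
             for i in range(n) for j in range(i + 1, n)]
    h = statistics.median(dists) ** 2 / math.log(n)
    return h if h > 0 else 1.0  # fall back if all particles coincide


def svpg_step(particles, grad_J, alpha, step_size):
    """One SVPG-style update of all particles (flat prior assumed)."""
    n, dim = len(particles), len(particles[0])
    h = median_bandwidth(particles)
    grads = [grad_J(p) for p in particles]  # stands in for policy-gradient estimates
    updated = []
    for i in range(n):
        phi = [0.0] * dim
        for j in range(n):
            k = rbf_kernel(particles[j], particles[i], h)
            for d in range(dim):
                # Driving term: kernel-weighted return gradient, scaled by 1/alpha.
                phi[d] += k * grads[j][d] / alpha
                # Repulsive term: grad_{theta_j} k(theta_j, theta_i)
                # = (2/h) * (theta_i - theta_j) * k, pushing theta_i away from theta_j.
                phi[d] += (2.0 / h) * (particles[i][d] - particles[j][d]) * k
        updated.append([particles[i][d] + step_size * phi[d] / n for d in range(dim)])
    return updated


def toy_grad_J(theta, target=(1.0, -1.0)):
    """Gradient of the hypothetical toy return J(theta) = -||theta - target||^2."""
    return [-2.0 * (t - c) for t, c in zip(theta, target)]


if __name__ == "__main__":
    particles = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [-0.5, 0.0], [0.0, -0.5]]
    for _ in range(2000):
        particles = svpg_step(particles, toy_grad_J, alpha=1.0, step_size=0.05)
    for p in particles:
        print([round(v, 3) for v in p])
```

The first term pulls every particle up the kernel-smoothed return gradient; the second term is repulsive and keeps the particles apart, which is one way to see how the framework encourages parameter exploration. A larger temperature α flattens exp(J/α), so the particles settle further apart around the optimum.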