Proximal Policy Optimization

An implementation of PPO (both the clipping and KL-divergence variants) using TensorFlow. It reproduces the PPO paper's results on some MuJoCo tasks.

Acknowledgements

This repository is a blend of the PyTorch repository mjrl and OpenAI Baselines.
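
For readers unfamiliar with the two PPO variants named in the description above, the following is a minimal sketch of the clipped objective and the KL-penalized objective in plain Python. It is not this repository's code: the function names, the `clip_eps` and `beta` defaults, and the sample-based KL estimate are all illustrative assumptions, and a TensorFlow version would express the same arithmetic with tensor ops.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: per-sample log-probabilities of the taken actions
    under the current and the data-collecting policy (hypothetical inputs).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def kl_penalized_surrogate(new_logp, old_logp, advantages, beta=1.0):
    """Mean surrogate minus beta times a sample estimate of KL(old || new).

    The KL estimate assumes the actions were sampled from the old policy.
    """
    n = len(advantages)
    surrogate = sum(
        math.exp(lp_new - lp_old) * adv
        for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages)
    ) / n
    kl_estimate = sum(
        lp_old - lp_new for lp_new, lp_old in zip(new_logp, old_logp)
    ) / n
    return surrogate - beta * kl_estimate
```

In the clipped variant, a probability ratio outside `[1 - clip_eps, 1 + clip_eps]` stops contributing extra gain, which discourages large policy updates; the KL variant discourages them with an explicit penalty instead.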