
Stochastic Policy Gradient Methods

For a detailed discussion, visit: https://sridhartee.blogspot.in/2016/11/policy-gradient-methods.html

[Figure: cartpole-actorcritic demo]

We design and test three policy gradient methods in this repository:

  1. Monte Carlo Policy Gradient: the baseline is the average of the rewards obtained; using no baseline results in high variance. A minimal sketch follows this list.

  2. Actor-Critic Method: a softmax policy with a Q-learning critic for value-function estimation (see the second sketch below).

  3. Numerical Gradient Estimation: perturb the parameters and estimate the gradient by least-squares regression, (X'X)^-1 X'y (third sketch below). Change num_rollouts to set the number of training examples the gradient is learned from; note that the actual number of runs is number of episodes * num_rollouts.
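
A minimal sketch of the Monte Carlo update with the average-return baseline, assuming a linear softmax policy over discrete actions; the names and array shapes here are illustrative, not the repository's exact code:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def grad_log_softmax(theta, s, a):
    """Gradient of log pi(a|s) for a linear softmax policy.

    theta: (n_actions, n_features), s: (n_features,), a: int.
    """
    pi = softmax(theta @ s)
    grad = -np.outer(pi, s)   # -pi(b|s) * s for every action b
    grad[a] += s              # +s for the action actually taken
    return grad

def reinforce_update(theta, episode, alpha=0.01):
    """One Monte Carlo policy-gradient step on a finished episode.

    episode: list of (state, action, reward) tuples.
    Baseline: average of the returns, which reduces the variance of the update.
    """
    states, actions, rewards = zip(*episode)
    returns = np.cumsum(np.asarray(rewards)[::-1])[::-1]  # returns-to-go
    baseline = returns.mean()
    for s, a, G in zip(states, actions, returns):
        theta += alpha * (G - baseline) * grad_log_softmax(theta, s, a)
    return theta
```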
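
A sketch of one actor-critic step, assuming tabular state and action indices: the critic is updated with the Q-learning target, and the actor takes a softmax policy-gradient step scored by the critic's estimate. Again illustrative, not the repository's code:

```python
import numpy as np

def actor_critic_step(theta, Q, s, a, r, s_next,
                      alpha_actor=0.01, alpha_critic=0.1, gamma=0.99):
    """One combined update: Q-learning critic, softmax policy-gradient actor.

    theta, Q: (n_states, n_actions) arrays; s, a, s_next: integer indices.
    """
    # Critic: move Q(s, a) toward the Q-learning target r + gamma * max_a' Q(s', a')
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha_critic * (td_target - Q[s, a])

    # Actor: softmax policy over theta[s]; score the step with the critic's Q(s, a)
    logits = theta[s] - theta[s].max()
    pi = np.exp(logits) / np.exp(logits).sum()
    grad_log = -pi
    grad_log[a] += 1.0                      # d log pi(a|s) / d theta[s]
    theta[s] += alpha_actor * Q[s, a] * grad_log
    return theta, Q
```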
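
A sketch of the numerical estimator: perturb the parameters, measure the change in return, and solve the regression. Here `rollout_return` is a hypothetical helper standing in for one episode's return under given parameters; `np.linalg.lstsq` computes the same solution as (X'X)^-1 X'y:

```python
import numpy as np

def estimate_gradient(theta, rollout_return, num_rollouts=20, sigma=0.05):
    """Estimate the policy gradient from random parameter perturbations.

    rollout_return(theta) -> scalar episode return (hypothetical helper).
    Each row of X is a perturbation; y holds the resulting return changes.
    """
    d = theta.size
    X = sigma * np.random.randn(num_rollouts, d)             # perturbations
    base = rollout_return(theta)                             # unperturbed return
    y = np.array([rollout_return(theta + dx.reshape(theta.shape)) - base
                  for dx in X])
    g, *_ = np.linalg.lstsq(X, y, rcond=None)                # (X'X)^-1 X'y
    return g.reshape(theta.shape)
```

Each gradient estimate costs num_rollouts environment runs, which is why the total number of runs is number of episodes * num_rollouts, as noted above.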
