For a detailed discussion, visit: https://sridhartee.blogspot.in/2016/11/policy-gradient-methods.html
We design and test three policy gradient methods in this repository:

- Monte Carlo Policy Gradient: the baseline used is the average of the rewards obtained; using no baseline results in high variance (see the first sketch below).
- Actor-Critic Method: uses a softmax policy for the actor and a Q-learning critic for value-function estimation (second sketch below).
- Numerical Gradient Estimation: perturb the parameters and estimate the gradient by least-squares regression, (X'X)^-1 X'y (third sketch below). Change num_rollouts to set the number of training examples the gradient is estimated from. Note that the actual number of runs is number of episodes * num_rollouts.
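The first sketch illustrates the Monte Carlo policy gradient (REINFORCE) update with the average-of-rewards baseline described above. The two-armed bandit environment, learning rate, and episode count are illustrative assumptions, not the repository's actual code:

```python
# Minimal sketch: REINFORCE with an average-reward baseline on a toy
# 2-armed bandit. Environment and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)          # one preference per action (softmax policy)
alpha = 0.1                  # learning rate (assumed value)
reward_history = []          # running record for the average-reward baseline

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

for episode in range(2000):
    probs = softmax(theta)
    action = rng.choice(2, p=probs)
    # Arm 1 pays more on average; the agent should learn to prefer it.
    reward = rng.normal(1.0 if action == 1 else 0.0, 1.0)

    reward_history.append(reward)
    baseline = np.mean(reward_history)   # baseline = average of rewards so far

    # grad log pi(a|theta) for a softmax over action preferences
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0

    # REINFORCE update; subtracting the baseline reduces variance.
    # With baseline = 0 the same update works but is much noisier.
    theta += alpha * (reward - baseline) * grad_log_pi

print("learned action probabilities:", softmax(theta))
```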
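The second sketch shows one plausible reading of the actor-critic method: a tabular softmax actor updated with the critic's Q(s, a) as the policy gradient signal, and a critic trained by standard Q-learning. The chain MDP, update form, and hyperparameters are assumptions for illustration:

```python
# Minimal sketch: softmax actor + Q-learning critic on a hypothetical
# 5-state chain MDP (reach the right end for +1 reward).
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2                # actions: 0 = left, 1 = right
theta = np.zeros((n_states, n_actions))   # actor: softmax preferences
Q = np.zeros((n_states, n_actions))       # critic: Q-value estimates
alpha_actor, alpha_critic, gamma = 0.05, 0.1, 0.95

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

for episode in range(3000):
    s = 0
    for t in range(50):
        probs = softmax(theta[s])
        a = rng.choice(n_actions, p=probs)
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        done = s_next == n_states - 1
        r = 1.0 if done else 0.0

        # Critic update: standard Q-learning target
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha_critic * (target - Q[s, a])

        # Actor update: policy gradient using the critic's Q(s, a)
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0
        theta[s] += alpha_actor * Q[s, a] * grad_log_pi

        s = s_next
        if done:
            break

print("preferred action per state (1 = right):", theta.argmax(axis=1))
```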
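The third sketch shows numerical gradient estimation as described above: perturb the parameters, measure the change in return over num_rollouts perturbations, and solve the normal equations (X'X)^-1 X'y for the gradient. The evaluate_policy stand-in and all constants are assumptions; in the repository the reward would come from actual environment rollouts:

```python
# Minimal sketch: estimate the policy gradient by regressing reward
# changes on random parameter perturbations.
import numpy as np

rng = np.random.default_rng(0)
dim = 3                      # number of policy parameters (assumed)
theta = np.zeros(dim)
sigma, alpha = 0.1, 0.2      # perturbation scale and step size (assumed)
num_rollouts = 20            # training examples per gradient estimate

def evaluate_policy(params):
    # Hypothetical stand-in for a rollout's total reward; a real version
    # would run episodes. The optimum here is params = [1, -1, 0.5].
    optimum = np.array([1.0, -1.0, 0.5])
    return -np.sum((params - optimum) ** 2) + rng.normal(0, 0.01)

for episode in range(200):
    X = rng.normal(0.0, sigma, size=(num_rollouts, dim))   # perturbations
    base = evaluate_policy(theta)
    # y: change in reward for each perturbed parameter vector
    y = np.array([evaluate_policy(theta + dx) for dx in X]) - base
    # Least-squares gradient estimate: g = (X'X)^-1 X'y
    g = np.linalg.solve(X.T @ X, X.T @ y)
    theta += alpha * g

print("estimated optimum:", np.round(theta, 2))
```

Note how the total number of policy evaluations here is episodes * num_rollouts (plus one unperturbed run per episode), matching the run count noted in the list above.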