- DQN (and DDQN)
- REINFORCE with baseline
- PPO
- TRPO
I wanted to start with the easier/basic methods (as described in Reinforcement Learning: An Introduction). One issue I've run into, although I'm probably missing something, is that (at least for the tabular value-function methods) a full, enumerable representation of the state space is required. It's not 100% clear to me how to represent that simply while taking advantage of the provided environments in openai/gym; a minimal sketch of what I mean follows below. My plan for now is to start with the methods listed above, and then I'll write some specific environments and representations for the other methods.
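As a sketch of the tabular-representation idea, assuming an environment whose observation and action spaces are both `Discrete` (FrozenLake-v0 and the pre-0.26 `gym` reset/step API are my assumptions here, not something fixed by this repo):

```python
import numpy as np
import gym

# Environments with Discrete spaces expose their sizes directly, which gives
# a full tabular representation "for free".
env = gym.make("FrozenLake-v0")  # example choice; any Discrete/Discrete env works
n_states = env.observation_space.n
n_actions = env.action_space.n

# One action-value entry per (state, action) pair.
Q = np.zeros((n_states, n_actions))

state = env.reset()
action = env.action_space.sample()
next_state, reward, done, info = env.step(action)
print(Q[state, action])  # discrete observations index straight into the table
```

The full list of methods I'm planning to implement: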
- Rollout
- Monte-Carlo Tree Search
- n-step SARSA on-policy (probably start with 1-step)
- n-step SARSA off-policy
- n-step Tree Backup
- Q-learning (a minimal sketch follows this list)
- n-step Q(sigma)
- Gradient Monte-Carlo
- Semi-Gradient TD(0)
- n-step semi-gradient SARSA
- n-step differential semi-gradient SARSA
- DQN (paper: https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)
- DDQN (paper: https://arxiv.org/pdf/1509.06461.pdf)
- GTD(0)
- Semi-Gradient TD(lambda)
- REINFORCE
- REINFORCE with Baseline
- PPO (paper: https://arxiv.org/abs/1707.06347)
- TRPO (paper: https://arxiv.org/abs/1502.05477)
- AlphaGo Zero (paper: https://www.nature.com/articles/nature24270)
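Since Q-learning is the simplest of these to make concrete, here is a minimal tabular sketch. The FrozenLake-v0 choice, the pre-0.26 `gym` API, and all hyperparameters are my assumptions, not tuned values:

```python
import numpy as np
import gym

env = gym.make("FrozenLake-v0")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # illustrative, untuned

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        # epsilon-greedy behavior policy over the current table
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done, _ = env.step(action)
        # Q-learning target: greedy (max) value at the next state,
        # zeroed out on terminal transitions
        target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
```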