Implementations of RL agents using various reinforcement learning algorithms, tested mainly on OpenAI Gym environments. Both the discrete and continuous action space versions currently work. The continuous versions can solve Pendulum in around 1000 episodes.
Dependencies: OpenAI Gym, PyTorch
- Advantage Actor Critic (A2C)
  - Discrete action space version
    - a2c.py
    - a3c.py
  - Continuous action space version
    - a2c_continuous.py
  - A3C Paper
- Proximal Policy Optimization
  - Discrete action space version
    - ppo.py
  - Continuous action space version
    - ppo_continuous.py
  - PPO Paper
- Deep Deterministic Policy Gradient
  - Using replay memory and Ornstein-Uhlenbeck noise
    - ddpg.py
  - DDPG Paper
- Deep Q Learning and Double Q Learning
  - Using replay memory and asynchronous update
    - dqn.py
    - ddqn.py
  - DQN Paper
  - DDQN Paper
- Policy Gradient
  - Discrete action space version
    - pg.py
  - Continuous action space version
    - pg_continuous.py
  - PG Blog
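For readers new to Proximal Policy Optimization, the clipped surrogate objective at its core can be sketched in a few lines. This is a minimal, dependency-free illustration of the standard formulation, not code taken from ppo.py or ppo_continuous.py; the function name `ppo_clip_objective` and the default `clip_eps=0.2` are assumptions made for this example.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    clip_eps:   hypothetical default; the repo's value may differ
    """
    terms = []
    for r, adv in zip(ratios, advantages):
        # Keep the probability ratio inside [1 - eps, 1 + eps]
        clipped_r = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        # Taking the minimum gives a pessimistic bound on the improvement
        terms.append(min(r * adv, clipped_r * adv))
    return sum(terms) / len(terms)
```

The objective is maximized during training, so a PyTorch version would typically negate it to obtain a loss.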
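The DDPG and DQN entries above mention replay memory and Ornstein-Uhlenbeck noise. The sketch below shows one common way to build both using only the Python standard library. The class names, the tuple layout, and the defaults (`theta=0.15`, `sigma=0.2`, `dt=1e-2`) are assumptions for illustration and may not match what ddpg.py or dqn.py actually do.

```python
import math
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition once full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement reduces temporal correlation
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


class OrnsteinUhlenbeckNoise:
    """Temporally correlated exploration noise.

    Update rule: x += theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
    """

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        self.size = size
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Restart the process at its long-run mean
        self.state = [self.mu] * self.size

    def sample(self):
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)
```

In a DDPG-style loop, the noise sample would typically be added to the actor's output before the action is clipped to the environment's bounds, and `reset()` would be called at the start of each episode.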
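The difference between the Deep Q Learning and Double Q Learning entries above comes down to how the bootstrap target is computed. The following is a rough sketch following the usual Double DQN formulation, written in plain Python for clarity; the function names and the `gamma=0.99` default are hypothetical and are not taken from dqn.py or ddqn.py.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    # Greedy action according to the online network's Q-values
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Value of that action according to the target network
    return reward + gamma * next_q_target[best_action]
```

Decoupling action selection from evaluation is intended to reduce the overestimation that comes from taking a max over noisy Q-values.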