
A3C Continuous Reinforcement Learning

TensorFlow implementation of the asynchronous advantage actor-critic (A3C) reinforcement learning algorithm (paper) for a continuous action space. The code is largely based on Morvan Zhou's implementation (github).

Components

  • ACNet: This class contains the actor-critic neural network, which estimates an action for a given state as well as a value for each state. For a continuous action space the action is given as an expected value mu and a variance sigma (see the sketch after this list).
  • Worker: The A3C algorithm employs multiple workers, each with its own environment and local ACNet, which train asynchronously. Every few steps each worker pushes its gradients to the global ACNet and pulls back the updated weights.
  • Main: The main function creates the global ACNet and the workers, which then train until a defined number of training episodes is reached. The reward is plotted over all training steps afterwards.
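
The two pieces above that are specific to the continuous setting are the actor head that parameterizes a normal distribution over actions, and the pull/push ops a worker uses to synchronize with the global network. Below is a minimal sketch in TensorFlow 1.x style; identifiers such as `continuous_actor_head`, `make_sync_ops`, `n_actions`, `action_bound`, and the layer sizes are illustrative assumptions, not the exact names used in this repository.

```python
import tensorflow as tf

def continuous_actor_head(state, n_actions, action_bound):
    """Actor head for a continuous action space (illustrative sketch)."""
    hidden = tf.layers.dense(state, 200, tf.nn.relu6, name='actor_hidden')
    # Expected value mu, scaled to the environment's action bounds.
    mu = tf.layers.dense(hidden, n_actions, tf.nn.tanh, name='mu') * action_bound
    # Variance sigma, kept strictly positive via softplus.
    sigma = tf.layers.dense(hidden, n_actions, tf.nn.softplus, name='sigma') + 1e-4
    # tf.distributions.Normal expects the standard deviation, hence the sqrt.
    dist = tf.distributions.Normal(loc=mu, scale=tf.sqrt(sigma))
    # Sample an action and clip it back into the valid action range.
    action = tf.clip_by_value(tf.squeeze(dist.sample(1), axis=[0]),
                              -action_bound, action_bound)
    return dist, action

def make_sync_ops(local_params, global_params, local_grads, optimizer):
    """Pull/push ops a worker uses to stay in sync with the global ACNet."""
    # Pull: overwrite the local weights with the current global weights.
    pull_op = [l.assign(g) for l, g in zip(local_params, global_params)]
    # Push: apply locally computed gradients to the global network.
    push_op = optimizer.apply_gradients(zip(local_grads, global_params))
    return pull_op, push_op
```

In a worker's training loop, running `pull_op` before acting and `push_op` every few steps is what makes the updates asynchronous: each worker trains on its own environment while sharing a single set of global weights.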

Results

Pendulum environment before training:

[GIF: untrained pendulum]

After 1500 episodes:

[GIF: pendulum after training]
