koryakinp/mldriver-discrete-steering

Implementation of the Advantage Actor-Critic (A2C) algorithm with visual observations and a discrete action space
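
The core A2C update for a discrete policy combines a policy-gradient loss weighted by the advantage, a value regression toward observed returns, and an entropy bonus. Below is a minimal sketch of those losses in TensorFlow; the loss coefficients, function names, and return computation are illustrative assumptions, not the repository's exact code.

```python
import tensorflow as tf

def a2c_loss(policy_logits, values, actions, returns,
             value_coef=0.5, entropy_coef=0.01):
    """Combined A2C loss for one batch of transitions (illustrative sketch)."""
    values = tf.squeeze(values, axis=-1)
    # Advantage estimate: observed return minus the critic's baseline.
    advantages = returns - values
    # Actor: -log pi(a|s) * advantage, with the advantage treated as a constant.
    neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=actions, logits=policy_logits)
    policy_loss = tf.reduce_mean(neg_log_prob * tf.stop_gradient(advantages))
    # Critic: regress V(s) toward the observed returns.
    value_loss = tf.reduce_mean(tf.square(advantages))
    # Entropy bonus keeps the policy from collapsing too early.
    probs = tf.nn.softmax(policy_logits)
    entropy = -tf.reduce_sum(probs * tf.math.log(probs + 1e-8), axis=-1)
    return policy_loss + value_coef * value_loss - entropy_coef * tf.reduce_mean(entropy)
```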

Environment

Custom-built MLDriver Unity Environment

Environment Parameters:

  • Observation Space: [64, 64, 1]
  • Action Space: [3]
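
The jump from the [64, 64, 1] observation to the [64, 64, 5] network input (see the Agent section below) suggests that the last five frames are stacked along the channel axis. Here is a minimal sketch of that preprocessing, assuming the stacking happens on the agent side; the repository may instead do it inside the Unity environment.

```python
from collections import deque

import numpy as np

class FrameStack:
    """Stacks the most recent `depth` single-channel frames (illustrative sketch)."""

    def __init__(self, depth=5):
        self.depth = depth
        self.frames = deque(maxlen=depth)

    def reset(self, frame):
        # At episode start, fill the stack with copies of the first frame.
        for _ in range(self.depth):
            self.frames.append(frame)
        return self.observation()

    def step(self, frame):
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        # Five (64, 64, 1) frames concatenated channel-wise -> (64, 64, 5).
        return np.concatenate(self.frames, axis=-1)
```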

Agent

Neural Network Architecture

  • Input Tensor with Dimensions [64,64,5]
  • Convolutional Layer with 32 kernels of size [8,8], strides [4,4] and ReLU activation
  • Convolutional Layer with 64 kernels of size [4,4], strides [2,2] and ReLU activation
  • Convolutional Layer with 64 kernels of size [3,3], strides [1,1] and ReLU activation
  • Fully-Connected Layer with 1024 neurons and ReLU activation
  • Fully-Connected Layer with 512 neurons and ReLU activation
  • Fully-Connected Layer with 256 neurons and ReLU activation
  • Policy Head (Actor) with 3 output neurons and Value Head (Critic) with 1 output neuron
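
A minimal sketch of this architecture in TensorFlow/Keras follows; the framework choice and layer options (padding, initializers) are assumptions and may differ from the repository's implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_actor_critic(input_shape=(64, 64, 5), num_actions=3):
    obs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, kernel_size=8, strides=4, activation="relu")(obs)
    x = layers.Conv2D(64, kernel_size=4, strides=2, activation="relu")(x)
    x = layers.Conv2D(64, kernel_size=3, strides=1, activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(1024, activation="relu")(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dense(256, activation="relu")(x)
    policy_logits = layers.Dense(num_actions)(x)  # actor head: 3 action logits
    value = layers.Dense(1)(x)                    # critic head: state value
    return tf.keras.Model(inputs=obs, outputs=[policy_logits, value])
```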

Results

Smoothed average episode reward vs. number of training steps

Sample Run

Authors

Pavel Koryakin koryakinp@koryakinp.com

Acknowledgments