Deep Deterministic Policy Gradient (DDPG)


Theory

The agent uses the DDPG algorithm to predict continuous actions in a continuous state space. It has two networks: an Actor and a Critic.

https://towardsdatascience.com/reinforcement-learning-w-keras-openai-actor-critic-models-f084612cfd69

https://towardsdatascience.com/hyper-parameters-in-action-part-ii-weight-initializers-35aee1a28404

https://spinningup.openai.com/en/latest/algorithms/ddpg.html

Actor topology

Actor

Critic topology

Critic

Inputs/Outputs

The Actor network has 2 inputs from the game: position and velocity. The output layer is a fully-connected layer with a tanh() activation, producing actions in the range (-1.0, 1.0): force. The hidden layers use the ReLU activation function. A sketch of such a network follows below.
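A minimal sketch of such an Actor in TensorFlow 2 / Keras is shown below; the hidden-layer sizes (400 and 300) are illustrative assumptions, not values taken from this repository.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_actor(state_dim=2, action_dim=1):
    # 2 inputs from the game: position and velocity
    state_input = layers.Input(shape=(state_dim,))
    # hidden ReLU layers (sizes 400/300 are assumptions)
    x = layers.Dense(400, activation="relu")(state_input)
    x = layers.Dense(300, activation="relu")(x)
    # tanh output keeps the action (force) in the range (-1.0, 1.0)
    action_output = layers.Dense(action_dim, activation="tanh")(x)
    return tf.keras.Model(inputs=state_input, outputs=action_output)
```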

The Critic network has 2 inputs from the game (the state) and 1 input from the Actor network (the action). The hidden layers use the ReLU activation function. The main purpose of this network is to estimate the quality of action[t] taken in state[t]. A sketch of such a network follows below.
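A minimal sketch of such a Critic in TensorFlow 2 / Keras; the hidden-layer sizes and the point where the action is merged with the state are assumptions, not values taken from this repository.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_critic(state_dim=2, action_dim=1):
    state_input = layers.Input(shape=(state_dim,))    # 2 state inputs from the game
    action_input = layers.Input(shape=(action_dim,))  # 1 action input from the Actor
    # merge state and action, then pass through hidden ReLU layers (sizes assumed)
    x = layers.Concatenate()([state_input, action_input])
    x = layers.Dense(400, activation="relu")(x)
    x = layers.Dense(300, activation="relu")(x)
    # scalar estimate of Q(state, action)
    q_output = layers.Dense(1)(x)
    return tf.keras.Model(inputs=[state_input, action_input], outputs=q_output)
```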

The Critic network is trained using the Bellman equation:

Q_target = reward + (1-done) * gamma * Q_next_state

Q_target       ->  target Q value for training,
reward         ->  reward from the game for the action taken in the state,
gamma          ->  discount factor,
Q_next_state   ->  quality of the action taken in the next state,
done           ->  1 if the state is terminal, 0 otherwise
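A minimal sketch of this Critic update in TensorFlow 2, assuming target networks for the Actor and Critic and a mean-squared Bellman error loss; the names `target_actor`, `target_critic`, and `critic_optimizer` are placeholders for illustration, not identifiers from this repository.

```python
import tensorflow as tf

def critic_update(critic, target_actor, target_critic, critic_optimizer,
                  state, action, reward, next_state, done, gamma=0.99):
    # Q_target = reward + (1 - done) * gamma * Q_next_state
    next_action = target_actor(next_state)
    q_next_state = target_critic([next_state, next_action])
    q_target = reward + (1.0 - done) * gamma * q_next_state

    with tf.GradientTape() as tape:
        q_value = critic([state, action])
        # mean squared Bellman error between Q_target and the current estimate
        loss = tf.reduce_mean(tf.square(q_target - q_value))
    grads = tape.gradient(loss, critic.trainable_variables)
    critic_optimizer.apply_gradients(zip(grads, critic.trainable_variables))
    return loss
```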

Summary

Critic

Framework: TensorFlow 2.0
Languages: Python 3
Author: Martin Kubovcik