The agent uses the DDPG algorithm to predict continuous actions in a continuous state space. It has two networks: an Actor and a Critic.
https://towardsdatascience.com/reinforcement-learning-w-keras-openai-actor-critic-models-f084612cfd69
https://towardsdatascience.com/hyper-parameters-in-action-part-ii-weight-initializers-35aee1a28404
https://spinningup.openai.com/en/latest/algorithms/ddpg.html
Actor topology
Critic topology
The Actor network has 2 inputs from the game: position and velocity. The output layer is a fully-connected layer with tanh() activation, producing actions (force) in the range (-1.0, 1.0). Hidden layers use the ReLU activation.
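A minimal sketch of this Actor topology using the Keras functional API. The hidden-layer sizes (400 and 300) are illustrative assumptions, not values taken from the repository.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_actor(state_dim=2, action_dim=1):
    # 2 inputs from the game: position and velocity
    inputs = layers.Input(shape=(state_dim,))
    # hidden layers use ReLU; sizes 400/300 are illustrative assumptions
    x = layers.Dense(400, activation="relu")(inputs)
    x = layers.Dense(300, activation="relu")(x)
    # tanh keeps the action (force) in the range (-1.0, 1.0)
    outputs = layers.Dense(action_dim, activation="tanh")(x)
    return tf.keras.Model(inputs, outputs)
```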
The Critic network has 2 inputs from the game (the states) and 1 input from the Actor network (the action). Hidden layers use the ReLU activation. The main function of this network is to estimate the quality of action[t] in state[t].
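A comparable sketch of the Critic topology. The point where the action input is concatenated with the state features, as well as the hidden-layer sizes, are assumptions for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_critic(state_dim=2, action_dim=1):
    state_input = layers.Input(shape=(state_dim,))    # position, velocity
    action_input = layers.Input(shape=(action_dim,))  # action from the Actor
    # concatenation point and hidden sizes are illustrative assumptions
    x = layers.Concatenate()([state_input, action_input])
    x = layers.Dense(400, activation="relu")(x)
    x = layers.Dense(300, activation="relu")(x)
    q_value = layers.Dense(1)(x)                      # estimate of Q(state[t], action[t])
    return tf.keras.Model([state_input, action_input], q_value)
```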
The Critic network is trained using the Bellman equation:
Q_target = reward + (1 - done) * gamma * Q_next_state
Q_target -> Q value to be trained,
reward -> reward from the game for the action taken in the state,
gamma -> discount factor,
Q_next_state -> quality of the action in the next state,
done -> 1 if the state is terminal, 0 otherwise
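A minimal sketch of one Critic training step built around this target. The names actor_target, critic_target, critic, and critic_optimizer are assumed to exist (target networks and the optimizer are not described in this section).

```python
import tensorflow as tf

def train_critic(states, actions, rewards, next_states, dones, gamma=0.99):
    # Q_next_state: quality of the target Actor's action in the next state
    next_actions = actor_target(next_states)
    q_next_state = critic_target([next_states, next_actions])
    # Q_target = reward + (1 - done) * gamma * Q_next_state
    q_target = rewards + (1.0 - dones) * gamma * q_next_state

    with tf.GradientTape() as tape:
        q_value = critic([states, actions])
        loss = tf.reduce_mean(tf.square(q_target - q_value))

    grads = tape.gradient(loss, critic.trainable_variables)
    critic_optimizer.apply_gradients(zip(grads, critic.trainable_variables))
    return loss
```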
Summary
Framework: TensorFlow 2.0
Language: Python 3
Author: Martin Kubovcik