
Deep Deterministic Policy Gradient (DDPG)

Deep Deterministic Policy Gradient (DDPG) is an off-policy reinforcement learning technique for continuous action spaces that combines Q-learning and policy gradients. As an actor-critic method, DDPG consists of two models: an actor and a critic. The actor is a policy network that takes the state as input and outputs a single continuous action directly, rather than a probability distribution over actions; this is what the "deterministic" in DDPG refers to. The critic is a Q-value network that takes the state and action as input and outputs the Q-value. DDPG is an improvement over the vanilla actor-critic method.
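The actor and critic updates above can be sketched in a few lines. This is a minimal illustration using linear function approximators in NumPy, not the neural-network implementation in this repo: the critic is trained on a TD target computed with slowly-updated target networks, and the actor is updated by ascending the critic's gradient with respect to the action. The toy reward and random transitions are hypothetical, chosen only to make the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 3, 1
gamma, tau, lr = 0.99, 0.005, 1e-2   # discount, soft-update rate, step size

# Linear stand-ins for the actor and critic networks (assumption: a real
# DDPG implementation would use deep networks, e.g. in Keras).
Wa = rng.normal(scale=0.1, size=(action_dim, state_dim))   # actor: a = Wa @ s
w = rng.normal(scale=0.1, size=state_dim + action_dim)     # critic: Q = w @ [s; a]
Wa_t, w_t = Wa.copy(), w.copy()                            # target networks

def actor(W, s):
    # Deterministic policy: outputs the action itself, not a distribution
    return W @ s

def critic(wv, s, a):
    # Q-value of a state-action pair
    return wv @ np.concatenate([s, a])

def ddpg_step(s, a, r, s2):
    global Wa, w, Wa_t, w_t
    # Critic update: TD target bootstraps with the *target* actor and critic
    y = r + gamma * critic(w_t, s2, actor(Wa_t, s2))
    err = critic(w, s, a) - y
    w -= lr * err * np.concatenate([s, a])      # gradient of 0.5 * err**2
    # Actor update: gradient ascent on Q(s, mu(s)); for the linear critic,
    # dQ/da is the action part of w, and da/dWa gives an outer product with s
    w_action = w[state_dim:]
    Wa += lr * np.outer(w_action, s)
    # Soft-update the target networks toward the learned ones
    Wa_t = tau * Wa + (1 - tau) * Wa_t
    w_t = tau * w + (1 - tau) * w_t

for _ in range(200):
    s = rng.normal(size=state_dim)
    a = actor(Wa, s) + 0.1 * rng.normal(size=action_dim)  # exploration noise
    r = -float(a @ a)                                     # toy reward
    s2 = rng.normal(size=state_dim)
    ddpg_step(s, a, r, s2)
```

The exploration noise is needed precisely because the policy is deterministic: without it, an off-policy method like DDPG would never try actions other than the actor's current output.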

Code

python sample_keras.py

Useful Resources: