Deep Deterministic Policy Gradient (DDPG) algorithm is a widely used algorithm in reinforcement learning. It is a member of Actor Critic learning. It delivers a more stable training for temporal difference estimation approach in non-episodic setting. In this repository, it is implemented for a continuous control problem, in the OpenAI-gym LunarLanderContinuous-v2 environment.
An optimal policy is found after playing few hundreds of episodes. See the rewards, critic loss and actor loss obtained below:
To be added