DDPG_TF2

It was hard to find a simple and tidy DDPG implementation in TF2, so I made one.

DDPG

DDPG is a model-free, off-policy algorithm that learns a Q-function and a policy for continuous action spaces. It is inspired by Deep Q-Learning and can be seen as DQN for continuous action spaces. It uses off-policy data and the Bellman equation to learn the Q-function, which is in turn used to derive and learn the policy. In this implementation, a number of pure-exploration episodes (set by the rand_steps parameter) are performed at the start of training, during which actions are sampled uniformly over the whole action range.
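The warm-up phase described above can be sketched as an action-selection function. This is a minimal illustration, not the repository's code: select_action, actor and noise are hypothetical names, and the actor is assumed to return one value per action dimension. Only rand_steps comes from this README, and it is treated here as a count of episodes, as the text describes.

```python
import random
from typing import Callable, List, Sequence

def select_action(
    episode: int,
    state: Sequence[float],
    actor: Callable[[Sequence[float]], List[float]],  # hypothetical deterministic policy
    noise: Callable[[], List[float]],                 # hypothetical exploration noise source
    low: Sequence[float],
    high: Sequence[float],
    rand_steps: int,
) -> List[float]:
    """Uniform random actions during warm-up, actor output plus noise afterwards."""
    if episode < rand_steps:
        # Pure exploration: sample each action dimension uniformly over its full range.
        return [random.uniform(lo, hi) for lo, hi in zip(low, high)]
    # After warm-up: perturb the deterministic actor output with exploration noise
    # and clip the result back into the valid action range.
    mu = actor(state)
    return [min(max(m + n, lo), hi) for m, n, lo, hi in zip(mu, noise(), low, high)]
```

For a one-dimensional action space such as a torque in [-2, 2], a call would look like select_action(ep, obs, actor, noise, [-2.0], [2.0], rand_steps=10). Uniform sampling during warm-up fills the replay memory with diverse transitions before the untrained actor starts choosing actions.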

Main features:

Deep function approximation of a deterministic policy allows for continuous (infinite) action spaces.
Use of a noise process (for example the Ornstein–Uhlenbeck process) for action space exploration.
Use of experience replay for stable learning from previous experiences.
Actor-critic structure.
Use of target models for both actor and critic networks (weight transfer with Polyak averaging).
Use of the Bellman equation to describe the optimal Q-value function for each state-action pair.
Use of batch normalization in both actor and critic networks. Its benefit is debated, but it was used in the original paper.
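The Ornstein–Uhlenbeck exploration noise mentioned in the feature list can be sketched as a small class. This is an illustrative version, not taken from the repository: the class name and the defaults (theta=0.15, sigma=0.2, dt=0.01) are assumptions chosen as common settings, and the process is discretized with a simple Euler-Maruyama step.

```python
import math
import random
from typing import List, Optional

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise that drifts back to mu."""

    def __init__(self, size: int, mu: float = 0.0, theta: float = 0.15,
                 sigma: float = 0.2, dt: float = 1e-2, seed: Optional[int] = None):
        # Assumed default parameters; tune them for the environment at hand.
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = random.Random(seed)
        self.state = [mu] * size

    def reset(self) -> None:
        # Restart the process at its mean, typically at the start of each episode.
        self.state = [self.mu] * len(self.state)

    def __call__(self) -> List[float]:
        # Euler-Maruyama step: dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)
```

Because each sample depends on the previous one, consecutive actions are perturbed in a consistent direction, which tends to explore physical control tasks better than independent per-step noise.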
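Experience replay from the feature list amounts to a bounded FIFO store of transitions that is sampled uniformly. The sketch below is a hypothetical ReplayBuffer class with an assumed (state, action, reward, next_state, done) tuple layout; the repository's own buffer may be organized differently.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

# Assumed layout: (state, action, reward, next_state, done)
Transition = Tuple[Sequence[float], Sequence[float], float, Sequence[float], bool]

class ReplayBuffer:
    def __init__(self, capacity: int) -> None:
        # The oldest transitions are discarded automatically once capacity is reached.
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement breaks the temporal correlation
        # between consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self) -> int:
        return len(self.buffer)
```

Training on random mini-batches drawn from many past episodes is what makes DDPG off-policy: the transitions were produced by older versions of the actor (or by the uniform warm-up policy), not the current one.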
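The Polyak averaging used for the target networks can be shown on flat lists of parameters. This is a minimal sketch with a hypothetical polyak_update function; tau is a small constant whose value here is only illustrative.

```python
from typing import List

def polyak_update(target: List[float], source: List[float], tau: float) -> List[float]:
    """Soft target update: theta_target <- tau * theta_source + (1 - tau) * theta_target."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]
```

A small tau makes the target networks trail the online networks slowly, which keeps the Bellman targets from shifting abruptly between updates. With TF2/Keras models, the same rule would be applied to each weight tensor, for example through tf.Variable.assign; that wiring is not shown here.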
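The Bellman-equation item can be made concrete with the critic's regression target y = r + gamma * (1 - done) * Q'(s', mu'(s')), where Q' and mu' are the target critic and target actor. The sketch below is illustrative only: bellman_targets and critic_loss are hypothetical names, the networks are plain callables, and gamma=0.99 is an assumed default.

```python
from typing import Callable, List, Sequence

def bellman_targets(
    rewards: Sequence[float],
    next_states: Sequence[Sequence[float]],
    dones: Sequence[bool],
    target_actor: Callable[[Sequence[float]], List[float]],
    target_critic: Callable[[Sequence[float], List[float]], float],
    gamma: float = 0.99,  # assumed discount factor
) -> List[float]:
    """y = r + gamma * (1 - done) * Q'(s', mu'(s')), computed with the target networks."""
    targets = []
    for r, s_next, done in zip(rewards, next_states, dones):
        # The target actor picks the next action; the target critic scores it.
        q_next = target_critic(s_next, target_actor(s_next))
        # Terminal transitions do not bootstrap from the next state.
        targets.append(r + gamma * (0.0 if done else 1.0) * q_next)
    return targets

def critic_loss(targets: Sequence[float], q_values: Sequence[float]) -> float:
    """Mean squared Bellman error that the critic is trained to minimize."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)
```

The actor is then updated to maximize the online critic's value of its own actions, Q(s, mu(s)), which is how the learned Q-function is used to derive the policy.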

The DDPG algorithm was originally described in the paper "Continuous control with deep reinforcement learning" (Lillicrap et al.).

Performance on OpenAI Gym environments

Pendulum-v0

The model usually needs about 70-80 iterations to reach decent performance. This number may be reduced with further hyperparameter tuning.

Performance after 70 iterations:

gerkone/DDPG_TF2

Folders and files

Latest commit

History

Repository files navigation

DDPG_TF2

DDPG

Main features:

Performance on OpenAI gym environments

Pendulum-v0

About

Topics

Resources

Stars

Watchers

Forks

Languages