nslyubaykin/relax_ddpg_example

Example DDPG implementation with ReLAx

This repository contains an implementation of Deep Deterministic Policy Gradient (DDPG) built with ReLAx.
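For readers new to DDPG, the two numeric steps at the core of the algorithm can be sketched in plain Python: the bootstrapped critic target and the soft (Polyak) update of the target networks. This is only an illustration of the standard algorithm, not ReLAx's API; the function names `td_targets` and `soft_update`, and the default values of `gamma` and `tau`, are assumptions made for this example.

```python
def td_targets(rewards, dones, next_q_values, gamma=0.99):
    """Critic regression targets for a batch of transitions.

    y_i = r_i + gamma * (1 - done_i) * Q_target(s'_i, mu_target(s'_i))

    `next_q_values` is assumed to already hold the target critic's value
    of the target actor's action in each next state (hypothetical input).
    """
    return [
        r + gamma * (1.0 - float(d)) * q
        for r, d, q in zip(rewards, dones, next_q_values)
    ]


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [
        tau * p + (1.0 - tau) * t
        for t, p in zip(target_params, online_params)
    ]


# Terminal transitions do not bootstrap from the next state.
targets = td_targets([1.0, 1.0], [False, True], [10.0, 10.0], gamma=0.9)
new_target = soft_update([0.0, 2.0], [1.0, 2.0], tau=0.5)
```

In a full implementation the parameters would be network tensors rather than lists of floats, and the critic would be fit to `targets` by minimizing a squared error; those parts are omitted here.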

The DDPG actor was trained on the InvertedPendulum-v2 MuJoCo Gym environment for 100k environment steps.

The graph of average return vs. environment step is shown below (logged every 10k steps):

ddpg_training

The distribution of the critic's estimated Q-values vs. the data Q-values is shown below:

ddpg_q_func
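One common way to obtain "data" Q-values for such a comparison is to compute the discounted return-to-go along each logged trajectory and compare it with the critic's estimate at the same step. The sketch below assumes that interpretation; the README does not state how the data Q-values were computed, and the helper name `discounted_returns` and the discount value are hypothetical.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Three-step episode with a reward of 1.0 per step and gamma = 0.5.
example = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
```

A critic that tracks these returns closely across the visited state-action pairs is a sign that its estimates are not strongly over- or underestimated.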

Resulting policy:

ddpg_run.mp4