Implement VAIL #14

keiohta · 2019-05-18T16:09:23Z

Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow
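
For readers new to the paper, here is a minimal NumPy sketch of the bottleneck term it describes: a KL penalty on the discriminator's stochastic encoder, weighted by a multiplier beta that is adapted by dual gradient ascent. This is an illustration under stated assumptions, not the tf2rl implementation; the names `kl_to_standard_normal`, `bottleneck_penalty`, `update_beta`, `i_c`, and `step_size` are hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, log_std):
    """Per-sample KL( N(mu, diag(std^2)) || N(0, I) ), summed over latent dims."""
    var = np.exp(2.0 * log_std)
    return 0.5 * np.sum(mu ** 2 + var - 1.0 - 2.0 * log_std, axis=-1)


def bottleneck_penalty(mu, log_std, beta, i_c):
    """Term added to the discriminator loss: beta * (mean KL - i_c).

    i_c is the information constraint (hypothetical name for illustration).
    Returns the penalty and the batch-mean KL used for the beta update.
    """
    mean_kl = float(np.mean(kl_to_standard_normal(mu, log_std)))
    return beta * (mean_kl - i_c), mean_kl


def update_beta(beta, mean_kl, i_c, step_size):
    """Dual gradient ascent step on beta, projected onto beta >= 0."""
    return max(0.0, beta + step_size * (mean_kl - i_c))
```

When the mean KL exceeds `i_c`, beta grows and the encoder is pushed to pass less information to the discriminator; when it falls below, beta shrinks toward zero.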

keiohta · 2019-07-07T00:58:29Z

Experiment on Pendulum-v0

# Generate expert trajectories
$ python examples/run_sac.py --env-name=Pendulum-v0 --save-test-path --test-interval=100000 --max-steps 100000 --test-episodes=20 --gpu -1

# VAIL
$ python examples/run_vail_ddpg.py --env-name=Pendulum-v0 --test-interval=5000 --expert-path-dir results/20190706T231341.308287_SAC_ --gpu -1
# VAIL with Spectral Normalization
$ python examples/run_vail_ddpg.py --env-name=Pendulum-v0 --test-interval=5000 --expert-path-dir results/20190706T231341.308287_SAC_ --gpu -1 --enable-sn --dir-suffix SN

Results

Score
- The maximum score is 0.
- VAIL on its own is unstable. Adding Spectral Normalization stabilizes learning and improves the score.
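
As background on why Spectral Normalization helps, here is a minimal NumPy sketch of the technique: each weight matrix is divided by an estimate of its largest singular value, obtained by power iteration, which bounds the layer's Lipschitz constant. This is an assumption-laden illustration, not what `--enable-sn` does internally in tf2rl; the function name `spectral_normalize` and its parameters are hypothetical.

```python
import numpy as np


def spectral_normalize(w, n_iters=1, u=None, eps=1e-12):
    """Return (w / sigma, u), where sigma estimates the largest singular value of w.

    w is flattened to 2D as (rows, out_features). u is the running estimate of
    the leading right singular vector; pass it back in on the next call to
    continue the power iteration across training steps.
    """
    w_mat = w.reshape(-1, w.shape[-1])
    if u is None:
        # Fixed seed only to keep this sketch deterministic.
        u = np.random.default_rng(0).normal(size=(w_mat.shape[1],))
    for _ in range(n_iters):
        v = w_mat @ u
        v = v / (np.linalg.norm(v) + eps)
        u = w_mat.T @ v
        u = u / (np.linalg.norm(u) + eps)
    sigma = v @ w_mat @ u  # Rayleigh-quotient style estimate of the top singular value
    return w / sigma, u
```

In practice one iteration per training step is commonly used, with `u` carried over between steps so the estimate improves over time.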

DDPG loss

VAIL

keiohta · 2019-07-07T22:13:43Z

Experiment on HalfCheetah-v2

# Generate expert trajectories
$ python examples/run_sac.py --env-name=HalfCheetah-v2 --save-test-path --test-interval=500000 --max-steps 500000 --test-episodes=20 --gpu -1

# VAIL
$ python examples/run_vail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1
# VAIL_SN
$ python examples/run_vail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1 --enable-sn --dir-suffix _SN

# GAIL
$ python examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1
# GAIL_SN
$ python examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1 --enable-sn --dir-suffix _SN

Results

All algorithms reproduced the expert score (~9000).
- Note: SAC can reach around 12000, but I stopped expert training at 0.5M steps to save time.
There is no big difference between GAIL and VAIL, with or without Spectral Normalization.
- Gray: VAIL, Orange: VAIL + SN, Red: GAIL, Blue: GAIL + SN, Green: Expert (SAC)

TensorBoard output

Score

- VAIL

- GAIL

- DDPG loss

keiohta · 2019-07-07T22:16:47Z

4535cfb

sasayesh · 2021-07-25T14:02:35Z

Hi @keiohta
I couldn't reproduce the same results. I got this error for HalfCheetah:
Traceback (most recent call last):
  File "examples/run_vail_ddpg.py", line 44, in <module>
    trainer()
  File "/home/ss/.local/lib/python3.8/site-packages/tf2rl/experiments/irl_trainer.py", line 51, in __call__
    next_obs, reward, done, _ = self._env.step(action) ##
  File "/home/ss/.local/lib/python3.8/site-packages/gym/wrappers/time_limit.py", line 16, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/ss/.local/lib/python3.8/site-packages/gym/envs/mujoco/half_cheetah.py", line 12, in step
    self.do_simulation(action, self.frame_skip)
  File "/home/ss/.local/lib/python3.8/site-packages/gym/envs/mujoco/mujoco_env.py", line 125, in do_simulation
    self.sim.step()
  File "mujoco_py/mjsim.pyx", line 126, in mujoco_py.cymj.MjSim.step
  File "mujoco_py/cymj.pyx", line 156, in mujoco_py.cymj.wrap_mujoco_warning.__exit__
  File "mujoco_py/cymj.pyx", line 77, in mujoco_py.cymj.c_warning_callback
  File "/home/ss/projects/mujoco-py/mujoco_py/builder.py", line 363, in user_warning_raise_exception
    raise MujocoException(warn + 'Check for NaN in simulation.')
mujoco_py.builder.MujocoException: Unknown warning type Time = 24.7000.Check for NaN in simulation.
OS: Ubuntu 20
TF version: tested on 2.3 and 2.4
tf2rl: Master
Your input would be appreciated.

keiohta · 2021-07-25T14:06:01Z

Hi @sasayesh , thanks for reporting the error. I'll try to reproduce the bug on Wednesday.
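
For anyone hitting the same exception while debugging: one possible cause (an assumption, not confirmed in this thread) is the policy emitting non-finite or out-of-range actions, which can destabilize the MuJoCo simulation. A minimal sketch of a guard that fails early with a clearer message is shown below; `checked_action` is a hypothetical helper, not part of tf2rl or gym.

```python
import numpy as np


def checked_action(action, low, high):
    """Raise early on non-finite actions; otherwise clip to the action bounds.

    Hypothetical debugging helper: it does not fix a diverging policy, it only
    reports the problem before the simulator does.
    """
    action = np.asarray(action, dtype=np.float64)
    if not np.all(np.isfinite(action)):
        raise ValueError(f"Non-finite action produced by the policy: {action}")
    return np.clip(action, low, high)
```

With a gym `Box` action space this could be applied as `env.step(checked_action(action, env.action_space.low, env.action_space.high))`. If the `ValueError` fires, the NaN originates in the policy or its training losses rather than in the simulator.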

keiohta added this to the IRL milestone May 24, 2019

keiohta closed this as completed Jul 7, 2019

keiohta mentioned this issue Jul 7, 2019

Implement GAIL #13

Closed

keiohta mentioned this issue Jul 29, 2020

Reward nan #92

Closed

keiohta mentioned this issue Aug 12, 2020

dose the code support MPI to speed up the training? #98

Closed
