Implement VAIL #14

Closed
keiohta opened this issue May 18, 2019 · 5 comments

keiohta commented May 18, 2019

Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow
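As a rough illustration of what the paper's bottleneck adds to a GAIL-style discriminator, the sketch below computes the KL term between a diagonal-Gaussian encoder output and a standard normal prior, and applies the dual update that keeps the multiplier `beta` non-negative. This is a minimal NumPy sketch and not the tf2rl implementation; the function names and the default values of `i_c` and `step_size` are illustrative assumptions.

```python
import numpy as np


def kl_to_standard_normal(mean, log_std):
    """Per-sample KL(N(mean, diag(std^2)) || N(0, I)), summed over latent dims."""
    var = np.exp(2.0 * log_std)
    return 0.5 * np.sum(var + mean ** 2 - 1.0 - 2.0 * log_std, axis=-1)


def bottleneck_penalty(beta, mean_kl, i_c=0.5):
    """Term added to the discriminator loss: beta * (E[KL] - I_c)."""
    return beta * (mean_kl - i_c)


def update_beta(beta, mean_kl, i_c=0.5, step_size=1e-5):
    """Dual ascent on the multiplier, projected so that beta >= 0."""
    return max(0.0, beta + step_size * (mean_kl - i_c))


# Hypothetical encoder outputs for a batch of 4 samples, 2 latent dims.
mean = np.array([[1.0, 0.0], [0.0, 0.0], [0.5, -0.5], [2.0, 1.0]])
log_std = np.zeros_like(mean)
mean_kl = float(np.mean(kl_to_standard_normal(mean, log_std)))
beta = update_beta(0.0, mean_kl, i_c=0.5, step_size=0.1)
```

When the average KL exceeds the target `i_c`, `beta` grows and the penalty pushes the encoder to pass less information; when it falls below, `beta` shrinks toward zero.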

@keiohta keiohta added this to the IRL milestone May 24, 2019

keiohta commented Jul 7, 2019

Experiment on Pendulum-v0

# Generate expert trajectories
$ python examples/run_sac.py --env-name=Pendulum-v0 --save-test-path --test-interval=100000 --max-steps 100000 --test-episodes=20 --gpu -1

# VAIL
$ python examples/run_vail_ddpg.py --env-name=Pendulum-v0 --test-interval=5000 --expert-path-dir results/20190706T231341.308287_SAC_ --gpu -1
# VAIL with Spectral Normalization
$ python examples/run_vail_ddpg.py --env-name=Pendulum-v0 --test-interval=5000 --expert-path-dir results/20190706T231341.308287_SAC_ --gpu -1 --enable-sn --dir-suffix SN

Results

  • Score
    • The maximum possible score on Pendulum-v0 is 0.
    • VAIL without Spectral Normalization is unstable. Enabling it (--enable-sn) stabilizes learning and improves the score.

190707_VAIL_Pendulum_score

  • DDPG loss

190707_VAIL_Pendulum_DDPG_loss

  • VAIL

190707_VAIL_Pendulum_info
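For readers unfamiliar with the Spectral Normalization mentioned in the results above, here is a minimal NumPy sketch of the idea: estimate a weight matrix's largest singular value by power iteration and divide the matrix by it. It is not the code behind `--enable-sn`; the function name, iteration count, and seed are assumptions made for illustration.

```python
import numpy as np


def spectral_normalize(w, n_iters=50, seed=0, eps=1e-12):
    """Return (w / sigma, sigma), where sigma approximates the top singular value of w."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=w.shape[0])
    u /= np.linalg.norm(u) + eps
    for _ in range(n_iters):
        # Alternate between right and left singular vector estimates.
        v = w.T @ u
        v /= np.linalg.norm(v) + eps
        u = w @ v
        u /= np.linalg.norm(u) + eps
    sigma = float(u @ w @ v)
    return w / sigma, sigma


w = np.array([[3.0, 0.0], [0.0, 1.0]])
w_sn, sigma = spectral_normalize(w)
```

Practical implementations usually run a single power-iteration step per training update and keep `u` between updates, rather than iterating to convergence as this sketch does.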


keiohta commented Jul 7, 2019

Experiment on HalfCheetah-v2

# Generate expert trajectories
$ python examples/run_sac.py --env-name=HalfCheetah-v2 --save-test-path --test-interval=500000 --max-steps 500000 --test-episodes=20 --gpu -1

# VAIL
$ python examples/run_vail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1
# VAIL_SN
$ python examples/run_vail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1 --enable-sn --dir-suffix _SN

# GAIL
$ python examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1
# GAIL_SN
$ python examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --test-interval=5000 --expert-path-dir results/20190707T094429.686303_SAC_ --gpu -1 --enable-sn --dir-suffix _SN

Results

  • All algorithms reproduced the expert score (~9000).
    • Note: SAC can reach around 12000, but I stopped expert training at 0.5M steps to save time.
  • No big difference between GAIL and VAIL, with or without Spectral Normalization.
    • Gray: VAIL, Orange: VAIL + SN, Red: GAIL, Blue: GAIL + SN, Green: Expert (SAC)

TensorBoard output

  • Score

190707_VAIL_GAIL_HalfCheetah_score

  • VAIL

190707_VAIL_HalfCheetah

  • GAIL

190707_GAIL_HalfCheetah

  • DDPG loss

190707_VAIL_GAIL_HalfCheetah_DDPG_loss


keiohta commented Jul 7, 2019

4535cfb

@sasayesh

Hi @keiohta,
I couldn't reproduce the results. The HalfCheetah run fails with this error:

Traceback (most recent call last):
  File "examples/run_vail_ddpg.py", line 44, in <module>
    trainer()
  File "/home/ss/.local/lib/python3.8/site-packages/tf2rl/experiments/irl_trainer.py", line 51, in __call__
    next_obs, reward, done, _ = self._env.step(action) ##
  File "/home/ss/.local/lib/python3.8/site-packages/gym/wrappers/time_limit.py", line 16, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/ss/.local/lib/python3.8/site-packages/gym/envs/mujoco/half_cheetah.py", line 12, in step
    self.do_simulation(action, self.frame_skip)
  File "/home/ss/.local/lib/python3.8/site-packages/gym/envs/mujoco/mujoco_env.py", line 125, in do_simulation
    self.sim.step()
  File "mujoco_py/mjsim.pyx", line 126, in mujoco_py.cymj.MjSim.step
  File "mujoco_py/cymj.pyx", line 156, in mujoco_py.cymj.wrap_mujoco_warning.__exit__
  File "mujoco_py/cymj.pyx", line 77, in mujoco_py.cymj.c_warning_callback
  File "/home/ss/projects/mujoco-py/mujoco_py/builder.py", line 363, in user_warning_raise_exception
    raise MujocoException(warn + 'Check for NaN in simulation.')
mujoco_py.builder.MujocoException: Unknown warning type Time = 24.7000.Check for NaN in simulation.

Environment:

  • OS: Ubuntu 20
  • TF version: tested on 2.3 and 2.4
  • tf2rl: Master

Your input would be appreciated.


keiohta commented Jul 25, 2021

Hi @sasayesh, thanks for reporting the error. I'll try to reproduce the bug on Wednesday.
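The root cause of the MujocoException is not established in this thread. One common way to narrow down a "Check for NaN in simulation" failure is to validate each action before it reaches `env.step`, so that a non-finite policy output is caught at its source instead of inside the simulator. The helper below is a hypothetical diagnostic aid, not part of tf2rl or gym.

```python
import numpy as np


def sanitize_action(action, low, high):
    """Raise on NaN/inf actions; otherwise clip to the given bounds."""
    action = np.asarray(action, dtype=np.float64)
    if not np.all(np.isfinite(action)):
        # Fail early with the offending values instead of inside the simulator.
        raise ValueError(f"non-finite action before env.step: {action}")
    return np.clip(action, low, high)


clipped = sanitize_action([2.0, -3.0, 0.5], low=-1.0, high=1.0)
```

In a training loop this would wrap the action passed to `env.step`, with `low` and `high` taken from `env.action_space.low` and `env.action_space.high`. If the check raises, the instability is in the policy or discriminator updates rather than in the environment.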
