
VIME

PyTorch implementation of "VIME: Variational Information Maximizing Exploration" (https://arxiv.org/abs/1605.09674).

Install the required packages with pip3:

pip3 install -r requirements.txt

Try out

Train an agent in "RoboschoolHalfCheetah-v1". A dedicated configuration file for this environment is in the configs directory.

python3 main.py --config configs/config_cheetah.toml --save-dir results/test_cheetah --vime

A model file is saved to results/test_cheetah/checkpoints/model.pth. Load this file to run the trained policy:

python3 main.py --config configs/config_cheetah.toml -m results/test_cheetah/checkpoints/model.pth -e -r

Options

--config              Config file path.
--save-dir            Save directory.
--vime                Enable VIME exploration.
--visualize-interval  Interval at which metric graphs are drawn.
--device              Device for computation.
-e, --eval            Run model evaluation.
-m, --model-filepath  Path to a trained model for evaluation.
-r, --render          Render agent behavior during evaluation.

Implementation notes

An explanation of the implementation is given in Implementation notes of "VIME: Variational Information Maximizing Exploration".

Evaluations

Reinforcement learning performance with and without VIME was compared in the following environments.

  • RoboschoolInvertedDoublePendulum-v1
  • RoboschoolWalker2d-v1
  • RoboschoolHumanoid-v1
  • RoboschoolHalfCheetah-v1 (with sparse reward)

In HalfCheetah, a reward of +1.0 was given when the body had moved more than 2.5 units. Soft actor-critic (SAC) is used as the base method.

[Learning curves: InvertedDoublePendulum-v1, Walker2d-v1, Humanoid-v1, HalfCheetah-v1 (sparse reward)]
Note on modifying an environment

An instance of RoboschoolHalfCheetah-v1 holds the body position, which it uses to calculate the reward. The position can be accessed as env.body_xyz[0] (see gym_forward_walker.py for details).
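For example, this short probe (a sketch, assuming roboschool is installed; env.unwrapped is used because gym.make adds a TimeLimit wrapper) reads the x-coordinate after one step:

import gym
import roboschool  # importing roboschool registers the Roboschool environments

env = gym.make("RoboschoolHalfCheetah-v1")
env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
# gym.make wraps the env in TimeLimit, so use unwrapped to reach body_xyz
print(env.unwrapped.body_xyz[0])  # current x-coordinate of the body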

RoboschoolHalfCheetah-v1 was wrapped in a new environment, "SparseHalfCheetah", which returns a reward of +1.0 when the body moves more than 5 units. To create a custom environment of your own, define a class that inherits from gym.Env and implement the following methods (a sketch of the full class follows the two lists below)

  • reset
  • step
  • render
  • close
  • seed

and the following properties

  • action_space
  • observation_space
  • reward_range
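
The following is a minimal sketch of such a wrapper. The class name SparseHalfCheetah, the 5-unit threshold, and the +1.0 reward come from the description above; the per-step reward logic, the threshold keyword argument, and the delegation details are assumptions, not the repository's exact code.

import gym
import roboschool  # importing roboschool registers RoboschoolHalfCheetah-v1


class SparseHalfCheetah(gym.Env):
    """RoboschoolHalfCheetah-v1 with a sparse reward: +1.0 per step once
    the body has moved more than `threshold` units from its start (one
    reading of the description above)."""

    def __init__(self, threshold=5.0):
        self._env = gym.make("RoboschoolHalfCheetah-v1").unwrapped
        self._threshold = threshold
        self._start_x = 0.0
        # Required properties, delegated to the wrapped environment.
        self.action_space = self._env.action_space
        self.observation_space = self._env.observation_space
        self.reward_range = (0.0, 1.0)

    def reset(self):
        obs = self._env.reset()
        self._start_x = self._env.body_xyz[0]  # body position after reset
        return obs

    def step(self, action):
        obs, _, done, info = self._env.step(action)  # original reward discarded
        moved = self._env.body_xyz[0] - self._start_x
        reward = 1.0 if moved > self._threshold else 0.0
        return obs, reward, done, info

    def render(self, mode="human"):
        return self._env.render(mode)

    def close(self):
        self._env.close()

    def seed(self, seed=None):
        return self._env.seed(seed)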

By registering the newly defined environment, you can instantiate it with the gym.make method.

from gym.envs.registration import register

register(
    id='EnvironmentName-v1',          # name passed to gym.make
    entry_point=NewEnvironmentClass,  # class (or "module:Class" string) to instantiate
    max_episode_steps=1000,           # episode is truncated after this many steps
    reward_threshold=1,               # reward at which the task counts as solved
    kwargs={},                        # keyword arguments forwarded to the constructor
)

You can pass keyword arguments to the new class through kwargs. For further information, refer to registration.py.
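
For instance, a value placed in kwargs is forwarded to the class constructor when gym.make builds the environment. This sketch reuses the SparseHalfCheetah class and its assumed threshold argument from above:

from gym.envs.registration import register
import gym

# SparseHalfCheetah is the sketch class defined above; `threshold` is its
# (assumed) keyword argument, forwarded via kwargs at construction time.
register(
    id='SparseHalfCheetah-v1',
    entry_point=SparseHalfCheetah,
    max_episode_steps=1000,
    kwargs={'threshold': 5.0},
)

env = gym.make('SparseHalfCheetah-v1')  # built as SparseHalfCheetah(threshold=5.0)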
