PIRL

Probabilistic inference for reinforcement learning using arbitrary probabilistic models.

Installing PIRL

Make sure that Python 3.8 is installed, then install PIRL from the repository root by running

pip install -e .

This also installs the Python dependencies required to run PIRL itself.

PIRL ingredients

A PirlSettings instance is required to initialize an instance of PIRL. The settings need to specify the following main components:

method for propagation of uncertainty: trajectory sampling, moment matching, GP-EKF, GP-UKF or GP-PF
probabilistic model, e.g., DGCN, BNN or GP
reward, e.g., ExponentialReward
policy, e.g., RBFController or NNController
environment for experiments
optimizer, e.g., the Adam optimizer from TensorFlow or 'DE' for differential evolution from TensorFlow Probability
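For intuition about the reward component, here is a minimal stand-alone sketch of a saturating exponential reward of the kind used in PILCO-style methods, r(x) = exp(-0.5 (x - t)^T W (x - t)), which equals 1 at the target t and decays towards 0 away from it. The function name, the target t and the weight matrix W are illustrative assumptions; PIRL's ExponentialReward may be parameterized differently.

```python
import math
from typing import Sequence


def exponential_reward(x: Sequence[float], target: Sequence[float],
                       W: Sequence[Sequence[float]]) -> float:
    """Saturating reward exp(-0.5 * (x - t)^T W (x - t)); equals 1 at the target."""
    d = [xi - ti for xi, ti in zip(x, target)]
    # Quadratic form d^T W d with plain loops (W assumed positive semi-definite).
    q = sum(d[i] * W[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.exp(-0.5 * q)
```

Because the reward saturates, states far from the target all receive nearly the same (zero) reward, which is what makes this form convenient for swing-up style tasks.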

Please note that moment matching is only applicable in combination with a Gaussian process with a squared exponential kernel.
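To illustrate what trajectory sampling means as an uncertainty propagation method, the following generic sketch estimates the expected cumulative reward of a policy by pushing particles through a stochastic one-step model. It is a plain Monte Carlo sketch, not PIRL's implementation; `step`, `policy` and `reward` are hypothetical callables. Moment matching instead propagates the state mean and covariance analytically, which can be done in closed form for a GP with a squared exponential kernel and explains the restriction above.

```python
import random
from typing import Callable, List

State = List[float]


def sampled_return(step: Callable[[State, List[float], random.Random], State],
                   policy: Callable[[State], List[float]],
                   reward: Callable[[State], float],
                   x0_mean: State, x0_std: State,
                   horizon: int, n_particles: int = 100, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected cumulative reward over `horizon` steps."""
    rng = random.Random(seed)
    # Draw initial particles from a diagonal Gaussian over the start state.
    particles = [[rng.gauss(m, s) for m, s in zip(x0_mean, x0_std)]
                 for _ in range(n_particles)]
    total = 0.0
    for _ in range(horizon):
        # Each particle follows its own trajectory through the (possibly stochastic) model.
        particles = [step(x, policy(x), rng) for x in particles]
        # The particle average approximates the expected reward at this time step.
        total += sum(reward(x) for x in particles) / n_particles
    return total
```

With a deterministic model and zero initial spread every particle follows the same path, so the estimate reduces to the exact return of that path.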

In general, a gym-like environment is all that needs to be provided to perform probabilistic reinforcement learning with PIRL. More precisely, the environment is expected to have

the methods reset, render, step and close (if close_env=True in PirlSettings),
a state attribute,
an action_space attribute with a sample method.
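A minimal sketch of an object meeting these requirements follows. The toy pendulum dynamics, the `ActionSpace` helper and the classic gym return value of `step` (observation, reward, done, info) are assumptions for illustration; the environments in the examples directory show the exact interface PIRL relies on.

```python
import math
import random
from typing import List, Tuple


class ActionSpace:
    """Minimal stand-in for a gym action space: uniform samples in [low, high]."""

    def __init__(self, low: float, high: float, seed: int = 0):
        self.low, self.high = low, high
        self._rng = random.Random(seed)

    def sample(self) -> float:
        return self._rng.uniform(self.low, self.high)


class ToyPendulumEnv:
    """Hypothetical gym-like environment with reset, render, step, close,
    a `state` attribute and an `action_space` providing `sample()`."""

    def __init__(self, dt: float = 0.05):
        self.dt = dt
        self.action_space = ActionSpace(-2.0, 2.0)
        self.state = [0.0, 0.0]  # [angle from the downward position, angular velocity]

    def reset(self) -> List[float]:
        self.state = [0.0, 0.0]
        return list(self.state)

    def step(self, action: float) -> Tuple[List[float], float, bool, dict]:
        theta, omega = self.state
        # Semi-implicit Euler step of a damped pendulum driven by the torque `action`.
        omega += self.dt * (-9.81 * math.sin(theta) - 0.1 * omega + action)
        theta += self.dt * omega
        self.state = [theta, omega]
        reward = -math.cos(theta)  # placeholder: 1 when upright, -1 when hanging down
        return list(self.state), reward, False, {}

    def render(self) -> None:
        print(f"theta={self.state[0]:.3f} omega={self.state[1]:.3f}")

    def close(self) -> None:
        pass  # nothing to release in this toy example
```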

Examples

Additional requirements

The enclosed examples use gym as well as the physics engines mujoco and pybullet.

To install mujoco, download version 2.1 and move the extracted mujoco210 directory to ~/.mujoco/mujoco210. Then add the path to its bin subdirectory (~/.mujoco/mujoco210/bin) to the environment variable LD_LIBRARY_PATH (Linux) or the system variable PATH (Windows).
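The following sketch checks the two manual steps above before installing mujoco-py. It assumes that the directory to put on the search path is mujoco210/bin, as mujoco-py expects on Linux; the function name is hypothetical.

```python
import os
from pathlib import Path
from typing import List


def mujoco_setup_problems(home: Path, search_path: str) -> List[str]:
    """Return problems found with a MuJoCo 2.1 setup; an empty list means it looks fine."""
    problems = []
    mujoco_dir = home / ".mujoco" / "mujoco210"
    bin_dir = mujoco_dir / "bin"  # assumed: the directory that must be on the search path
    if not mujoco_dir.is_dir():
        problems.append(f"missing directory {mujoco_dir}")
    # Normalize entries so trailing separators do not cause false negatives.
    entries = {os.path.normpath(p) for p in search_path.split(os.pathsep) if p}
    if os.path.normpath(str(bin_dir)) not in entries:
        problems.append(f"{bin_dir} is not on the library search path")
    return problems


if __name__ == "__main__":
    # LD_LIBRARY_PATH on Linux, PATH on Windows.
    variable = "PATH" if os.name == "nt" else "LD_LIBRARY_PATH"
    for problem in mujoco_setup_problems(Path.home(), os.environ.get(variable, "")):
        print(problem)
```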

The Python packages gym and mujoco-py are installed by running

pip install -r optional-requirements.txt

Finally, pybullet-gym needs to be installed manually with

git clone https://github.com/benelot/pybullet-gym.git
cd pybullet-gym
pip install -e .

pybullet itself is installed automatically as a dependency.
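A quick way to confirm that the optional packages are available is to look up their top-level modules without importing them (importing mujoco_py can trigger a compilation step). The module names below are assumed to be the usual import names of gym, mujoco-py, pybullet and pybullet-gym.

```python
from importlib.util import find_spec
from typing import Dict, Iterable

# Assumed top-level module names of the optional packages.
OPTIONAL_MODULES = ("gym", "mujoco_py", "pybullet", "pybulletgym")


def optional_packages_status(modules: Iterable[str] = OPTIONAL_MODULES) -> Dict[str, bool]:
    """Map each module name to True if it can be found, without importing it."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in optional_packages_status().items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
```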

Running experiments

Experiments for the provided examples are run from the examples directory with run_example.py, which accepts the options listed below:

cd examples
python run_example.py
-A APPROACH, --approach APPROACH
                      Uncertainty propagation method. One of: Sampling, MomentMatching, PF, EKF, UKF. The last three correspond to particle, extended Kalman and unscented Kalman filters, respectively. Note that MomentMatching is only
                      compatible with the squared_exponential model.
-M MODEL, --model MODEL
                      Name of the model to be used. One of: dgcn, squared_exponential, bnn, exponential, matern32, matern52.
-E ENVIRONMENT, --env ENVIRONMENT
                      Name of the environment. One of: InvPendulumSwingUp, InvDoublePendulum, ContinuousMountainCar, Pendulum
-B, --background      Deactivates rendering the experiments.
-N N_INIT, --init N_INIT
                      Number of initial experiments
-T N_ITER, --iter N_ITER
                      Number of iterations (policy updates)
-R RESTARTS, --restarts RESTARTS
                      Number of restarts for the inner optimization. Has a large impact on the run time.
-V VERBOSE, --verbose VERBOSE
                      Level of verbosity (0 for non-verbose)
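When scripting several runs, it can help to assemble the command line programmatically and reject invalid combinations (such as MomentMatching with a model other than squared_exponential) before launching. The sketch below uses only the flags and choices listed above; the helper name and the numeric values in the usage part are hypothetical.

```python
import subprocess
import sys
from typing import List

# Choices as listed in the option descriptions above.
APPROACHES = ("Sampling", "MomentMatching", "PF", "EKF", "UKF")
MODELS = ("dgcn", "squared_exponential", "bnn", "exponential", "matern32", "matern52")
ENVIRONMENTS = ("InvPendulumSwingUp", "InvDoublePendulum", "ContinuousMountainCar", "Pendulum")


def build_command(approach: str, model: str, env: str, n_init: int, n_iter: int,
                  restarts: int, background: bool = True, verbose: int = 1) -> List[str]:
    """Assemble the argument list for run_example.py, rejecting invalid combinations early."""
    if approach not in APPROACHES or model not in MODELS or env not in ENVIRONMENTS:
        raise ValueError(f"unknown choice: {approach!r}, {model!r}, {env!r}")
    if approach == "MomentMatching" and model != "squared_exponential":
        raise ValueError("MomentMatching is only compatible with squared_exponential")
    cmd = [sys.executable, "run_example.py", "-A", approach, "-M", model, "-E", env,
           "-N", str(n_init), "-T", str(n_iter), "-R", str(restarts), "-V", str(verbose)]
    if background:
        cmd.append("-B")  # deactivate rendering
    return cmd


if __name__ == "__main__":
    # Illustrative values only; run this from the examples directory.
    subprocess.run(build_command("Sampling", "squared_exponential", "Pendulum",
                                 n_init=1, n_iter=5, restarts=1), check=True)
```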

DGCN

The current implementation of DGCN is shipped as a compiled binary file, so Python 3.8 is required to run the DGCN code.
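Since the compiled binary only works under Python 3.8, a script can fail early with a clear message instead of an obscure import error. This guard is a hypothetical helper, not part of PIRL.

```python
import sys


def require_python_38(version_info: tuple = tuple(sys.version_info)) -> None:
    """Raise RuntimeError unless the interpreter is Python 3.8.x."""
    major, minor = version_info[0], version_info[1]
    if (major, minor) != (3, 8):
        raise RuntimeError(f"DGCN requires Python 3.8, found {major}.{minor}")
```

Calling `require_python_38()` at the top of a script that uses DGCN checks the running interpreter.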

Details on usage of DGCN

A DGCN instance is initialized with

model = DGCN(X, y, num_neurons=20)

where X and y are the training samples and labels, respectively, and num_neurons specifies the number of neurons in the hidden layers of the neural network.

The model is then trained with the fit method

model.fit(max_epochs=500, batch_size=None, noise=False)

where max_epochs is the maximum number of epochs, batch_size enables batch training with the given batch size, and noise specifies whether the model learns aleatoric uncertainty.

Finally, the predict method is used to make predictions on test samples

y_pred, var = model.predict(X, pred_var=True)

where X are the test samples, y_pred is the mean prediction of the model and var is the variance of the predictions; var is None if pred_var is False. Instead of predict, the _predict method can be used, which does not cast X to the required data type before prediction. This is faster but may result in undefined behaviour if X has the wrong type.
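The mean and variance returned by predict can be turned into approximate credible intervals. The sketch below assumes that y_pred and var are flat sequences of equal length (one output dimension) and uses a Gaussian approximation, where z = 1.96 gives roughly 95% coverage; the helper name is hypothetical and works with any object exposing predict(X, pred_var=True).

```python
import math
from typing import Any, List, Tuple


def predictive_interval(model: Any, X: Any, z: float = 1.96) -> Tuple[List[float], List[float]]:
    """Return (lower, upper) = mean -/+ z * std from model.predict(X, pred_var=True)."""
    y_pred, var = model.predict(X, pred_var=True)
    if var is None:
        raise ValueError("model returned no variance")
    # Standard deviation per prediction; assumes flat, equally long sequences.
    std = [math.sqrt(v) for v in var]
    lower = [m - z * s for m, s in zip(y_pred, std)]
    upper = [m + z * s for m, s in zip(y_pred, std)]
    return lower, upper
```

With a trained model, `lower, upper = predictive_interval(model, X_test)` gives per-sample bounds; for array outputs of shape (n, 1), flattening them first keeps the assumption above valid.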

Probabilistic-ML/pirl

Folders and files

Latest commit

History

Repository files navigation

PIRL

Installing PIRL

PIRL ingredients

Examples

Additional requirements

Running experiments

DGCN

Details on usage of DGCN

About

Topics

Resources

License

Stars

Watchers

Forks

Languages