This work introduces a new, scalable, model-free actor-critic algorithm based on Proximal Policy Optimization (PPO) that uses a deep Gaussian process to directly approximate both the policy and the value function.
This project uses Poetry for dependency management. You can set up a virtual environment with Poetry, which can itself be installed using pip:

```bash
pip install poetry
```
Then create the virtual environment and install the required dependencies (see `poetry.lock` and `pyproject.toml`):

```bash
poetry config virtualenvs.in-project true  # keep the virtual environment inside the project directory
poetry install
```
The virtual environment can be accessed from the shell using:

```bash
poetry shell
```
IDEs like PyCharm will be able to detect the interpreter of this virtual environment.
Alternatively, generate a `requirements.txt`:

```bash
poetry export -f requirements.txt --without-hashes > requirements.txt
```

and install from it with pip:

```bash
python3 -m pip install -r requirements.txt
```
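Note that this route installs into whichever Python environment is currently active, so you will usually want to create and activate one first. A minimal sketch using the standard library `venv` module:

```bash
python3 -m venv .venv                       # create a virtual environment in ./.venv
source .venv/bin/activate                   # activate it (Windows: .venv\Scripts\activate)
python3 -m pip install -r requirements.txt  # install the exported dependencies
```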
This project uses Hydra for configuration management. To run the code with a specific mode and agent (or just `python main.py` for the default configs):

```bash
python main.py mode=train agent=gppo_walker2d  # or another algorithm's agent config
```
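Because Hydra composes the run configuration from `config.yaml` plus the selected `mode` and `agent` config groups, either group can be swapped on the command line. For example, assuming an `eval` mode config exists alongside `train.yaml` (the group values here are illustrative):

```bash
python main.py mode=eval agent=gppo_walker2d  # evaluate a trained agent instead of training
```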
The codebase currently supports training and evaluation, with optional experiment tracking via Weights & Biases. Models can be saved and loaded, and the results for each agent (GPPO, PPO, etc.) are saved in the specified results directory. To reproduce results, ensure that `train.yaml` and `config.yaml` are configured as follows:
`train.yaml`:

```yaml
name: train
train: True
load_model: False
save_model: True
import_metrics: True
export_metrics: True
```
`config.yaml`:

```yaml
defaults:
  - mode: train # default mode; can be overridden by command-line args
  - agent: gppo_walker2d # replace with the desired algorithm
  - _self_

num_episodes: 10000
num_bootstrap_samples: 100
num_runs: 1
results_save_path: "./results/"

environment: "Walker2d-v5"
normalize_obs: True
normalize_act: False
clip_obs: 10.0

wandb:
  project: "gppo-drl"
  entity: "ml_exp"
  use_wandb: True # disable if you do not want tracking
```
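Individual fields from `config.yaml` can also be overridden on the command line without editing the file, using the keys shown above, e.g.:

```bash
python main.py num_episodes=5000 wandb.use_wandb=False  # shorter run, W&B tracking disabled
```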
The codebase is organised as follows:

- `agents` contains the core RL implementations (e.g., `PPO`, `GPPO`) and a factory class for instantiating them.
- `gp` contains implementations of deep Gaussian processes (vanilla and a Deep Sigma Point Process variant), as well as GPPO-specific implementation variants and objective functions.
- `hyperparam_tuning` contains a generic implementation of the Bayesian optimization algorithm and additional helper functions.
- `metrics` contains the `MetricsTracker` class, which can be used to aggregate metrics across runs (see the sketch after this list).
- `util` contains utility functions and classes such as `replaybuffer` and `rolloutbuffer`.
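The repository's `MetricsTracker` API is not reproduced here, but the underlying idea of aggregating metrics across runs is straightforward. A minimal, self-contained sketch (all class and method names below are illustrative, not the repo's actual API):

```python
from collections import defaultdict
import statistics

class SimpleMetricsTracker:
    """Collects scalar metrics per run and aggregates them across runs."""

    def __init__(self):
        # metric name -> list of one value per run
        self._values = defaultdict(list)

    def record(self, name: str, value: float) -> None:
        self._values[name].append(value)

    def aggregate(self) -> dict:
        # Mean and sample standard deviation across runs for each metric.
        return {
            name: {
                "mean": statistics.mean(vals),
                "std": statistics.stdev(vals) if len(vals) > 1 else 0.0,
            }
            for name, vals in self._values.items()
        }

tracker = SimpleMetricsTracker()
for final_return in [512.3, 498.7, 530.1]:  # e.g. final returns from three runs
    tracker.record("episode_return", final_return)
print(tracker.aggregate())
```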