nslyubaykin/relax_mbpo_example

MBPO with ReLAx

Example MBPO-SAC implementation with ReLAx

This repository contains an implementation of the MBPO algorithm with a SAC actor, built with the ReLAx package.

Performance is compared against vanilla SAC by averaging learning curves, recorded in a separate evaluation environment, over 4 experiments with random environment seeds.
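The averaging step described above could be sketched roughly as follows. This is a minimal illustration with NumPy, not code from this repository: the function name `average_learning_curves`, the truncation to the shortest run, and the example numbers are all assumptions made for the sketch.

```python
import numpy as np


def average_learning_curves(curves):
    """Average per-seed evaluation returns at matching evaluation points.

    `curves` is a list of 1-D sequences, one per seed. Runs of unequal
    length are truncated to the shortest one (an assumption of this sketch).
    Returns the per-point mean and standard deviation across seeds.
    """
    min_len = min(len(c) for c in curves)
    stacked = np.stack([np.asarray(c[:min_len], dtype=float) for c in curves])
    return stacked.mean(axis=0), stacked.std(axis=0)


# Made-up evaluation returns for 4 seeds, for illustration only
# (these are NOT results from this repository).
example_curves = [
    [0.0, 10.0, 20.0],
    [2.0, 12.0, 22.0],
    [4.0, 14.0, 24.0],
    [6.0, 16.0, 26.0, 30.0],  # longer run, truncated to 3 points
]
mean_curve, std_curve = average_learning_curves(example_curves)
```

The standard deviation is returned alongside the mean so that a shaded band can be drawn around the averaged curve if desired.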

The results are summarized in the following plot (MBPO is run for only 175k environment steps to save training time):

mbpo_training

The only difference in hyperparameter settings between MBPO-SAC and vanilla SAC is the presence of model-based acceleration. The averaged curves show a substantial advantage for MBPO in training speed.
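To give a rough idea of what model-based acceleration means in MBPO, the sketch below generates short synthetic rollouts that branch from real states, which an off-policy learner such as SAC could then train on alongside real data. It does not use the ReLAx API: `branched_rollouts`, `toy_policy`, and `toy_model_step` are hypothetical names, the dynamics and reward are toy stand-ins for a learned model, and episode termination is ignored for brevity.

```python
import numpy as np


def branched_rollouts(start_states, policy, model_step, horizon):
    """Roll a dynamics model forward for `horizon` steps from real states.

    Returns a list of (state, action, reward, next_state) tuples.
    Termination handling is omitted in this sketch.
    """
    transitions = []
    states = np.asarray(start_states, dtype=float)
    for _ in range(horizon):
        actions = policy(states)
        next_states, rewards = model_step(states, actions)
        transitions.extend(zip(states, actions, rewards, next_states))
        states = next_states
    return transitions


def toy_policy(states):
    # Hypothetical stand-in for the SAC actor.
    return -0.1 * states


def toy_model_step(states, actions):
    # Hypothetical stand-in for a learned dynamics and reward model.
    next_states = states + actions
    rewards = -np.sum(next_states ** 2, axis=1)
    return next_states, rewards


rng = np.random.default_rng(0)
real_states = rng.normal(size=(5, 3))  # pretend these came from the real replay buffer
synthetic = branched_rollouts(real_states, toy_policy, toy_model_step, horizon=2)
```

Keeping the horizon short limits how far model errors can compound, at the cost of less synthetic data per real state.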

Resulting Policy

mbpo_sac.mp4