Source code for the experiments in the paper "Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning".
1. Install the basic packages, e.g.,
conda create -n sdmgan python=3.8.5
conda activate sdmgan
conda install cudatoolkit=11.1.74
pip install numpy matplotlib seaborn gym==0.17.0 torch==1.10.1
and add any other required dependencies. (Note: cudatoolkit is distributed through conda rather than pip.)
2. Install MuJoCo and mujoco-py.
3. Install D4RL.
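After the three install steps above, a quick sanity check that the key dependencies are importable can save debugging time later. This is a minimal sketch; the package names are assumptions based on the install steps (mujoco-py imports as mujoco_py).

```python
# Sanity-check the installation by looking up each dependency.
# Package names are assumptions based on the install steps above.
import importlib.util

def missing_packages(names):
    """Return the subset of `names` that cannot be found on this system."""
    return [n for n in names if importlib.util.find_spec(n) is None]

missing = missing_packages(["numpy", "gym", "torch", "mujoco_py", "d4rl"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies found.")
```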
Source code for the toy experiment is in the Toy_Experiment folder. Toy_Experiment/run.sh is an example run file; please modify it according to your hardware availability.
Source code for the offline RL experiments on the D4RL benchmark is in the D4RL_Experiment folder.
The run files for the SDM-GAN variant are generated by the D4RL_Experiment/submit_jobs_server_gan.py file.
An example use of this file is
cd D4RL_Experiment
python submit_jobs_server_gan.py
Flags can be provided to the python command; please see this file for the available flags. The location of the generated run files will be printed out.
The file D4RL_Experiment/submit_jobs_server_w1.py generates the run files for the SDM-WGAN variant used in the ablation study; its usage is the same as for the SDM-GAN variant.
The run files create a folder for each (dataset, seed) pair. Within each such folder, eval_norm.npy stores the normalized scores and eval.npy stores the unnormalized scores.
The normalized scores are calculated by the D4RL package.
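To aggregate results across runs, the saved .npy files can be loaded directly. A minimal sketch, assuming each file holds a 1-D array of scores (one entry per evaluation checkpoint); the example folder name in the comment is hypothetical.

```python
# Post-process one run's result file, assuming it stores a 1-D array
# of evaluation scores, one entry per evaluation checkpoint.
import numpy as np

def summarize_scores(path):
    """Load a score array and report its final value and overall mean."""
    scores = np.load(path)
    return {"final": float(scores[-1]), "mean": float(scores.mean())}

# Example (hypothetical (dataset, seed) folder name):
# stats = summarize_scores("halfcheetah-medium-v0_seed0/eval_norm.npy")
```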
MIT License.