This repository provides the official source code to reproduce the experimental results in the following study.
Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami and Toru Namerikawa. "Safe Exploration Method for Reinforcement Learning under Existence of Disturbance". In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 2022, pp. 132–147, 2022. (arXiv preprint version is available here)
Python 3.8.10
To install requirements:
pip install -r requirements.txt
scipy >= 1.7.0
tensorflow-gpu >= 2.5.0
numpy >= 1.19.5
pandas >= 1.2.5
gym >= 0.18.3
To train models and evaluate the safe exploration method proposed in the paper, run the following commands:
python ./code/inverted_pendulum/main_SafeEx_ECMLPKDD2022_Pend.py #inverted pendulum
python ./code/robot_manipulator/main_SafeEx_ECMLPKDD2022_Mani.py #four-bar parallel link robot manipulator
The above commands train the models and reproduce the cumulative costs and relative frequencies of constraint satisfaction reported in the paper (Figure 3). To change the parameter and/or hyperparameter settings of the simulation, edit them directly in the source code.
As in a general reinforcement learning problem setting, the training data, i.e., a set of states, inputs, and their corresponding immediate costs, are automatically generated and consumed (but not stored) by the above code.
During the simulation, the following figures are obtained by plotting the results saved in CSV files:
Problem | Cumulative cost | Relative frequency of constraint satisfaction
---|---|---
Inverted pendulum | |
Four-bar parallel link robot manipulator | |
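The CSV-to-figure step can be sketched as follows. Note that the file name and column names below are hypothetical stand-ins for illustration, not the actual output format of the repository's scripts:

```python
# Hypothetical sketch: "cumulative_cost_example.csv" and its columns are
# illustrative stand-ins, not the repository's actual output files.
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Stand-in for a results file saved during the simulation.
df = pd.DataFrame({
    "episode": range(1, 6),
    "cumulative_cost": [120.0, 95.3, 80.1, 72.4, 70.0],
})
df.to_csv("cumulative_cost_example.csv", index=False)

# Load the saved results and plot the learning curve.
results = pd.read_csv("cumulative_cost_example.csv")
plt.plot(results["episode"], results["cumulative_cost"])
plt.xlabel("Episode")
plt.ylabel("Cumulative cost")
plt.savefig("cumulative_cost_example.png")
```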
You can also find the pretrained models (actor and critic networks trained in the last trial with each method) in the directory "result_paper".
We do not guarantee successful training to swing up and hold the pendulum or to regulate the links of the robot manipulator as in our paper; however, to evaluate trained models, run the following commands:
python ./code/inverted_pendulum/eval_SafeEx_ECMLPKDD2022_Pend.py
python ./code/robot_manipulator/eval_SafeEx_ECMLPKDD2022_Mani.py
For example, to evaluate the above pretrained models stored in the directory "result_paper", run the following commands:
python ./code/inverted_pendulum/demo_SafeEx_ECMLPKDD2022_Pend.py
python ./code/robot_manipulator/demo_SafeEx_ECMLPKDD2022_Mani.py
This project is under the BSD 3-Clause Clear License. See LICENSE for details.
@inproceedings{okawa2022safe,
author="Okawa, Yoshihiro and Sasaki, Tomotake and Yanami, Hitoshi and Namerikawa, Toru",
title="Safe Exploration Method for Reinforcement Learning Under Existence of Disturbance",
booktitle="Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 2022",
year="2022",
pages="132--147",
doi="10.1007/978-3-031-26412-2_9",
note="The official source code is available at \url{https://github.com/FujitsuResearch/SafeExploration}."
}