Safe Exploration Method for Reinforcement Learning under Existence of Disturbance

This repository provides the official source code to reproduce the experimental results in the following study.

Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami and Toru Namerikawa. "Safe Exploration Method for Reinforcement Learning under Existence of Disturbance". In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 2022, pp. 132–147, 2022. (An arXiv preprint version is also available.)

Requirements

Python 3.8.10

To install requirements:

pip install -r requirements.txt

The main dependencies are:

scipy >= 1.7.0
tensorflow-gpu >= 2.5.0
numpy >= 1.19.5
pandas >= 1.2.5
gym >= 0.18.3
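
As an optional sanity check, the snippet below confirms that the pinned packages are installed; it is a minimal sketch using only the Python standard library.

from importlib.metadata import version, PackageNotFoundError  # Python 3.8+

# Print the installed version of each dependency listed in requirements.txt.
for pkg in ["scipy", "tensorflow-gpu", "numpy", "pandas", "gym"]:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")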

Training and Evaluation

To train models and evaluate the safe exploration method proposed in the paper, run the command for the corresponding problem:

python ./code/inverted_pendulum/main_SafeEx_ECMLPKDD2022_Pend.py  #inverted pendulum
python ./code/robot_manipulator/main_SafeEx_ECMLPKDD2022_Mani.py  #four-bar parallel link robot manipulator

The above commands train models and reproduce the results on cumulative costs and relative frequencies of constraint satisfaction shown in the paper (Figure 3). If you want to change the parameter and/or hyperparameter settings of these simulations, please edit them directly in the scripts.

Training Data

As in the general problem setting of reinforcement learning, training data, i.e., a set of states, inputs, and their corresponding immediate costs, are generated automatically while the above scripts run and are used on the fly (but not stored).
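
For illustration, the following is a generic sketch of this on-the-fly data generation, using the gym Pendulum-v0 environment with a random policy as a stand-in; the actual scripts implement the paper's dynamics, disturbance, and safe exploration controller rather than this loop.

import gym

# Generic RL interaction loop: each step yields a (state, input, cost)
# sample that a learner would consume immediately rather than store.
env = gym.make("Pendulum-v0")  # classic pendulum swing-up task (gym 0.18.x API)
state = env.reset()
for t in range(200):
    action = env.action_space.sample()        # random stand-in for a policy
    next_state, reward, done, _ = env.step(action)
    cost = -reward                            # immediate cost = negative reward
    # the (state, action, cost) sample would be used for learning here
    state = next_state
    if done:
        state = env.reset()
env.close()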

Results (Figure 3 in the paper)

Throughout the simulations, we obtain the following figures by plotting the results saved in csv files:

Problem | Cumulative cost | Relative frequency of constraint satisfaction
Inverted pendulum | [figure] | [figure]
Four-bar parallel link robot manipulator | [figure] | [figure]
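
As a starting point for regenerating such a plot from the saved csv files, a hedged sketch is shown below; the file name and column layout are hypothetical and may differ from the actual script output, and matplotlib is an extra dependency not listed in requirements.txt.

import pandas as pd
import matplotlib.pyplot as plt  # not in requirements.txt; install separately

# Hypothetical csv layout: one column of cumulative cost per method, rows = episodes.
df = pd.read_csv("result_paper/cumulative_cost.csv")  # hypothetical file name
df.plot()
plt.xlabel("Episode")
plt.ylabel("Cumulative cost")
plt.show()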

Pre-trained Models

You can also find the pre-trained models (the actor and critic networks trained in the last trial with each method) in the directory "result_paper".
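
If the networks were saved in a tf.keras-loadable format, they could be restored as sketched below; the directory layout is hypothetical, so adjust the path to the actual files under "result_paper".

import tensorflow as tf

# Load a saved network (SavedModel or HDF5) and inspect its architecture.
actor = tf.keras.models.load_model("result_paper/actor")  # hypothetical path
actor.summary()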

Evaluation (optional)

In our paper, we do not guarantee that training succeeds in swinging up and holding the pendulum or in regulating the links of the robot manipulator; however, if you want to evaluate trained models, run the command for the corresponding problem:

python ./code/inverted_pendulum/eval_SafeEx_ECMLPKDD2022_Pend.py
python ./code/robot_manipulator/eval_SafeEx_ECMLPKDD2022_Mani.py

For example, to evaluate the above pre-trained models stored in the directory "result_paper", run the command for the corresponding problem:

python ./code/inverted_pendulum/demo_SafeEx_ECMLPKDD2022_Pend.py
python ./code/robot_manipulator/demo_SafeEx_ECMLPKDD2022_Mani.py

License

This project is licensed under the BSD 3-Clause Clear License. See LICENSE for details.

BibTeX

@inproceedings{okawa2022safe,
  author    = "Okawa, Yoshihiro and Sasaki, Tomotake and Yanami, Hitoshi and Namerikawa, Toru",
  title     = "Safe Exploration Method for Reinforcement Learning Under Existence of Disturbance",
  booktitle = "Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 2022",
  year      = "2022",
  pages     = "132--147",
  doi       = "10.1007/978-3-031-26412-2_9",
  note      = "The official source code is available at \url{https://github.com/FujitsuResearch/SafeExploration}."
}
