
QNTRPO: Quasi-Newton Trust Region Policy Optimization

System requirements: The code has been tested with the following configuration.

  1. Ubuntu 16.04 LTS
  2. Python 3.6.7 (it will not work on Python 3.6.0 due to a compatibility issue between PyTorch and Python 3.6.0)
  3. PyTorch 1.1.0 (more recent versions of PyTorch should also work)
  4. mujoco-py==1.50
  5. Gym

Features

QNTRPO solves the Policy Optimization problem that arises in Reinforcement Learning using a Quasi-Newton Trust Region algorithm.
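For background, here is a minimal NumPy sketch of a generic quasi-Newton (BFGS) trust-region method with a dogleg subproblem solver, applied to a toy function. It illustrates the family of techniques only; it is not the repository's implementation, and QNTRPO's actual algorithm (which operates on the policy optimization objective) is described in the paper cited below. All names in the sketch are illustrative.

import numpy as np

def dogleg_step(g, B, delta):
    """Approximately minimize g.s + 0.5*s.B.s subject to ||s|| <= delta,
    assuming B is positive definite (the dogleg method)."""
    p_newton = -np.linalg.solve(B, g)          # full quasi-Newton step
    if np.linalg.norm(p_newton) <= delta:
        return p_newton
    p_cauchy = -(g @ g) / (g @ B @ g) * g      # unconstrained Cauchy point
    if np.linalg.norm(p_cauchy) >= delta:
        return -delta * g / np.linalg.norm(g)  # clipped steepest descent
    # Walk along the dogleg path until it hits the trust-region boundary.
    d = p_newton - p_cauchy
    a, b, c = d @ d, 2 * (p_cauchy @ d), p_cauchy @ p_cauchy - delta**2
    tau = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return p_cauchy + tau * d

def bfgs_update(B, s, y):
    """Standard BFGS update of the Hessian approximation B; skipped when
    the curvature condition s.y > 0 fails, which keeps B positive definite."""
    sy = s @ y
    if sy <= 1e-10:
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy

def qn_trust_region(f, grad, x0, delta0=0.1, max_delta=1.0, iters=200):
    x, B, delta = x0.astype(float), np.eye(len(x0)), delta0
    g = grad(x)
    for _ in range(iters):
        s = dogleg_step(g, B, delta)
        pred = -(g @ s + 0.5 * s @ B @ s)      # decrease predicted by the model
        rho = (f(x) - f(x + s)) / max(pred, 1e-12)
        if rho > 0.1:                          # accept the step
            g_new = grad(x + s)
            B = bfgs_update(B, s, g_new - g)
            x, g = x + s, g_new
        if rho > 0.75 and np.linalg.norm(s) >= 0.99 * delta:
            delta = min(2 * delta, max_delta)  # model is good: grow the region
        elif rho < 0.25:
            delta *= 0.25                      # model is poor: shrink the region
        if np.linalg.norm(g) < 1e-6:
            break
    return x

# Toy check on the Rosenbrock function; the minimizer is (1, 1).
rosen = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
rosen_grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                                 200 * (x[1] - x[0]**2)])
print(qn_trust_region(rosen, rosen_grad, np.array([-1.2, 1.0])))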

Installation

The code depends on external libraries. Install them following the instructions below; the steps describe installation inside a conda virtual environment.

conda create -n qntrpo python=3.6 anaconda

source activate qntrpo

conda install pytorch

Install Mujoco and mujoco-py following the instructions in https://github.com/openai/mujoco-py (License: MIT)

Install Gym following the instructions in https://github.com/openai/gym (License: MIT)
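After installation, a quick sanity check along the following lines (not part of the repository; the snippet is purely illustrative) confirms that PyTorch, Gym, and mujoco-py import correctly and that the MuJoCo environment used in the examples can be created:

import torch
import gym

env = gym.make("Walker2d-v2")   # requires a working MuJoCo/mujoco-py install
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print("torch", torch.__version__, "| obs shape", obs.shape, "| reward", reward)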

Usage

If a user wants to change the trust region radius used for optimization, they should edit the parameter "tr_maxdelta" on line 67 of "trust_region_opt_torch.py". The current value is 1e-1, and it is suggested to run the code with this value; the algorithm's performance with other values has not been fully tested yet.
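For reference, the change amounts to editing a single scalar; the relevant line in trust_region_opt_torch.py looks roughly like the following (the surrounding code is not reproduced here):

tr_maxdelta = 1e-1  # maximum trust-region radius; tested default per the note above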

A different batch size can be used by passing an additional argument, --batch-size N, where N is an integer (e.g., 25000):

python main.py --env-name "Walker2d-v2" --seed 1243 --batch-size 25000
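For context, these flags are ordinary command-line arguments; a hypothetical sketch of how main.py might declare them with argparse follows (the flag names match the commands above, but the defaults shown are illustrative):

import argparse

parser = argparse.ArgumentParser(description="QNTRPO")
parser.add_argument("--env-name", type=str, default="Walker2d-v2",
                    help="name of the Gym/MuJoCo environment")
parser.add_argument("--seed", type=int, default=1243,
                    help="random seed")
parser.add_argument("--batch-size", type=int, default=15000,
                    help="number of environment steps collected per policy update")
args = parser.parse_args()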

Testing

The QNTRPO algorithm can be tested by running the following in a terminal (for example, for Walker2d with seed 1243):

python main.py --env-name "Walker2d-v2" --seed 1243

Citation

If you use the software, please cite the following (TR2019-120):

@inproceedings{Jha2019oct,
author = {Jha, Devesh K. and Raghunathan, Arvind and Romeres, Diego},
title = {Quasi-Newton Trust Region Policy Optimization},
booktitle = {Conference on Robot Learning (CoRL)},
year = 2019,
editor = {Leslie Pack Kaelbling and Danica Kragic and Komei Sugiura},
pages = {945--954},
month = oct,
publisher = {Proceedings of Machine Learning Research},
url = {https://www.merl.com/publications/TR2019-120}
}

Contact

Please contact one of us: Devesh K. Jha (jha@merl.com), Arvind U. Raghunathan (raghunathan@merl.com), or Diego Romeres (romeres@merl.com).

Contributing

See CONTRIBUTING.md for our policy on contributions.

License

Released under the AGPL-3.0-or-later license, as found in the LICENSE.md file.

All files:

Copyright (C) 2019, 2023 Mitsubishi Electric Research Laboratories (MERL).

SPDX-License-Identifier: AGPL-3.0-or-later