
OR2AC

This repository contains the implementation of my undergraduate Final Year Project, “Offline Risk-Averse Actor-Critic with Curriculum Learning”.

Project Description

In real-world scenarios, offline RL has emerged as an attractive approach because it learns a policy solely from historical data, eliminating the need to interact with the environment during training. However, deploying offline RL presents several challenges, particularly policy safety, handling out-of-distribution state-action pairs, and policy generalization. To address these, this project develops risk-averse and generalizable offline RL algorithms, improving real-world applicability.

This code implements several RL algorithms as baselines for training risk-sensitive agents, including SAC, DSAC, and CODAC.

Here is an overview of the proposed method:

Fig.1 Overview of OR2AC

The Curriculum Scheduler first sets the performance metric and the difficulty-adjustment mechanism. An online algorithm is then selected to train the data collector, which gathers transitions for offline training. The collected data is used to train the offline learner with an offline algorithm from the Model Zoo, while the Curriculum Scheduler controls the environment difficulty throughout training.
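To make the scheduling idea concrete, here is a minimal sketch assuming a rolling-return performance metric and a fixed ladder of difficulty levels; the names CurriculumScheduler, update, and current_level are hypothetical and are not the actual API of this repository.

    # Illustrative sketch only -- not the repository's CurriculumScheduler API.
    from collections import deque


    class CurriculumScheduler:
        """Raise the environment difficulty once the rolling mean return clears a threshold."""

        def __init__(self, levels, threshold, window=20):
            self.levels = levels            # ordered list of difficulty settings
            self.threshold = threshold      # mean return required to advance a level
            self.returns = deque(maxlen=window)
            self.idx = 0

        @property
        def current_level(self):
            return self.levels[self.idx]

        def update(self, episode_return):
            """Record one episode's return and advance the level if the metric clears the threshold."""
            self.returns.append(episode_return)
            window_full = len(self.returns) == self.returns.maxlen
            if window_full and sum(self.returns) / len(self.returns) >= self.threshold:
                self.idx = min(self.idx + 1, len(self.levels) - 1)
                self.returns.clear()        # restart the metric at the new level
            return self.current_level


    # Example: three difficulty levels, advance once the rolling mean return reaches 200.
    scheduler = CurriculumScheduler(levels=["easy", "medium", "hard"], threshold=200.0)
    # Inside the online data-collection loop one might call:
    #   level = scheduler.update(episode_return)
    #   env.set_difficulty(level)   # hypothetical environment hook

This sketch only captures the difficulty-adjustment loop; in the actual pipeline the scheduler interacts with the online data collector described above.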

Installation

Follow these steps to run the code on your machine:

  1. Create and activate a conda environment, and install packages:

    conda create -n OR2AC python=3.9
    conda activate OR2AC
    pip install -r requirements.txt
  2. Run experiments: First, run train_online.py to generate a dataset for offline training:

    python train_online.py --task_name online --env riskymassrandom --algo sac --seed 666

    Second, run train_offline.py (the --risk_type and --risk_param flags are illustrated in the sketch after this list):

    python train_offline.py --task_name offline --env riskymassrandom --algo codac --seed 666 --risk_prob 0.9 --risk_penalty 50.0 --risk_type cvar --risk_param 0.1 --tau_type iqn
  3. The file structure should look like this; you can then test your model using visualize.py:

     .
     ├── env
     ├── model
     │   ├── sac.py
     │   ├── dsac.py
     │   ├── codac.py
     │   ├── networks.py
     │   └── utils.py
     ├── dataset
     │   └── task
     │       └── level
     ├── saved_policies
     │   └── task
     │       ├── online
     │       └── offline
     ├── README.md
     ├── train_online.py
     ├── train_offline.py
     ├── replay_memory.py
     └── visualize.py
    

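For reference, the --risk_type cvar --risk_param 0.1 flags in step 2 select a Conditional Value-at-Risk (CVaR) objective, which evaluates the mean of the worst alpha-fraction of returns rather than the overall mean. Below is a minimal sketch of how CVaR can be approximated from quantile estimates, such as those produced by an IQN-style distributional critic; the function cvar_from_quantiles is illustrative only and not part of this codebase.

    # Illustrative sketch of a CVaR risk measure over quantile estimates -- not the repository's API.
    import numpy as np


    def cvar_from_quantiles(quantiles, taus, alpha=0.1):
        """Approximate CVaR_alpha: the mean of the return distribution over its worst alpha-fraction.

        quantiles: estimated quantile values Z(tau), ordered by tau
        taus:      quantile fractions in (0, 1) at which Z was evaluated
        alpha:     risk level (e.g. 0.1 keeps only the worst 10% of outcomes)
        """
        quantiles = np.asarray(quantiles, dtype=float)
        taus = np.asarray(taus, dtype=float)
        mask = taus <= alpha                 # keep only the lower-tail quantiles
        if not mask.any():                   # fall back to the single worst quantile
            return float(quantiles[np.argmin(taus)])
        return float(quantiles[mask].mean())


    # Example: 32 evenly spaced quantile fractions and a toy return distribution.
    taus = (np.arange(32) + 0.5) / 32
    quantiles = np.sort(np.random.normal(loc=100.0, scale=30.0, size=32))
    print("CVaR_0.1 ~", cvar_from_quantiles(quantiles, taus, alpha=0.1))

A risk-averse actor trained against such a distorted value (instead of the mean) avoids actions whose return distributions have heavy lower tails.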
Acknowledgement

The code in this repository is based on and inspired by the work of the authors and contributors from CODAC and DSAC.
