
Learning Reward Functions using Expert Demonstrations for Crane Automation


pranavAL/InvRL_Auto-Evaluate


Automatic Evaluation of Excavator Operators using Learned Reward Functions

This repo contains code for our paper Automatic Evaluation of Excavator Operators using Learned Reward Functions by
Pranav Agarwal, Marek Teichmann, Sheldon Andrews, and Samira Ebrahimi Kahou, accepted at the Reinforcement Learning for Real Life (RL4RealLife) Workshop at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022).

Installation

Simulator

  • We use Vortex Studio 2021a for collecting the dataset and for training the reinforcement learning policy.
  • Installation involves downloading all the files, running the .exe file, and selecting all the files when prompted.
  • The simulator requires a license, which can be requested from CM Labs through their academic access program.

Code

  • Install Anaconda.
  • Clone this repository and create the conda environment:
git clone https://github.com/pranavAL/InvRL_Auto-Evaluate
cd InvRL_Auto-Evaluate
conda env create -f environment.yml
conda activate myenv

⚠️WARNING⚠️

All code should be run inside the virtual environment created above, given the strict version requirements of Vortex. No additional packages need to be installed.

  • This work uses wandb for real-time logging of metrics. Please register for an account, then log in with:
wandb login

System Requirements

  • OS - Windows 10
  • GPU >= NVIDIA GeForce GTX 1050 Ti
  • NVIDIA Driver >= 457.49
  • CUDA Version - 11.1

🔴IMPORTANT🔴

These are strict requirements: the Vortex tools used in this work are not supported on Linux, and the graphics for this paper have been tested only on the configuration above.

Post-Installation

The default scenes and mechanism of the crane are modified to match the requirements of this work. Run the commands below to replace the default configurations:

copy scenes\*.vxscene "C:\CM Labs\Vortex Construction Assets 21.1\assets\Excavator\Scenes\ArcSwipe"
copy scenes\*.vxmechanism  "C:\CM Labs\Vortex Construction Assets 21.1\assets\Excavator\Mechanisms\Excavator"

Dataset Collection and Analysis

  • The datasets of different experts are collected using Vortex Advantage simulators, which feature enclosed computers running the Vortex software.
  • To extract features in a human-readable format from the original raw dataset, run
python ExtractFeatures.py
  • For further analysis, feature engineering, and extraction of the final dataset (split into train, test, and validation sets), use
python EDA.py
  • For a visual preview of the above steps and analysis of the dataset, explore the Jupyter notebooks ExtractFeatures.ipynb and EDA.ipynb.
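The train/test/validation split produced by EDA.py can be sketched as follows. This is a hypothetical illustration, not the repo's actual code: the column names (`swing_angle`, `bucket_load`), the 70/15/15 ratio, and the function name `split_dataset` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def split_dataset(df, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle rows and split into train/validation/test DataFrames.

    The remaining (1 - train_frac - val_frac) fraction becomes the test set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))          # shuffled row positions
    n_train = int(train_frac * len(df))
    n_val = int(val_frac * len(df))
    train = df.iloc[idx[:n_train]]
    val = df.iloc[idx[n_train:n_train + n_val]]
    test = df.iloc[idx[n_train + n_val:]]
    return train, val, test

# Toy feature table; the columns are invented for illustration only.
features = pd.DataFrame({
    "swing_angle": np.linspace(0, 90, 100),
    "bucket_load": np.random.default_rng(0).random(100),
})
train, val, test = split_dataset(features)
```

A fixed seed keeps the split reproducible across runs, which matters when the same split feeds several independently trained modules.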

Distribution Learning

Dynamics Distribution

python model_dynamics.py

Safety Distribution

python model_infractions.py

Inference

To check the reconstruction of future states given the dynamics:

python infer.py

To visualise the distribution of infractions:

python inference.py
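A plot of this kind can be sketched as below. This is a minimal, hypothetical example using synthetic Poisson-distributed infraction counts; the real inference.py works on the learned safety distribution, and the data, bin choices, and labels here are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Fake per-episode infraction counts (synthetic data, for illustration only).
rng = np.random.default_rng(0)
infractions = rng.poisson(lam=2.0, size=500)

# One integer-wide bin per infraction count.
counts, bins = np.histogram(infractions, bins=np.arange(infractions.max() + 2))

plt.bar(bins[:-1], counts, width=0.9)
plt.xlabel("infractions per episode")
plt.ylabel("episodes")
plt.title("Distribution of infractions (synthetic data)")
plt.savefig("infractions_hist.png")
```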

Policy Learning

Training

cd ExcavatorRLEnv
python train.py --test_id "Reward Type" & python agent.py --test_id "Reward Type"

Please specify the reward type, one of: "Task", "Dynamic", "DynamicSafety".

🔴IMPORTANT🔴

  • Ideally, a single process would both interact with the simulator and update the policy.
  • In our case, environment interaction and policy updates are handled by two different processes.
  • This is because Vortex holds the GIL, and PyTorch needs the GIL for backpropagation.
  • To avoid this conflict, run two separate processes as shown above.
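The two-process launch above can be sketched from Python as well. This is a minimal illustration, not part of the repo: the `launch` helper is hypothetical, while the script names and the `--test_id` flag follow the commands in this README.

```python
import subprocess
import sys

def launch(reward_type):
    """Start the environment process (train.py) and the policy-update
    process (agent.py) in parallel, then wait for both to finish.

    Running them as separate OS processes sidesteps GIL contention
    between Vortex and PyTorch's backpropagation.
    """
    env_proc = subprocess.Popen(
        [sys.executable, "train.py", "--test_id", reward_type])
    agent_proc = subprocess.Popen(
        [sys.executable, "agent.py", "--test_id", reward_type])
    env_proc.wait()
    agent_proc.wait()

# Usage (from inside ExcavatorRLEnv): launch("DynamicSafety")
```

Since each process has its own interpreter and its own GIL, the simulator stepping and the PyTorch updates no longer serialize against each other.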

Testing

python test.py --test_id "Reward Type"

🔴Note🔴

  • We do not release a single test script that runs the complete project.
  • This is because multiple independent modules need to be trained.
  • Instead, run the above scripts sequentially to reproduce the presented results.

Paper associated

If you use this work, please cite:

Automatic Evaluation of Excavator Operators using Learned Reward Functions. Pranav Agarwal, Marek Teichmann, Sheldon Andrews, Samira Ebrahimi Kahou. Reinforcement Learning for Real Life (RL4RealLife) Workshop at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022).

@article{agarwal2022automatic,
  title={Automatic Evaluation of Excavator Operators using Learned Reward Functions},
  author={Agarwal, Pranav and Teichmann, Marek and Andrews, Sheldon and Kahou, Samira Ebrahimi},
  journal={Reinforcement Learning for Real Life (RL4RealLife) Workshop at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)},
  url={https://arxiv.org/abs/2211.07941v1},
  year={2022}
}
