Skip to content

The repository is created for AR2L algorithm used for solving online 3D-BPP.

Notifications You must be signed in to change notification settings

panyxy/ar2l_bpp

Repository files navigation

Adjustable Robust Reinforcement Learning for Online 3D Bin Packing

Introduction

This is the official PyTorch implementation for the paper titled "Adjustable Robust Reinforcement Learning for Online 3D Bin Packing". The paper introduces the AR2L framework, which takes into account both the average performance and worst-case performance of a packing policy. By using this framework, the trained packing policy can be made more robust, while still maintaining acceptable performance in nominal cases. In the AR2L framework, the training process involves alternating between training the packing policy, the permutation-based attacker, and the mixture-dynamics model in each iteration. The PPO algorithm is utilized to train these three policies. Additionally, the packing policy is built on the PCT algorithm. The video demonstration can be found using the YouTube Link.

Dependencies

Before executing the training process, please ensure that the necessary requirements have been installed.

pip install -r requirements.txt

Training

The packing policy has the flexibility to observe a varying number of next boxes (NNB). The robustness of the policy can be adjusted by tuning the hyperparameter alpha.

Environment: discrete, NNB=5, alpha=1.0

bash scripts train_disc.sh 5 1.0

Environment: discrete, NNB=10, alpha=1.2

bash scripts train_disc.sh 10 1.2

Environment: discrete, NNB=15, alpha=1.3

bash scripts train_disc.sh 15 1.3

Environment: discrete, NNB=20, alpha=1.0

bash scripts train_disc.sh 20 1.0

Environment: continuous, NNB=5, alpha=1.0

bash scripts train_cont.sh 5 1.0

Environment: continuous, NNB=10, alpha=1.2

bash scripts train_cont.sh 10 1.2

Environment: continuous, NNB=15, alpha=1.0

bash scripts train_cont.sh 15 1.0

Environment: continuous, NNB=20, alpha=1.1

bash scripts train_cont.sh 20 1.1

Validation

To select an effective AR2L packing policy, you can evaluate various packing policies with and without the permutation-based attacker.

bash val_disc.sh [NNB] [path to the parent directory where all the models are saved] load_adv

example: bash val_disc.sh 5 ./logs/experiment/timeStr load_adv
bash val_disc.sh [NNB] [path to the parent directory where all the models are saved] not_load_adv

example: bash val_disc.sh 5 ./logs/experiment/timeStr not_load_adv

After conducting the validation, please add the space utilization in the nominal dynamics (not_load_adv) and the space utilization in the worst-case dynamics (load_adv) for each model. Then, you can choose the best one among them.

Evaluation

You can evaluate the selected packing policy in various settings.

bash eval_disc.sh [NNB] [path to the BPP model] [path to the adv model] load_adv

example: bash eval_disc.sh 5 ./logs/experiment/timeStr/BPP-subtimeStr.pt ./logs/experiment/timeStr/Adv-subtimeStr.pt load_adv
bash eval_disc.sh [NNB] [path to the BPP model] [path to the adv model] not_load_adv

example: bash eval_disc.sh 5 ./logs/experiment/timeStr/BPP-subtimeStr.pt ./logs/experiment/timeStr/Adv-subtimeStr.pt not_load_adv

Acknowledgement

We appreciate the anonymous reviewers, (S)ACs, and PCs of NeurIPS2023 for their insightful comments to further improve our paper and their service to the community. We would like to thank the authors of PCT for providing their highly valuable implementation of PCT. and the authors of the PPO PyTorch Implementation.

Citation

@inproceedings{pan2023ar2l,
 author = {Pan, Yuxin and Chen, Yize and Lin, Fangzhen},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
 pages = {51926--51954},
 publisher = {Curran Associates, Inc.},
 title = {Adjustable Robust Reinforcement Learning for Online 3D Bin Packing},
 url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/a345ed605675c7c484e740a8ceaa6b45-Paper-Conference.pdf},
 volume = {36},
 year = {2023}
}

License

This source code is provided solely for academic use. Please refrain from using it for commercial purposes without obtaining proper authorization from the author.

About

The repository is created for AR2L algorithm used for solving online 3D-BPP.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages