Official code for "Active Policy Improvement from Multiple Black-box Oracles":
Xuefeng Liu*, Takuma Yoneda*, Chaoqi Wang*, Matthew R. Walter, and Yuxin Chen, "Active Policy Improvement from Multiple Black-box Oracles", in Proceedings of the International Conference on Machine Learning (ICML), 2023 (* denotes equal contribution) [arXiv]
You can either build a Docker image based on docker/Dockerfile, or pull one from Docker Hub: docker pull ripl/maps
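For example, a minimal sketch of building and running the image (the ripl/maps tag comes from the pull command above; the mount point and the --gpus flag are assumptions about your setup):

# Build the image locally from the repository root
$ docker build -t ripl/maps -f docker/Dockerfile .

# Or pull the prebuilt image and start an interactive container
# (mounting the repo at /workspace and --gpus all are assumptions)
$ docker pull ripl/maps
$ docker run -it --rm --gpus all -v "$(pwd)":/workspace ripl/maps bash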
The current scripts use Weights & Biases for logging. You may need to set the WANDB_API_KEY environment variable.
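For example (the key value is a placeholder; use your own key from the W&B site):

# Set the W&B API key before launching any training script
$ export WANDB_API_KEY=<your-api-key>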
For example, from the project root directory, you can run:
$ python3 -m maps.scripts.pretraining.train_expert sac model_dir --env-name dmc:Cheetah-run-v1
This trains a SAC policy on the Cheetah-run environment and periodically saves the network weights under model_dir.
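If you need experts for additional tasks, the same script can presumably be reused with a different --env-name; the environment IDs below follow the dmc:Cheetah-run-v1 pattern and are assumptions, not a tested list:

# Train experts for other DeepMind Control tasks (env names are assumptions)
$ python3 -m maps.scripts.pretraining.train_expert sac model_dir --env-name dmc:Walker-walk-v1
$ python3 -m maps.scripts.pretraining.train_expert sac model_dir --env-name dmc:Hopper-hop-v1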
The following command creates multiple run configurations over environment domains, sets of experts, and algorithms.
* Before running the following, you need to adjust the expert paths (L6 in maps/scripts/pretraining/experts.py) to point to the directory where you saved the expert models in the previous step.
$ python3 -m maps.scripts.sweep.sample_sweep
This generates maps/scripts/sweep/sample_sweep.jsonl, with one run configuration per line.
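To sanity-check the generated file, you can count and peek at the configurations (the path matches the training command below):

# Number of generated run configurations
$ wc -l maps/scripts/sweep/sample_sweep.jsonl
# Inspect the first configuration (the one launched with -l 0)
$ head -n 1 maps/scripts/sweep/sample_sweep.jsonl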
For example, to launch the configuration in the first line:
$ python3 -m maps.scripts.train maps/scripts/sweep/sample_sweep.jsonl -l 0
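To launch several configurations in sequence, you can loop over line indices (assuming -l is zero-based, as in the -l 0 example above):

# Run the first four configurations back to back (zero-based indexing assumed)
$ for i in 0 1 2 3; do
    python3 -m maps.scripts.train maps/scripts/sweep/sample_sweep.jsonl -l $i
  done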
- Remove the Dockerfile's dependency on Takuma's image
- Push the new docker image to dockerhub
- Make the pretrained experts available? (git lfs?)
If you find our work useful in your research, please consider citing the paper as follows:
@article{liu2023active,
  title={Active Policy Improvement from Multiple Black-box Oracles},
  author={Liu, Xuefeng and Yoneda, Takuma and Wang, Chaoqi and Walter, Matthew R and Chen, Yuxin},
  journal={arXiv preprint arXiv:2306.10259},
  year={2023}
}