This repository provides an implementation of the Tsallis actor-critic (TAC) method, built on the Spinning Up package, an educational resource produced by OpenAI. TAC generalizes the standard Shannon-Gibbs entropy maximization in RL to Tsallis entropy maximization.
Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, Mineui Hong, Jaein Kim, Yong-Lae Park and Songhwai Oh, "Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots," in Proc. of Robotics: Science and Systems (RSS), 2020.
sudo apt-get update && sudo apt-get install libopenmpi-dev
virtualenv tacenv --python=python3.5 (--system-site-packages)
You can replace "tacenv" with any environment name. If your machine already has the tensorflow-gpu package installed, I recommend the --system-site-packages option so that the virtual environment can use it.
pip install gym[mujoco,robotics]
cd tsallis_actor_critic_mujoco
pip install -e .
cd tsallis_actor_critic_mujoco/custom_gym/
pip install -e .
If you want to add a customized environment, see https://github.com/openai/gym/tree/master/gym/envs#how-to-create-new-environments-for-gym
cd tsallis_actor_critic_mujoco
cd spinup/algos/tac
ls
The following files will be shown:
tac
├── core.py
├── tac.py
├── tf_tsallis_statistics.py
├── Example_Tsallis_MDPs.ipynb
└── Example_Tsallis_statistics.ipynb
- Example_Tsallis_MDPs.ipynb shows the figure of the performance error bound.
- Example_Tsallis_statistics.ipynb shows multi-armed bandit examples with maximum Tsallis entropy.
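As background for the notebooks above, the following is a minimal sketch of the Tsallis entropy for a discrete distribution, using the standard closed form (1 - sum_i p_i^q) / (q - 1). It is a standalone illustration and is not taken from tf_tsallis_statistics.py; the function name `tsallis_entropy` is hypothetical. It shows that the entropic index q = 1 recovers the Shannon-Gibbs entropy.

```python
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy of a discrete distribution for entropic index q.

    Hypothetical helper for illustration; not part of this repository.
    """
    if abs(q - 1.0) < 1e-8:
        # The limit q -> 1 is the Shannon-Gibbs entropy.
        return -sum(p * math.log(p) for p in probs if p > 0)
    # Standard closed form of the Tsallis entropy.
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


uniform = [0.25] * 4
shannon = tsallis_entropy(uniform, 1.0)  # equals log(4) for 4 equal outcomes
tsallis_q2 = tsallis_entropy(uniform, 2.0)  # equals 1 - sum(p^2) when q = 2
print(shannon, tsallis_q2)
```

Values of q close to 1 give entropies close to the Shannon-Gibbs value, which is one way to check the implementation.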
cd tsallis_actor_critic_mujoco
python -m spinup.run tac --env HalfCheetah-v2
cd tsallis_actor_critic_mujoco
python -m spinup.run tac --env HalfCheetah-v2 --exp_name half_tac_alpha_cst_q_1.5_cst_gaussian_q_log --epochs 200 --lr 1e-3 --q 1.5 --pdf_type gaussian --log_type q-log --alpha_schedule constant --q_schedule constant --seed 0 10 20 30 40 50 60 70 80 90
Results will be saved in the data folder.
[env]_[algorithm]_alpha_[alpha_schedule]_q_[entropic_index]_[q_schedule]_[distribution]_[entropy_type]
- [env]: Environment name, ex) half
- [algorithm]: Algorithm name, ex) tac
- [alpha_schedule]: Value of alpha_schedule. Use cst for constant and sch for scheduling
- [entropic_index]: Value of q
- [q_schedule]: Value of q_schedule. Use cst for constant and sch for scheduling
- [distribution]: Value of pdf_type, which has two options: gaussian and q-gaussian
- [entropy_type]: Value of log_type, which has two options: log and q-log
This convention helps you keep track of the parameter settings. Usage of the convention:
python -m spinup.run tac --env HalfCheetah-v2 --exp_name [experiment_name]
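The naming convention can also be generated programmatically. The sketch below is a hypothetical helper (`build_exp_name` is not part of this repository) that assembles the name from the settings. It assumes that the value `constant` maps to `cst`, that any other schedule value maps to `sch`, and that hyphens in `pdf_type` and `log_type` become underscores, as in the example name half_tac_alpha_cst_q_1.5_cst_gaussian_q_log.

```python
def build_exp_name(env, algorithm, alpha_schedule, q, q_schedule, pdf_type, log_type):
    """Build an experiment name following the convention described above.

    Hypothetical helper for illustration; not part of this repository.
    """

    def short(schedule):
        # Assumption: "constant" -> "cst"; any other schedule value -> "sch".
        return "cst" if schedule == "constant" else "sch"

    parts = [
        env,
        algorithm,
        "alpha",
        short(alpha_schedule),
        "q",
        str(q),
        short(q_schedule),
        pdf_type.replace("-", "_"),  # e.g. q-gaussian -> q_gaussian
        log_type.replace("-", "_"),  # e.g. q-log -> q_log
    ]
    return "_".join(parts)


name = build_exp_name("half", "tac", "constant", 1.5, "constant", "gaussian", "q-log")
print(name)
```

The resulting string can be passed to --exp_name so that the folder name always matches the flags used for the run.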
cd tsallis_actor_critic_mujoco
./shell_scripts/tsallis_half_cheetah.sh
To run multiple experiments at once, we use a simple approach, launching the programs in parallel with &:
run program_1 & program_2 & ... & program_n
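For readers who prefer to launch the runs from Python rather than from the shell, the same idea can be sketched with the standard `subprocess` module. The helper name `run_parallel` and the demo commands are hypothetical and only illustrate the pattern; in practice each entry would be one experiment command given as a list of arguments.

```python
import subprocess
import sys


def run_parallel(commands):
    """Start every command at once, then wait for all of them to finish.

    Hypothetical helper for illustration; not part of this repository.
    Returns the exit codes in the same order as the commands.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]  # all start immediately
    return [proc.wait() for proc in procs]  # block until each one exits


if __name__ == "__main__":
    # Placeholder commands standing in for program_1 ... program_n.
    demo = [[sys.executable, "-c", "print('experiment %d')" % i] for i in range(3)]
    print(run_parallel(demo))
```

Unlike the plain & form, this waits for every process and collects its exit code, which makes failed runs easier to spot.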