This folder contains the code for Self Reward Design with Fine-grained Interpretability.
In this project, we attempt to solve reinforcement learning problems using an artificial neural network (NN) designed to achieve interpretability in an extreme way: each neuron in the NN is defined through purposeful human design.
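To make the idea concrete, here is a minimal illustrative sketch of a hand-designed neuron; the function name, weighting, and range are assumptions for illustration, not the repository's actual classes. The point is that the designer fixes the weights so the activation has a direct human-readable meaning, instead of learning it end to end.

```python
def food_ahead_neuron(food_distance, max_range=3.0):
    """Fires (value in [0, 1]) when food is within range.

    The slope and range are chosen by the designer so the output
    reads directly as "how strongly to move toward the food".
    """
    # Hand-picked linear response: 1.0 at distance 0, 0.0 at max_range.
    activation = 1.0 - food_distance / max_range
    return max(0.0, min(1.0, activation))  # clamp, like a bounded ReLU

print(food_ahead_neuron(0.0))  # food right here -> 1.0
print(food_ahead_neuron(3.0))  # at the edge of range -> 0.0
```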
All commands used to execute our experiments can be found in misc/commands.txt and misc/commands_mujoco.txt. The full results can be found on our Google Drive.
We use a conda environment; env.yml is available. Some manual installation is still necessary. We use PyTorch 1.12.1; please perform the installation appropriate for your machine (refer to PyTorch's main website).
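A typical setup might look like the following; the environment name `srd` is an assumption (use the name declared inside env.yml), and the final line is only the CPU-only example, so substitute the CUDA-specific command from the PyTorch website for your machine.

```shell
# Create and activate the environment from the provided file.
# "srd" is an assumed name; env.yml may declare its own.
conda env create -f env.yml -n srd
conda activate srd

# PyTorch 1.12.1: pick the command matching your CUDA/CPU setup
# from pytorch.org; this is the plain CPU-only variant.
pip install torch==1.12.1
```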
In this scenario, traditional RL is not a suitable choice since interpretability is crucial. Fig. (C) is our main result, while Fig. (D) shows a result in which the lack of interpretability sabotages the outcome.
In this scenario, we use SRD to control the half-cheetah's motion.
In the following example, we show movement with inhibitor=2, i.e. we allow the user to give a "stop" instruction.
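A hypothetical sketch of the inhibitor idea follows; the function name, signature, and the meaning assigned to each level are illustrative assumptions, not the repository's API. It shows the mechanism in the simplest form: a user-issued "stop" signal gates the policy's action output to zero.

```python
def apply_inhibitor(action, inhibitor_level, stop_requested):
    """Suppress the action when the user asks the agent to stop.

    Assumed convention: level 2 is the mode that accepts a user
    "stop" instruction; level 0 ignores the user entirely.
    """
    if inhibitor_level >= 2 and stop_requested:
        return [0.0 for _ in action]  # fully suppress motion
    return action  # otherwise pass the policy's action through

print(apply_inhibitor([0.4, -0.7], inhibitor_level=2, stop_requested=True))
# -> [0.0, 0.0]
```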
To-do:
- remove ROOT_DIR argument that is not used.
- add more mujoco examples.
All code for version 1 has been moved into legacy/v1.
Quick start: refer to the _quick_start folder.
Existing results can be found at the Google Drive link.
A simple toy world where a fish either moves or eats food, while trying to stay alive.
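The fish world's dynamics can be sketched roughly as below; this is a simplified illustration under assumed rules (eating restores energy, every step costs one unit), and the repository's implementation may differ.

```python
def step(position, energy, action, food_at):
    """One transition in a 1-D toy world.

    action is "move" (advance one cell) or "eat"; the fish is
    considered dead once energy reaches zero (assumed rule).
    """
    if action == "eat" and position in food_at:
        return position, energy + 5      # eating restores energy
    if action == "move":
        return position + 1, energy - 1  # moving costs energy
    return position, energy - 1          # idling still burns energy

pos, en = 0, 3
pos, en = step(pos, en, "move", food_at={1})
pos, en = step(pos, en, "eat", food_at={1})
print(pos, en)  # -> 1 7
```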
A grid world where a robot tries to reach the target tile (yellow).
The project features uncertainty avoidance, where the robot tries to avoid lava tiles at all costs.
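One way to express "avoid lava at all costs" in a self-designed reward is to make the lava penalty dominate any achievable return; the sketch below is illustrative, and the constants are assumptions rather than the repository's values.

```python
# Assumed constants: the lava penalty is chosen so large that no
# path through lava can ever look attractive to the agent.
LAVA_PENALTY = -100.0
TARGET_REWARD = 10.0   # reaching the yellow target tile
STEP_COST = -0.1       # mild pressure to reach the target quickly

def reward(tile):
    if tile == "lava":
        return LAVA_PENALTY
    if tile == "target":
        return TARGET_REWARD
    return STEP_COST  # ordinary floor tile

print(reward("lava") < reward("floor") < reward("target"))  # -> True
```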