This repo contains an implementation of an agent that can learn to maximise reward in environments with a NetHack interface, such as nle or MiniHack.
- A Perceiver-inspired encoder of NetHack states (a minimal sketch of the cross-attention idea appears after this list).
- An implementation of a PPO-based RL agent.
  - Advantages are estimated using GAE (see the sketch after this list).
  - Per-batch advantage normalization and entropy-based policy regularization are supported.
  - This agent was meant mainly as a baseline; most of the effort in this repo went into MuZero.
- An implementation of a MuZero-based RL agent.
  - MCTS runs on GPU and is pretty fast.
  - Reanalyze is supported.
  - Recurrent memory is supported.
  - A state consistency loss inspired by *Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision* is supported (a sketch appears after this list).
  - Ideas from Stochastic MuZero are implemented, so the agent behaves correctly in stochastic environments.
  - The search policy from *Monte-Carlo Tree Search as Regularized Policy Optimization* can be enabled to improve the efficiency of MCTS, which is very helpful when the simulation budget is small or the branching factor is very large (see the sketch after this list).
- Training and inference are implemented in JAX, with the help of rlax and optax.
- Models are implemented in JAX/Flax.
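
The Perceiver-inspired encoder can be pictured as a small set of learned latent vectors that cross-attend to embeddings of the NetHack map cells, so the cost of attention does not grow quadratically with the number of cells. The sketch below only illustrates that idea; the class name, layer sizes, and glyph vocabulary size are assumptions, not the module used in this repo.

```python
# Illustrative Perceiver-style encoder: learned latents cross-attend to glyph embeddings.
import jax.numpy as jnp
import flax.linen as nn


class LatentCrossAttentionEncoder(nn.Module):
    num_latents: int = 16         # number of learned latent vectors (assumed)
    latent_dim: int = 128         # latent width (assumed)
    num_heads: int = 4
    glyph_vocab_size: int = 5976  # size of the NetHack glyph vocabulary (assumed)

    @nn.compact
    def __call__(self, glyphs):  # glyphs: [batch, height, width] int32
        batch = glyphs.shape[0]
        # Embed each map cell independently and flatten the map into a token sequence.
        tokens = nn.Embed(self.glyph_vocab_size, self.latent_dim)(glyphs)
        tokens = tokens.reshape(batch, -1, self.latent_dim)  # [batch, cells, dim]
        # Learned latent array, shared across the batch.
        latents = self.param(
            "latents", nn.initializers.normal(0.02), (self.num_latents, self.latent_dim)
        )
        latents = jnp.broadcast_to(latents, (batch,) + latents.shape)
        # Latents attend to the map cells (cross-attention), then to each other.
        latents = nn.MultiHeadDotProductAttention(num_heads=self.num_heads)(latents, tokens)
        latents = nn.MultiHeadDotProductAttention(num_heads=self.num_heads)(latents, latents)
        return latents  # [batch, num_latents, latent_dim]
```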
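
For reference, GAE and per-batch advantage normalization fit in a few lines of JAX. The following is a minimal illustrative sketch (function names and the default λ are assumptions), not the code used by the PPO agent in this repo:

```python
import jax
import jax.numpy as jnp


def gae_advantages(rewards, values, bootstrap_value, discounts, gae_lambda=0.95):
    """rewards, values, discounts: [T] arrays; bootstrap_value: value estimate of the
    state right after the last step."""
    values_tp1 = jnp.append(values[1:], bootstrap_value)
    deltas = rewards + discounts * values_tp1 - values  # one-step TD errors

    def backward_step(advantage, inputs):
        delta_t, discount_t = inputs
        advantage = delta_t + discount_t * gae_lambda * advantage
        return advantage, advantage

    # Accumulate discounted TD errors from the end of the rollout back to the start.
    _, advantages = jax.lax.scan(backward_step, 0.0, (deltas, discounts), reverse=True)
    return advantages


def normalize_advantages(advantages, eps=1e-8):
    # Per-batch normalization: zero mean, unit standard deviation across the batch.
    return (advantages - advantages.mean()) / (advantages.std() + eps)
```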
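
The state consistency loss can be illustrated as a SimSiam-style term that pulls the latent state predicted by the dynamics model towards a stop-gradient embedding of the actually observed next state. The sketch below is illustrative only; `encode` and `dynamics` are hypothetical stand-ins for the representation and dynamics networks:

```python
import jax
import jax.numpy as jnp


def state_consistency_loss(predicted_next_state, target_next_state, eps=1e-8):
    # Negative cosine similarity between the dynamics-predicted latent and the
    # (stop-gradient) latent of the real next observation.
    target = jax.lax.stop_gradient(target_next_state)
    predicted = predicted_next_state / (
        jnp.linalg.norm(predicted_next_state, axis=-1, keepdims=True) + eps
    )
    target = target / (jnp.linalg.norm(target, axis=-1, keepdims=True) + eps)
    return -jnp.mean(jnp.sum(predicted * target, axis=-1))


# Usage sketch (encode / dynamics are hypothetical stand-ins):
#   predicted_next = dynamics(encode(obs_t), action_t)
#   loss = state_consistency_loss(predicted_next, encode(obs_tp1))
```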
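
The regularized-policy-optimization search policy has a closed form: it maximizes the expected Q-value of the children subject to a KL penalty towards the prior policy, with a regularization strength that shrinks as the simulation budget grows. Below is a minimal bisection-based sketch of computing that policy target for a single node; the names, constants, and the exact λ_N schedule are assumptions, and this is not the repo's implementation:

```python
# Sketch of the policy target from "Monte-Carlo Tree Search as Regularized Policy
# Optimization" (Grill et al., 2020). The target solves
#   max_pi <q, pi> - lambda_N * KL(pi_prior || pi),
# whose solution is pi(a) = lambda_N * pi_prior(a) / (alpha - q(a)), with alpha
# chosen so that pi sums to one.
import jax
import jax.numpy as jnp


def rpo_policy_target(prior_logits, q_values, num_simulations, c_puct=1.25, iterations=50):
    num_actions = prior_logits.shape[-1]
    prior = jax.nn.softmax(prior_logits)
    # Regularization strength: strong with few simulations, weak with many (assumed schedule).
    lam = c_puct * jnp.sqrt(num_simulations) / (num_simulations + num_actions)

    def policy(alpha):
        return lam * prior / (alpha - q_values)

    # alpha lies in [alpha_min, alpha_max]; sum(policy(alpha)) decreases monotonically
    # in alpha, so we bisect until the policy sums to one.
    alpha_min = jnp.max(q_values + lam * prior)
    alpha_max = jnp.max(q_values) + lam

    def bisect(_, bounds):
        lo, hi = bounds
        mid = 0.5 * (lo + hi)
        sum_too_large = jnp.sum(policy(mid)) > 1.0
        return jnp.where(sum_too_large, mid, lo), jnp.where(sum_too_large, hi, mid)

    lo, hi = jax.lax.fori_loop(0, iterations, bisect, (alpha_min, alpha_max))
    target = policy(0.5 * (lo + hi))
    return target / jnp.sum(target)  # absorb any residual bisection error
```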
- Clone the repository:
  ```bash
  git clone https://github.com/hr0nix/omega.git
  ```
- Run the Docker container:
  ```bash
  bash ./omega/docker/run_container.sh
  ```
- Create a new experiment based on one of the provided configs:
  ```bash
  python3.8 ./tools/experiment_manager.py make --config ./configs/muzero/random_room_5x5.yaml --output-dir ./experiments/muzero_random_room_5x5
  ```
- Run the newly created experiment. You can optionally track the experiment with wandb (you will be asked whether you want to; this is definitely recommended):
  ```bash
  python3.8 ./tools/experiment_manager.py run --dir ./experiments/muzero_random_room_5x5 --gpu 0
  ```
- After some episodes are completed, you can visualize them:
  ```bash
  python3.8 ./tools/experiment_manager.py play --file ./experiments/muzero_random_room_5x5/episodes/<EPISODE_FILENAME_HERE>
  ```