This repo contains an implementation of an agent that can learn to maximise reward in environments with a NetHack interface, such as nle or MiniHack.
- A Perceiver-inspired encoder of NetHack states (a minimal sketch of the cross-attention idea appears after this list).
- An implementation of a PPO-based RL agent.
  - Advantages are estimated using GAE (see the sketch after this list).
  - Per-batch advantage normalization and entropy-based policy regularization are supported.
  - This agent was meant mainly as a baseline; most of the effort in this repo went into MuZero.
- An implementation of a MuZero-based RL agent.
  - MCTS runs on GPU and is pretty fast.
  - Reanalyze is supported.
  - Recurrent memory is supported.
  - A state consistency loss inspired by *Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision* is supported (a sketch appears after this list).
  - Ideas from Stochastic MuZero are implemented, so the agent behaves correctly in stochastic environments.
  - The search policy from *Monte-Carlo Tree Search as Regularized Policy Optimization* can be enabled to improve the efficiency of MCTS, which is very helpful when the simulation budget is small or the branching factor is very large (see the sketch after this list).
- Training and inference are implemented in JAX, with the help of rlax and optax.
- Models are implemented in JAX/Flax.
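
The Perceiver-inspired encoder can be pictured as a small set of learned latent vectors that cross-attend to embeddings of the NetHack map cells, so the cost of attention does not grow quadratically with the number of cells. The sketch below only illustrates that idea; the class name, layer sizes, and glyph vocabulary size are assumptions, not the module used in this repo.

```python
# Illustrative Perceiver-style encoder: learned latents cross-attend to glyph embeddings.
import jax.numpy as jnp
import flax.linen as nn


class LatentCrossAttentionEncoder(nn.Module):
    num_latents: int = 16         # number of learned latent vectors (assumed)
    latent_dim: int = 128         # latent width (assumed)
    num_heads: int = 4
    glyph_vocab_size: int = 5976  # size of the NetHack glyph vocabulary (assumed)

    @nn.compact
    def __call__(self, glyphs):  # glyphs: [batch, height, width] int32
        batch = glyphs.shape[0]
        # Embed each map cell independently and flatten the map into a token sequence.
        tokens = nn.Embed(self.glyph_vocab_size, self.latent_dim)(glyphs)
        tokens = tokens.reshape(batch, -1, self.latent_dim)  # [batch, cells, dim]
        # Learned latent array, shared across the batch.
        latents = self.param(
            "latents", nn.initializers.normal(0.02), (self.num_latents, self.latent_dim)
        )
        latents = jnp.broadcast_to(latents, (batch,) + latents.shape)
        # Latents attend to the map cells (cross-attention), then to each other.
        latents = nn.MultiHeadDotProductAttention(num_heads=self.num_heads)(latents, tokens)
        latents = nn.MultiHeadDotProductAttention(num_heads=self.num_heads)(latents, latents)
        return latents  # [batch, num_latents, latent_dim]
```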
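
For reference, GAE and per-batch advantage normalization fit in a few lines of JAX. The following is a minimal illustrative sketch (function names and the default λ are assumptions), not the code used by the PPO agent in this repo:

```python
import jax
import jax.numpy as jnp


def gae_advantages(rewards, values, bootstrap_value, discounts, gae_lambda=0.95):
    """rewards, values, discounts: [T] arrays; bootstrap_value: value estimate of the
    state right after the last step."""
    values_tp1 = jnp.append(values[1:], bootstrap_value)
    deltas = rewards + discounts * values_tp1 - values  # one-step TD errors

    def backward_step(advantage, inputs):
        delta_t, discount_t = inputs
        advantage = delta_t + discount_t * gae_lambda * advantage
        return advantage, advantage

    # Accumulate discounted TD errors from the end of the rollout back to the start.
    _, advantages = jax.lax.scan(backward_step, 0.0, (deltas, discounts), reverse=True)
    return advantages


def normalize_advantages(advantages, eps=1e-8):
    # Per-batch normalization: zero mean, unit standard deviation across the batch.
    return (advantages - advantages.mean()) / (advantages.std() + eps)
```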
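
The state consistency loss can be illustrated as a SimSiam-style term that pulls the latent state predicted by the dynamics model towards a stop-gradient embedding of the actually observed next state. The sketch below is illustrative only; `encode` and `dynamics` are hypothetical stand-ins for the representation and dynamics networks:

```python
import jax
import jax.numpy as jnp


def state_consistency_loss(predicted_next_state, target_next_state, eps=1e-8):
    # Negative cosine similarity between the dynamics-predicted latent and the
    # (stop-gradient) latent of the real next observation.
    target = jax.lax.stop_gradient(target_next_state)
    predicted = predicted_next_state / (
        jnp.linalg.norm(predicted_next_state, axis=-1, keepdims=True) + eps
    )
    target = target / (jnp.linalg.norm(target, axis=-1, keepdims=True) + eps)
    return -jnp.mean(jnp.sum(predicted * target, axis=-1))


# Usage sketch (encode / dynamics are hypothetical stand-ins):
#   predicted_next = dynamics(encode(obs_t), action_t)
#   loss = state_consistency_loss(predicted_next, encode(obs_tp1))
```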
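
The regularized-policy-optimization search policy has a closed form: it maximizes the expected Q-value of the children subject to a KL penalty towards the prior policy, with a regularization strength that shrinks as the simulation budget grows. Below is a minimal bisection-based sketch of computing that policy target for a single node; the names, constants, and the exact λ_N schedule are assumptions, and this is not the repo's implementation:

```python
# Sketch of the policy target from "Monte-Carlo Tree Search as Regularized Policy
# Optimization" (Grill et al., 2020). The target solves
#   max_pi <q, pi> - lambda_N * KL(pi_prior || pi),
# whose solution is pi(a) = lambda_N * pi_prior(a) / (alpha - q(a)), with alpha
# chosen so that pi sums to one.
import jax
import jax.numpy as jnp


def rpo_policy_target(prior_logits, q_values, num_simulations, c_puct=1.25, iterations=50):
    num_actions = prior_logits.shape[-1]
    prior = jax.nn.softmax(prior_logits)
    # Regularization strength: strong with few simulations, weak with many (assumed schedule).
    lam = c_puct * jnp.sqrt(num_simulations) / (num_simulations + num_actions)

    def policy(alpha):
        return lam * prior / (alpha - q_values)

    # alpha lies in [alpha_min, alpha_max]; sum(policy(alpha)) decreases monotonically
    # in alpha, so we bisect until the policy sums to one.
    alpha_min = jnp.max(q_values + lam * prior)
    alpha_max = jnp.max(q_values) + lam

    def bisect(_, bounds):
        lo, hi = bounds
        mid = 0.5 * (lo + hi)
        sum_too_large = jnp.sum(policy(mid)) > 1.0
        return jnp.where(sum_too_large, mid, lo), jnp.where(sum_too_large, hi, mid)

    lo, hi = jax.lax.fori_loop(0, iterations, bisect, (alpha_min, alpha_max))
    target = policy(0.5 * (lo + hi))
    return target / jnp.sum(target)  # absorb any residual bisection error
```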
- Clone the repository:
  ```bash
  git clone https://github.com/hr0nix/omega.git
  ```
- Run the Docker container:
  ```bash
  bash ./omega/docker/run_container.sh
  ```
- Create a new experiment based on one of the provided configs:
  ```bash
  python3.8 ./tools/experiment_manager.py make --config ./configs/muzero/random_room_5x5.yaml --output-dir ./experiments/muzero_random_room_5x5
  ```
- Run the newly created experiment. You can optionally track the experiment with wandb (you will be asked whether you want to; this is definitely recommended):
  ```bash
  python3.8 ./tools/experiment_manager.py run --dir ./experiments/muzero_random_room_5x5 --gpu 0
  ```
- After some episodes are completed, you can visualize them:
  ```bash
  python3.8 ./tools/experiment_manager.py play --file ./experiments/muzero_random_room_5x5/episodes/<EPISODE_FILENAME_HERE>
  ```