Wheatbot

Partially observable, hierarchical RL environment based on Minecraft

Environment

This project is inspired by the ComputerCraft turtle, a simple robot added to the world by the ComputerCraft Minecraft mod. My objective is to train the turtle to complete tasks autonomously, and then to deploy the learned policies in the game.

Objective

In what might be the most over-engineered wheat farm in Minecraft, the agent starts at a random location on a wheat farm and must navigate to the wheat, collect it, and bring it back to a chest within a time limit and a fixed fuel budget. While this is much less impressive than MineDojo, a number of aspects make the problem challenging:

Partial Observability: Unlike most real-world robots, the turtle has extremely limited perception of its environment. It can only perceive blocks it is directly touching, and its action space is similarly limited to moving forward, turning, and manipulating the block it is facing. This is addressed with sequence models (transformers) that can use previous observations to better estimate the current state.
Parametric Action Space: Some actions are only valid in certain states, such as moving forward or mining certain blocks. This is handled with invalid action masking (setting the logits of invalid actions to -inf so that the softmax assigns them a probability of 0).
Task Dependency: The robot must complete the subgoals in the order above (navigate to the wheat, collect it, return it to the chest) to succeed at the task, and the long horizon of the problem makes it a good candidate for hierarchical reinforcement learning.
Reward Function: To preserve the optimal behavior while facilitating learning, the reward function combines multiple potential-based reward shaping functions that guide the agent with hierarchical reward scaling based on the agent's gamma value (discount factor), so that its preferences align with the desired behavior.
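
As a rough illustration of two of the techniques above, the sketch below shows invalid action masking and a single potential-based shaping term in plain Python. It is not taken from the wheatbot package: the function names, the example logits, and the potential (negative distance to the wheat) are all assumptions made for this example.

```python
import math


def masked_softmax(logits, valid_mask):
    """Softmax in which invalid actions receive probability 0."""
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Set the logits of invalid actions to -inf.
    masked = [x if ok else -math.inf for x, ok in zip(logits, valid_mask)]
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    peak = max(masked)
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


def shaped_reward(reward, phi_s, phi_next, gamma):
    """Potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return reward + gamma * phi_next - phi_s


# Hypothetical action order: forward, turn left, turn right.
# Here "forward" is blocked, so it is masked out.
probs = masked_softmax([2.0, 1.0, 0.5], [False, True, True])
print(probs)

# Hypothetical potential: negative distance to the wheat (3 blocks, then 2).
print(shaped_reward(0.0, phi_s=-3.0, phi_next=-2.0, gamma=0.99))
```

In this sketch, moving one block closer to the wheat yields a positive shaped reward even though the environment reward is 0, while the masked action can never be sampled.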

Furthermore, actually deploying the agent inside a real instance of Minecraft is an experience similar to deploying it on a real robot.

For more information on the environment design, see farmingenv.md.

Setup

The package can be installed by running pip install . from the repository root. Note that in order to train the agents, you will need a deep learning framework (I prefer torch), which you'll need to install separately depending on whether you want CUDA support.
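
After installing, a quick way to confirm which torch build is active is a check like the one below. This is only a convenience sketch, not part of the package; the helper name torch_device_name is made up for this example.

```python
def torch_device_name():
    """Report whether torch is installed and whether it can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(torch_device_name())
```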

Training

I used RLlib to train PPO agents on the custom environment. An example of training on the regular environment ("flat") is in examples/farming_ppo.py. A training script for the hierarchical environment is given in examples/hierarchical_farming_ppo.py.
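
For orientation, a minimal RLlib PPO setup looks roughly like the sketch below. It is not a copy of examples/farming_ppo.py: the environment name CartPole-v1 is a stand-in for the Wheatbot environment, the hyperparameters are placeholders, and method names can differ between Ray versions (newer releases may prefer build_algo() over build()).

```python
def build_ppo_config(env_name="CartPole-v1", gamma=0.99, lr=3e-4):
    """Return a PPOConfig for `env_name` (a stand-in, not the Wheatbot env)."""
    # Imported lazily so this module can be loaded without Ray installed.
    from ray.rllib.algorithms.ppo import PPOConfig

    return (
        PPOConfig()
        .environment(env=env_name)
        .framework("torch")
        .training(gamma=gamma, lr=lr)
    )


def train(num_iterations=5, **config_kwargs):
    """Run a few PPO iterations and return the last result dict."""
    algo = build_ppo_config(**config_kwargs).build()
    result = None
    try:
        for _ in range(num_iterations):
            result = algo.train()
    finally:
        # Release the workers even if training raises.
        algo.stop()
    return result
```

Calling train(5) runs five training iterations on the stand-in environment; to train on the farming environment instead, use the scripts in examples/, which register the custom environment.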

Deployment

Coming soon.
