A Clembench game for evaluating LLMs on 2D text-based spatial reasoning tasks. Read the paper here!
SimpleSnake requires that Clemcore and Clembench are already installed and functioning on your system.
After verifying Clembench is correctly installed, clone the repository into your local clembench directory alongside existing games.

    git clone https://github.com/porterrigby/simplesnake.git $CLEMBENCH_HOME/simplesnake

SimpleSnake contains three separate game variations. To run the vanilla version, execute:

    clem run -g simplesnake -m <model-to-evaluate-on>

To run the variation of SimpleSnake that adds obstacles, execute:

    clem run -g simplesnake_withobstacles -m <model-to-evaluate-on>

To run the variation that focuses on up-front planning instead of incremental moves, execute:

    clem run -g simplesnake_withplanning -m <model-to-evaluate-on>

Clembench supports various model backends and APIs. Run clem list models to list the models that are currently supported.
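
The three run commands above differ only in the game name, so they can be driven from one loop. The following is a minimal sketch, not part of SimpleSnake itself: the MODEL and DRY_RUN variables and the run_all_variations function are hypothetical names introduced for this example. By default it only prints the commands it would run; set DRY_RUN=0 to execute them.

```shell
# Sketch: run every SimpleSnake variation against one model.
# MODEL, DRY_RUN and run_all_variations are hypothetical names for this example.
MODEL="${MODEL:-<model-to-evaluate-on>}"   # placeholder, replace with a real model name
DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print the commands, 0 = execute them

# The three game names documented in this README.
GAMES="simplesnake simplesnake_withobstacles simplesnake_withplanning"

run_all_variations() {
    for game in $GAMES; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "clem run -g $game -m $MODEL"
        else
            # Stop at the first failing run so errors are not silently skipped.
            clem run -g "$game" -m "$MODEL" || return 1
        fi
    done
}

run_all_variations
```

Replace the placeholder with a model name taken from the output of clem list models before setting DRY_RUN=0.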
Games played by models within Clembench can be transcribed and evaluated into HTML files using clem transcribe
and clem eval, respectively. The resulting files are saved under results/.
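
As a rough illustration of the post-processing step, the sketch below runs clem transcribe and clem eval in order and then lists any HTML files found under results/. It assumes both commands are invoked without extra arguments from the directory that contains results/. RESULTS_DIR, DRY_RUN and postprocess are hypothetical names introduced for this example. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Sketch: transcribe and evaluate finished runs, then list the HTML output.
# RESULTS_DIR, DRY_RUN and postprocess are hypothetical names for this example.
RESULTS_DIR="${RESULTS_DIR:-results}"   # assumed location, per "saved under results/"
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print the commands, 0 = execute them

postprocess() {
    for step in transcribe eval; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "clem $step"
        else
            # Abort if a step fails so eval never runs after a failed transcribe.
            clem "$step" || return 1
        fi
    done
    # Show which HTML files exist, if the results directory is present.
    if [ -d "$RESULTS_DIR" ]; then
        find "$RESULTS_DIR" -type f -name '*.html'
    fi
}

postprocess
```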