
SnakeAI

Single agent Snake AI using RL and search algorithms

Uses Gymnasium and StableBaselines3

The game uses a 12x12 grid, with the snake starting at a length of 4

RL algorithms implemented: DQN, QR-DQN, PPO, Recurrent PPO, A2C

Algorithms

Each algorithm is tested for 1000 episodes

Algorithm               Demo          Average score (max: 140)   Average steps   Average score per step
Random                  random_vid    2.45                        424.1           0.00578
Greedy                  greedy_vid    19.90                       194.9           0.110
DFS                     dfs_vid       20.13                       531.3           0.0379
BFS                     bfs_vid       31.35                       323.6           0.0969
Hamiltonian             ham_vid       140                         5016.7          0.0279
Optimised Hamiltonian   op_ham_vid    140                         4585.5          0.0305
DQN                     dqn_vid       33.36                       392.9           0.0850
QR-DQN                  qrdqn_vid     33.39                       379.7           0.0914
A2C                     a2c_vid       19.18                       178.6           0.107

Random search

  • Randomly choose a direction
  • If that direction is blocked, move in another direction
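
A minimal sketch of this policy (the function name and the `blocked` set of wall/body cells are illustrative, not the repository's actual code):

```python
import random

# Grid directions as (dx, dy) offsets.
DIRECTIONS = {"up": (0, -1), "left": (-1, 0), "down": (0, 1), "right": (1, 0)}

def random_move(head, blocked):
    """Pick a random direction; if the next cell is a wall or the body, try another."""
    options = list(DIRECTIONS)
    random.shuffle(options)
    for name in options:
        dx, dy = DIRECTIONS[name]
        if (head[0] + dx, head[1] + dy) not in blocked:
            return name
    return options[0]  # every neighbour is blocked: any move loses
```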

Greedy search

  • One-move horizon
  • Move in the direction of the food
  • If that direction is blocked, move in another direction
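
A sketch of the same idea in code, reusing the DIRECTIONS table from the random-search sketch above (again illustrative, not the repository's implementation):

```python
def greedy_move(head, food, blocked):
    """One-move horizon: take the unblocked move that gets closest to the food."""
    def next_cell(direction):
        dx, dy = DIRECTIONS[direction]
        return (head[0] + dx, head[1] + dy)

    # Rank the four moves by Manhattan distance from the resulting cell to the food.
    ranked = sorted(DIRECTIONS, key=lambda d: abs(next_cell(d)[0] - food[0])
                                            + abs(next_cell(d)[1] - food[1]))
    for direction in ranked:
        if next_cell(direction) not in blocked:
            return direction
    return ranked[0]  # trapped: no unblocked move exists
```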

DFS

  • Depth-first search to find a complete path to the food
  • If there is no path to the food, use greedy search

BFS

  • Breadth-first search to find a complete (and shortest) path to the food
  • If there is no path to the food, use greedy search
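
DFS and BFS share the same skeleton; the only difference is whether the frontier is treated as a queue (BFS, shortest path) or a stack (DFS). A hedged sketch, assuming the free cells of the grid are given as a set:

```python
from collections import deque

def bfs_path(start, goal, free_cells):
    """Breadth-first search over free grid cells; returns the cell path or None."""
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        cell = frontier.popleft()      # frontier.pop() here would give DFS instead
        if cell == goal:
            path = []
            while cell is not None:    # walk the parent links back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        for dx, dy in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nxt = (cell[0] + dx, cell[1] + dy)
            if nxt in free_cells and nxt not in came_from:
                came_from[nxt] = cell
                frontier.append(nxt)
    return None                        # no path: fall back to greedy search
```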

Hamiltonian path

  • Follows a Hamiltonian path (a path that visits every cell of the grid exactly once)
  • Guarantees that the snake will never die
  • Calculated via longest path
  • Only works with even-sized grids
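
The repository builds its path via a longest-path search; as a simpler illustration of why an even-sized grid always admits such a path, here is the classic boustrophedon (zig-zag) cycle. This is only an illustration, not the repository's construction:

```python
def boustrophedon_cycle(width, height):
    """Hamiltonian cycle on a grid with even height: up the left edge,
    then sweep the remaining columns row by row in a zig-zag."""
    assert height % 2 == 0 and width >= 2, "this simple construction needs an even grid height"
    cycle = [(0, y) for y in range(height - 1, -1, -1)]   # left edge, bottom to top
    for y in range(height):
        xs = range(1, width) if y % 2 == 0 else range(width - 1, 0, -1)
        cycle.extend((x, y) for x in xs)                   # alternate sweep direction
    return cycle  # consecutive cells (wrapping around) are always adjacent
```

Following the cycle in order, the head only returns to a cell after every other cell has been visited, which is why the snake can never collide with itself.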

Optimised Hamiltonian path

  • Follows the Hamiltonian path but takes shortcuts towards the food when it is safe to rejoin the cycle (see the sketch below)
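
The usual safety rule for such shortcuts (a sketch of the general idea, not necessarily the exact condition used in this repository) is to number every cell by its position on the cycle and only jump ahead if the target stays between the head and the tail in cycle order:

```python
def cycle_distance(a, b, n):
    """Number of forward steps from cycle index a to cycle index b (cycle length n)."""
    return (b - a) % n

def shortcut_is_safe(head_idx, tail_idx, target_idx, n):
    """A jump towards the food is safe if it does not overtake the tail in cycle
    order, so the snake can always drop back onto the Hamiltonian cycle."""
    return cycle_distance(head_idx, target_idx, n) < cycle_distance(head_idx, tail_idx, n)
```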


Reinforcement learning

  • StableBaselines3 models
  • All models trained for 10,000,000 episodes
  • Hyperparameters based on those used on Atari games
  • Script downloads trained models automatically
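
A minimal Stable-Baselines3 training sketch, assuming a SnakeEnv class like the one outlined in the Gym environment section below; the hyperparameters here are placeholders rather than the repository's Atari-based settings:

```python
from stable_baselines3 import DQN
from stable_baselines3.common.monitor import Monitor

env = Monitor(SnakeEnv(grid_size=12, initial_size=4))   # custom env, see below

model = DQN(
    "CnnPolicy",                       # the observation is an 84x84x3 image
    env,
    learning_rate=1e-4,                # placeholder values, not the repo's settings
    buffer_size=100_000,
    tensorboard_log="./tensorboard/",
    verbose=1,
)
model.learn(total_timesteps=10_000_000)
model.save("dqn_snake")
```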

Training graphs

Legend colours for the graphs below:

  • #425066 - DQN
  • #9334e6 - QR-DQN
  • #12b5cb - A2C

Mean Episode Reward per Episode (graph: mean_reward)

Mean Episode Length per Episode (graph: mean_length)

Gym environment

Uses a custom Gymnasium environment for the snake game

Observation space:

  • An RGB image of the game of shape (84, 84, 3)

Action space:

  • A discrete space of 4 actions (up, left, down, right)

Rewards (modifiable):

  • +1 for eating food
  • -1 for dying
  • -0.001 for everything else
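
A minimal, self-contained sketch of such an environment (the SnakeEnv name, colours, and game logic are illustrative; this is not the repository's implementation):

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class SnakeEnv(gym.Env):
    """Minimal sketch of the snake environment described above."""

    DIRS = [(0, -1), (-1, 0), (0, 1), (1, 0)]  # up, left, down, right

    def __init__(self, grid_size=12, initial_size=4):
        super().__init__()
        self.grid_size, self.initial_size = grid_size, initial_size
        self.observation_space = spaces.Box(0, 255, shape=(84, 84, 3), dtype=np.uint8)
        self.action_space = spaces.Discrete(4)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        mid = self.grid_size // 2
        self.snake = [(mid - i, mid) for i in range(self.initial_size)]  # head first
        self._place_food()
        return self._obs(), {}

    def step(self, action):
        dx, dy = self.DIRS[action]
        head = (self.snake[0][0] + dx, self.snake[0][1] + dy)
        died = (head in self.snake
                or not (0 <= head[0] < self.grid_size and 0 <= head[1] < self.grid_size))
        ate = head == self.food and not died
        if not died:
            self.snake.insert(0, head)
            if ate:
                self._place_food()
            else:
                self.snake.pop()
        reward = 1.0 if ate else (-1.0 if died else -0.001)
        return self._obs(), reward, died, False, {}

    def _place_food(self):
        free = [(x, y) for x in range(self.grid_size) for y in range(self.grid_size)
                if (x, y) not in self.snake]
        self.food = free[self.np_random.integers(len(free))]

    def _obs(self):
        grid = np.zeros((self.grid_size, self.grid_size, 3), dtype=np.uint8)
        for x, y in self.snake:
            grid[y, x] = (0, 255, 0)                    # snake in green
        grid[self.food[1], self.food[0]] = (255, 0, 0)  # food in red
        scale = 84 // self.grid_size                    # 7 for the default 12x12 grid
        return np.kron(grid, np.ones((scale, scale, 1), dtype=np.uint8))
```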

Moves are applied to the game state with tensor operations in PyTorch, an approach inspired by this Medium article
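
As a toy illustration of that idea (not the article's or the repository's code), a move can be expressed as rolling a one-hot head map; a real environment would additionally mask out walls instead of wrapping around:

```python
import torch

head = torch.zeros(12, 12)   # one-hot head position on the 12x12 board
head[6, 6] = 1.0

# Row/column shifts for the four actions: up, left, down, right.
SHIFTS = {0: (-1, 0), 1: (0, -1), 2: (1, 0), 3: (0, 1)}

def apply_move(head_map, action):
    """Shift the one-hot head map by one cell with a single tensor operation."""
    return torch.roll(head_map, shifts=SHIFTS[action], dims=(0, 1))

new_head = apply_move(head, 0)                  # move up
row, col = divmod(int(new_head.argmax()), 12)   # head is now at (5, 6)
```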

Conclusion

Current RL approaches to snake are not very effective compared to algorithmic approaches. Successful RL approaches often have heavy reward shaping (e.g. distance to food) or use observations besides the pure RGB display (e.g. direction to food).

To use

  1. Install requirements:

     pip install -r requirements.txt

  2. Run benchmark.py, optionally passing arguments (defaults to greedy search without rendering):

     python benchmark.py [-h] [--algo {greedy,random,bfs,dfs,ham,op_ham,dqn,qrdqn,a2c}] [--grid_size GRID_SIZE] [--initial_size INITIAL_SIZE] [--episodes EPISODES] [--show_render SHOW_RENDER] [--delay DELAY] [--save_gif SAVE_GIF]
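
     For example, to benchmark BFS over 100 episodes with rendering enabled (the argument values here are only an example):

     python benchmark.py --algo bfs --episodes 100 --show_render True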

  3. Run the respective .py file to train an RL model (e.g. python dqn.py to train the DQN model)

  4. View tensorboard logs:

     tensorboard --logdir ./tensorboard/
