
SnakeAI

Single agent Snake AI using RL and search algorithms

Uses Gymnasium and StableBaselines3

The game uses a 12x12 grid, with the snake starting at a length of 4

RL algorithms implemented: DQN, QR-DQN, PPO, Recurrent PPO, A2C

Algorithms

Each algorithm is tested for 1000 episodes

Algorithm               Demo          Average score (max: 140)   Average steps   Average score per step
Random                  random_vid    2.45                        424.1           0.00578
Greedy                  greedy_vid    19.90                       194.9           0.110
DFS                     dfs_vid       20.13                       531.3           0.0379
BFS                     bfs_vid       31.35                       323.6           0.0969
Hamiltonian             ham_vid       140                         5016.7          0.0279
Optimised Hamiltonian   op_ham_vid    140                         4585.5          0.0305
DQN                     dqn_vid       33.36                       392.9           0.0850
QR-DQN                  qrdqn_vid     33.39                       379.7           0.0914
A2C                     a2c_vid       19.18                       178.6           0.107

Random search

  • Randomly choose a direction
  • If that direction is blocked, move in another direction
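
A minimal sketch of this policy (the function name and the `blocked` set of wall/body cells are illustrative, not the repository's actual code):

```python
import random

# Grid directions as (dx, dy) offsets.
DIRECTIONS = {"up": (0, -1), "left": (-1, 0), "down": (0, 1), "right": (1, 0)}

def random_move(head, blocked):
    """Pick a random direction; if the next cell is a wall or the body, try another."""
    options = list(DIRECTIONS)
    random.shuffle(options)
    for name in options:
        dx, dy = DIRECTIONS[name]
        if (head[0] + dx, head[1] + dy) not in blocked:
            return name
    return options[0]  # every neighbour is blocked: any move loses
```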

Greedy search

  • One-move horizon
  • Move in the direction of the food
  • If that direction is blocked, move in another direction
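
A sketch of the same idea in code, reusing the DIRECTIONS table from the random-search sketch above (again illustrative, not the repository's implementation):

```python
def greedy_move(head, food, blocked):
    """One-move horizon: take the unblocked move that gets closest to the food."""
    def next_cell(direction):
        dx, dy = DIRECTIONS[direction]
        return (head[0] + dx, head[1] + dy)

    # Rank the four moves by Manhattan distance from the resulting cell to the food.
    ranked = sorted(DIRECTIONS, key=lambda d: abs(next_cell(d)[0] - food[0])
                                            + abs(next_cell(d)[1] - food[1]))
    for direction in ranked:
        if next_cell(direction) not in blocked:
            return direction
    return ranked[0]  # trapped: no unblocked move exists
```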

DFS

  • Depth-first search to find a complete path to the food
  • If there is no path to the food, use greedy search

BFS

  • Breadth-first search to find a complete (and shortest) path to the food
  • If there is no path to the food, use greedy search
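
DFS and BFS share the same skeleton; the only difference is whether the frontier is treated as a queue (BFS, shortest path) or a stack (DFS). A hedged sketch, assuming the free cells of the grid are given as a set:

```python
from collections import deque

def bfs_path(start, goal, free_cells):
    """Breadth-first search over free grid cells; returns the cell path or None."""
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        cell = frontier.popleft()      # frontier.pop() here would give DFS instead
        if cell == goal:
            path = []
            while cell is not None:    # walk the parent links back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        for dx, dy in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nxt = (cell[0] + dx, cell[1] + dy)
            if nxt in free_cells and nxt not in came_from:
                came_from[nxt] = cell
                frontier.append(nxt)
    return None                        # no path: fall back to greedy search
```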

Hamiltonian path

  • Follows a Hamiltonian path (a path that visits every cell of the grid exactly once)
  • Guarantees that the snake will never die
  • Calculated via longest path
  • Only works with even-sized grids
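
The repository builds its path via a longest-path search; as a simpler illustration of why an even-sized grid always admits such a path, here is the classic boustrophedon (zig-zag) cycle. This is only an illustration, not the repository's construction:

```python
def boustrophedon_cycle(width, height):
    """Hamiltonian cycle on a grid with even height: up the left edge,
    then sweep the remaining columns row by row in a zig-zag."""
    assert height % 2 == 0 and width >= 2, "this simple construction needs an even grid height"
    cycle = [(0, y) for y in range(height - 1, -1, -1)]   # left edge, bottom to top
    for y in range(height):
        xs = range(1, width) if y % 2 == 0 else range(width - 1, 0, -1)
        cycle.extend((x, y) for x in xs)                   # alternate sweep direction
    return cycle  # consecutive cells (wrapping around) are always adjacent
```

Following the cycle in order, the head only returns to a cell after every other cell has been visited, which is why the snake can never collide with itself.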

Optimised Hamiltonian path

  • Follows the Hamiltonian path but takes shortcuts towards the food when it is safe to rejoin the cycle (see the sketch below)
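
The usual safety rule for such shortcuts (a sketch of the general idea, not necessarily the exact condition used in this repository) is to number every cell by its position on the cycle and only jump ahead if the target stays between the head and the tail in cycle order:

```python
def cycle_distance(a, b, n):
    """Number of forward steps from cycle index a to cycle index b (cycle length n)."""
    return (b - a) % n

def shortcut_is_safe(head_idx, tail_idx, target_idx, n):
    """A jump towards the food is safe if it does not overtake the tail in cycle
    order, so the snake can always drop back onto the Hamiltonian cycle."""
    return cycle_distance(head_idx, target_idx, n) < cycle_distance(head_idx, tail_idx, n)
```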


Reinforcement learning

  • StableBaselines3 models
  • All models trained for 10,000,000 episodes
  • Hyperparameters based on those used on Atari games
  • Script downloads trained models automatically
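
A minimal Stable-Baselines3 training sketch, assuming a SnakeEnv class like the one outlined in the Gym environment section below; the hyperparameters here are placeholders rather than the repository's Atari-based settings:

```python
from stable_baselines3 import DQN
from stable_baselines3.common.monitor import Monitor

env = Monitor(SnakeEnv(grid_size=12, initial_size=4))   # custom env, see below

model = DQN(
    "CnnPolicy",                       # the observation is an 84x84x3 image
    env,
    learning_rate=1e-4,                # placeholder values, not the repo's settings
    buffer_size=100_000,
    tensorboard_log="./tensorboard/",
    verbose=1,
)
model.learn(total_timesteps=10_000_000)
model.save("dqn_snake")
```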

Training graphs

Legend colours for the graphs below:

  • #425066 - DQN
  • #9334e6 - QR-DQN
  • #12b5cb - A2C

Mean Episode Reward per Episode (graph: mean_reward)

Mean Episode Length per Episode (graph: mean_length)

Gym environment

Uses a custom Gymnasium environment for the snake game

Observation space:

  • An RGB image of the game of shape (84, 84, 3)

Action space:

  • A discrete space of 4 actions (up, left, down, right)

Rewards (modifiable):

  • +1 for eating food
  • -1 for dying
  • -0.001 for everything else
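
A minimal, self-contained sketch of such an environment (the SnakeEnv name, colours, and game logic are illustrative; this is not the repository's implementation):

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class SnakeEnv(gym.Env):
    """Minimal sketch of the snake environment described above."""

    DIRS = [(0, -1), (-1, 0), (0, 1), (1, 0)]  # up, left, down, right

    def __init__(self, grid_size=12, initial_size=4):
        super().__init__()
        self.grid_size, self.initial_size = grid_size, initial_size
        self.observation_space = spaces.Box(0, 255, shape=(84, 84, 3), dtype=np.uint8)
        self.action_space = spaces.Discrete(4)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        mid = self.grid_size // 2
        self.snake = [(mid - i, mid) for i in range(self.initial_size)]  # head first
        self._place_food()
        return self._obs(), {}

    def step(self, action):
        dx, dy = self.DIRS[action]
        head = (self.snake[0][0] + dx, self.snake[0][1] + dy)
        died = (head in self.snake
                or not (0 <= head[0] < self.grid_size and 0 <= head[1] < self.grid_size))
        ate = head == self.food and not died
        if not died:
            self.snake.insert(0, head)
            if ate:
                self._place_food()
            else:
                self.snake.pop()
        reward = 1.0 if ate else (-1.0 if died else -0.001)
        return self._obs(), reward, died, False, {}

    def _place_food(self):
        free = [(x, y) for x in range(self.grid_size) for y in range(self.grid_size)
                if (x, y) not in self.snake]
        self.food = free[self.np_random.integers(len(free))]

    def _obs(self):
        grid = np.zeros((self.grid_size, self.grid_size, 3), dtype=np.uint8)
        for x, y in self.snake:
            grid[y, x] = (0, 255, 0)                    # snake in green
        grid[self.food[1], self.food[0]] = (255, 0, 0)  # food in red
        scale = 84 // self.grid_size                    # 7 for the default 12x12 grid
        return np.kron(grid, np.ones((scale, scale, 1), dtype=np.uint8))
```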

Moves are applied to the game state with tensor operations in PyTorch, an approach inspired by this Medium article
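
As a toy illustration of that idea (not the article's or the repository's code), a move can be expressed as rolling a one-hot head map; a real environment would additionally mask out walls instead of wrapping around:

```python
import torch

head = torch.zeros(12, 12)   # one-hot head position on the 12x12 board
head[6, 6] = 1.0

# Row/column shifts for the four actions: up, left, down, right.
SHIFTS = {0: (-1, 0), 1: (0, -1), 2: (1, 0), 3: (0, 1)}

def apply_move(head_map, action):
    """Shift the one-hot head map by one cell with a single tensor operation."""
    return torch.roll(head_map, shifts=SHIFTS[action], dims=(0, 1))

new_head = apply_move(head, 0)                  # move up
row, col = divmod(int(new_head.argmax()), 12)   # head is now at (5, 6)
```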

Conclusion

Current RL approaches to snake are not very effective compared to algorithmic approaches. Successful RL approaches often have heavy reward shaping (e.g. distance to food) or use observations besides the pure RGB display (e.g. direction to food).

To use

  1. Install requirements:

     pip install -r requirements.txt

  2. Run benchmark.py, optionally passing arguments (defaults to greedy search without rendering):

     python benchmark.py [-h] [--algo {greedy,random,bfs,dfs,ham,op_ham,dqn,qrdqn,a2c}] [--grid_size GRID_SIZE] [--initial_size INITIAL_SIZE] [--episodes EPISODES] [--show_render SHOW_RENDER] [--delay DELAY] [--save_gif SAVE_GIF]
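
     For example, to benchmark BFS over 100 episodes with rendering enabled (the argument values here are only an example):

     python benchmark.py --algo bfs --episodes 100 --show_render True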

  3. Run the respective .py file to train an RL model (e.g. python dqn.py to train the DQN model)

  4. View tensorboard logs:

     tensorboard --logdir ./tensorboard/
