Single-agent Snake AI using RL and search algorithms
Uses Gymnasium and StableBaselines3
The game runs on a 12x12 grid, with the snake having an initial size of 4
RL algorithms implemented: DQN, QR-DQN, PPO, Recurrent PPO, A2C
Each algorithm is tested for 1000 episodes
Random:
- Randomly choose a direction
- If there is something blocking, move in another direction
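A minimal sketch of this policy, assuming the snake's head position and the set of blocked cells are available (the `random_move` helper and its signature are illustrative, not taken from the repo):

```python
import random

# Direction vectors on the grid (x = column, y = row, origin top-left).
DIRS = {"up": (0, -1), "left": (-1, 0), "down": (0, 1), "right": (1, 0)}

def random_move(head, blocked, grid_size=12):
    """Pick a random direction; if the next cell is blocked or off-grid, try another."""
    x, y = head
    options = list(DIRS)
    random.shuffle(options)
    for d in options:
        dx, dy = DIRS[d]
        nxt = (x + dx, y + dy)
        if 0 <= nxt[0] < grid_size and 0 <= nxt[1] < grid_size and nxt not in blocked:
            return d
    return options[0]  # no safe move: the snake is trapped
```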
Greedy:
- One-move horizon
- Move in the direction of the food
- If there is something blocking, move in another direction
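The one-move-horizon rule can be sketched as follows, assuming head, food, and blocked cells are known (helper name and signature are illustrative):

```python
DIRS = {"up": (0, -1), "left": (-1, 0), "down": (0, 1), "right": (1, 0)}

def greedy_move(head, food, blocked, grid_size=12):
    """Prefer the direction that most reduces Manhattan distance to the food;
    fall back to the next-best unblocked direction."""
    x, y = head
    fx, fy = food
    # Order candidate moves by resulting distance to the food.
    candidates = sorted(
        DIRS.items(),
        key=lambda kv: abs(x + kv[1][0] - fx) + abs(y + kv[1][1] - fy),
    )
    for d, (dx, dy) in candidates:
        nxt = (x + dx, y + dy)
        if 0 <= nxt[0] < grid_size and 0 <= nxt[1] < grid_size and nxt not in blocked:
            return d
    return candidates[0][0]  # trapped: any move loses
```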
DFS:
- Depth-first search to find a complete path to the food
- If there is no path to the food, use greedy search
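A self-contained sketch of the DFS step (function name and signature are illustrative, not the repo's code):

```python
def dfs_path(start, food, blocked, grid_size=12):
    """Depth-first search for any complete path from the head to the food.
    Returns the path as a list of cells, or None if the food is unreachable."""
    stack = [(start, [start])]
    visited = {start}
    while stack:
        cell, path = stack.pop()
        if cell == food:
            return path
        x, y = cell
        for dx, dy in ((0, -1), (-1, 0), (0, 1), (1, 0)):
            nxt = (x + dx, y + dy)
            if (0 <= nxt[0] < grid_size and 0 <= nxt[1] < grid_size
                    and nxt not in blocked and nxt not in visited):
                visited.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None  # no path: caller falls back to greedy search
```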
BFS:
- Breadth-first search to find a complete (and shortest) path to the food
- If there is no path to the food, use greedy search
Hamiltonian:
- Follows a Hamiltonian path (a path that visits every node exactly once)
- Guarantees that the snake will never die
- Calculated via longest path
- Only works with even-sized grids
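The repo computes its cycle via longest-path search; purely as an illustration of what a Hamiltonian cycle on an even-sized grid looks like (and why evenness matters), here is a simple zigzag construction, not the repo's method:

```python
def hamiltonian_cycle(n=12):
    """Hamiltonian cycle on an n x n grid (n must be even): cross the top row,
    zigzag through rows 1..n-1 over columns 1..n-1, then climb column 0 back up.
    The zigzag only re-joins column 0 when n-1 (the number of zigzag rows) is odd,
    i.e. when n is even."""
    assert n % 2 == 0, "only works with even-sized grids"
    cycle = [(x, 0) for x in range(n)]                 # top row, left to right
    for y in range(1, n):
        xs = range(n - 1, 0, -1) if y % 2 == 1 else range(1, n)
        cycle.extend((x, y) for x in xs)               # zigzag rows
    cycle.extend((0, y) for y in range(n - 1, 0, -1))  # back up column 0
    return cycle  # consecutive cells are adjacent; last cell is adjacent to first
```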
Optimized Hamiltonian:
- Uses the Hamiltonian path but takes shortcuts
RL models:
- StableBaselines3 models
- All models trained for 10,000,000 episodes
- Hyperparameters based on those used for Atari games
- The script downloads trained models automatically
Results (plots):
- Mean episode reward per episode
- Mean episode length per episode
Uses a custom Gymnasium environment for the snake game
Observation space:
- An RGB image of the game of shape (84, 84, 3)
Action space:
- A discrete space of 4 actions (up, left, down, right)
Rewards (modifiable):
- +1 for eating food
- -1 for dying
- -0.001 for everything else
Actions are made with tensor operations in PyTorch, inspired by this Medium article
Current RL approaches to snake are not very effective compared to algorithmic approaches. Successful RL approaches often have heavy reward shaping (e.g. distance to food) or use observations besides the pure RGB display (e.g. direction to food).
- Install requirements:
  ```shell
  pip install -r requirements.txt
  ```
- Run `benchmark.py` and pass optional arguments (defaults to greedy search without rendering):
  ```shell
  python benchmark.py [-h] [--algo {greedy,random,bfs,dfs,ham,op_ham,dqn,qrdqn,a2c}] [--grid_size GRID_SIZE] [--initial_size INITIAL_SIZE] [--episodes EPISODES] [--show_render SHOW_RENDER] [--delay DELAY] [--save_gif SAVE_GIF]
  ```
- Run the respective `.py` file to train an RL model (e.g. `python dqn.py` to train the DQN model)
- View TensorBoard logs:
  ```shell
  tensorboard --logdir ./tensorboard/
  ```