
Q-Snake

An interactive Q-learning web visualiser for Snake built with React.js

https://sid-sr.github.io/Q-Snake/

About • Features • Usage • Installation • Acknowledgements


About

• A website that visualises the Q-learning reinforcement learning algorithm and shows how an AI agent can learn to play Snake with it.
• Built with create-react-app; uses no reinforcement learning libraries or environments.
• Uses a simplified state representation and reward space so that learning is fast. This makes the task closer to a planning problem than full RL, but the goal was simply to visualise the algorithm within a reasonable training duration.

Features

(Figure: the AI playing after 5000 episodes of training.)

State Space:

  • A state is represented with just 2 values:
  • Relative location of the apple to the head (8 directions).
  • Presence of danger one step ahead of the head in each of the 4 directions (an array of 4 binary values, giving 16 combinations).
  • This results in an 8 × 16 × 4 Q-table. The visualization to the right is after training the snake for 5000 episodes.
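A rough sketch of how such a state could be encoded as a pair of indices (the helper names here are illustrative, not the repository's actual identifiers):

```typescript
type Point = { x: number; y: number };

// 8 relative apple directions: N, NE, E, SE, S, SW, W, NW.
function appleDirection(head: Point, apple: Point): number {
  const dx = Math.sign(apple.x - head.x); // -1, 0, or 1
  const dy = Math.sign(apple.y - head.y);
  const dirs = [
    [0, -1], [1, -1], [1, 0], [1, 1],
    [0, 1], [-1, 1], [-1, 0], [-1, -1],
  ];
  const idx = dirs.findIndex(([x, y]) => x === dx && y === dy);
  return idx === -1 ? 0 : idx; // dx = dy = 0 only when the apple is on the head
}

// 4 danger flags (up, right, down, left) packed into one index in [0, 15].
function dangerIndex(dangers: [boolean, boolean, boolean, boolean]): number {
  return dangers.reduce((acc, d, i) => acc | ((d ? 1 : 0) << i), 0);
}
```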

Reward Space:

The reward space used here makes the problem much easier to solve, but it was chosen so that reasonable results are obtained in a short time frame and the changes in the Q-table can be visualised quickly.

| Condition | Reward |
| --- | --- |
| Hitting the border / eating itself / moving 500 steps without eating the apple | -100 |
| Eating the apple | +30 |
| Moving towards the apple | +1 |
| Moving away from the apple | -5 |

(The state and reward space follow the ones used in this video: AI learns to play Snake using RL.)
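As a minimal sketch, the per-step reward from the table above could be computed like this (the flags and helper names are hypothetical, not the repository's actual code):

```typescript
// Hypothetical per-step information used to assign the reward.
interface StepInfo {
  hitWall: boolean;        // snake ran into the border
  hitSelf: boolean;        // snake ate itself
  starved: boolean;        // 500 steps without eating the apple
  ateApple: boolean;
  distToApple: number;     // Manhattan distance after the move
  prevDistToApple: number; // Manhattan distance before the move
}

function reward(step: StepInfo): number {
  if (step.hitWall || step.hitSelf || step.starved) return -100;
  if (step.ateApple) return 30;
  return step.distToApple < step.prevDistToApple ? 1 : -5;
}
```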

The Q-table:

(Figure: the Q-table visualisation.)

  • The Q-table shown above has dimensions 8 × 16, with 4 entries in each cell (one per move).
  • Each cell in the grid is a state, i.e. one situation the snake finds itself in, such as: the apple is to the top left and there is danger to the left, so which move do I make - up, left, down, or right?
  • The white (blank) entries correspond to unexplored states. Initially, all states are unexplored; as the AI plays the game, it visits different states and learns which moves work based on the reward for each action.
  • The red entries correspond to explored states where the AI has learnt the wrong move.
  • The green entries correspond to explored states where the AI has learnt the right move (i.e. the move a human would make).
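The entries are filled in by the standard tabular Q-learning update as the agent plays. A minimal sketch, assuming the 8 × 16 × 4 table shape described above (names are illustrative):

```typescript
const N_APPLE_DIRS = 8, N_DANGER = 16, N_ACTIONS = 4;

// Q[appleDir][danger][action], initialised to 0 (rendered as blank/white cells).
const Q: number[][][] = Array.from({ length: N_APPLE_DIRS }, () =>
  Array.from({ length: N_DANGER }, () => new Array(N_ACTIONS).fill(0))
);

// One learning step: s = [appleDir, danger], a = action taken, r = reward,
// s2 = next state, alpha = learning rate, gamma = discount factor.
function qUpdate(
  s: [number, number], a: number, r: number, s2: [number, number],
  alpha: number, gamma: number
): void {
  const bestNext = Math.max(...Q[s2[0]][s2[1]]);
  Q[s[0]][s[1]][a] += alpha * (r + gamma * bestNext - Q[s[0]][s[1]][a]);
}
```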

Usage

The following parameters can be set before running the algorithm:

  1. Episodes: The number of episodes (games/trials) to play and learn from.
  2. Start Epsilon: The initial probability of exploration. Range: 0 to 1.
  3. End Epsilon: The final probability of exploration. Range: 0 to 1.
  4. Discount Factor: The importance given to delayed rewards compared to immediate rewards. Range: 0 to 1.
  5. Speed/Delay: The delay (in ms) between moves; lower values mean faster games (set to the lowest value when training).
  • The Train button starts training, Stop stops the game, and Test shows how the agent plays without training it (useful to see how a trained agent plays).
  • The probability of exploration decreases linearly over the given number of episodes, so the agent moves randomly at the start to explore the state space, and towards the end of training (and during testing) it takes informed decisions based on the learned Q-values for each state (see the sketch below).
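A minimal sketch of the linear exploration schedule and epsilon-greedy action choice described above (parameter names are illustrative, not the repository's actual identifiers):

```typescript
// Linear decay from startEpsilon to endEpsilon over the training episodes.
function epsilonForEpisode(
  episode: number, episodes: number, startEpsilon: number, endEpsilon: number
): number {
  const t = Math.min(episode / Math.max(episodes - 1, 1), 1);
  return startEpsilon + t * (endEpsilon - startEpsilon);
}

// Epsilon-greedy choice over the 4 Q-values of the current state.
function chooseAction(qValues: number[], epsilon: number): number {
  if (Math.random() < epsilon) {
    return Math.floor(Math.random() * qValues.length); // explore
  }
  return qValues.indexOf(Math.max(...qValues));        // exploit
}
```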

Installation

If you would like to tweak the algorithm locally:

  • Clone the repository.
  • Run npm install.
  • Run npm start.

Acknowledgements