Nando - A Deep Q-Learning Agent for Tic-Tac-Toe

Nando is a pet project whose goal is to build a deep Q-learning agent that learns how to play Tic-Tac-Toe. In a nutshell, Q-learning is a reinforcement learning technique that infers an action-value function Q(s,a) (where s is a state, a is a possible next action, and Q estimates the future reward of taking a in s) from accumulated experience. Given an accurate Q, one can build an agent that follows the optimal strategy in an environment: for a given state, the agent picks as its next action the one that maximizes the expected future reward. Deep Q-learning consists of approximating Q with a neural network.
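
For reference, the textbook Q-learning update that the network is trained to approximate is the following (this is standard background, not notation taken from the repository):

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$

where r is the immediate reward, s' the resulting state, α the learning rate, and γ the discount factor.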

Quick Start

Nando has two execution modes: train and play. In the former, a DQL agent is trained against another agent over a given number of Tic-Tac-Toe games; at the end of training, the program outputs a plot showing how the obtained reward evolves with the number of games played. In the latter, two agents simply play against each other.

The types of opponent agents currently supported are:

  • human, which represents a human player. This agent reads its next move from the console.
  • basic, which implements an agent that plays obvious moves when possible and random ones otherwise (a sketch of one such policy follows this list).
  • random, which implements an agent that makes random moves.
  • mrmiyagi, which implements an agent that follows a hardcoded optimal strategy.
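
To make the basic policy concrete, below is a minimal, hypothetical sketch of a "play obvious moves, otherwise random" agent. The class name, method names, and board encoding are illustrative only and do not come from the repository, which may define "obvious" differently:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: take an immediately winning cell if one exists,
 *  block the opponent's winning cell otherwise, else move at random.
 *  The board is an int[9] with 0 = empty, 1 = X, 2 = O. */
public class BasicAgentSketch {
    private static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},   // rows
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},   // columns
        {0, 4, 8}, {2, 4, 6}               // diagonals
    };
    private final Random rng = new Random();

    /** Returns the cell to play; assumes the board has at least one free cell. */
    public int pickMove(int[] board, int me, int opponent) {
        List<Integer> free = new ArrayList<>();
        for (int i = 0; i < 9; i++) if (board[i] == 0) free.add(i);
        for (int cell : free) if (completesLine(board, cell, me)) return cell;       // obvious win
        for (int cell : free) if (completesLine(board, cell, opponent)) return cell; // obvious block
        return free.get(rng.nextInt(free.size()));                                   // otherwise random
    }

    private boolean completesLine(int[] board, int cell, int player) {
        board[cell] = player; // tentatively place the mark
        boolean won = false;
        for (int[] line : LINES) {
            if (board[line[0]] == player && board[line[1]] == player && board[line[2]] == player) {
                won = true;
                break;
            }
        }
        board[cell] = 0;      // undo the tentative move
        return won;
    }
}
```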

The Nando DQL agent is implemented as a neural network comprising a 9-neuron input layer followed by a 27-neuron hidden layer (sigmoid activation), a 9-neuron hidden layer (sigmoid activation), and a 9-neuron output layer (linear activation). The 9 neurons of the input and output layers correspond to the nine cells of the Tic-Tac-Toe board.

Nando is implemented in Java, using the Encog Machine Learning framework.
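
As a rough illustration, the layer stack described above could be assembled with Encog's standard API along these lines (a sketch under that assumption, not the repository's actual construction code):

```java
import org.encog.engine.network.activation.ActivationLinear;
import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;

public class NandoNetworkSketch {
    public static BasicNetwork build() {
        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, 9));                      // input: one neuron per board cell
        network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 27));  // hidden layer 1
        network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 9));   // hidden layer 2
        network.addLayer(new BasicLayer(new ActivationLinear(), false, 9));   // output: one Q-value per cell
        network.getStructure().finalizeStructure();
        network.reset(); // randomize the initial weights
        return network;
    }
}
```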

Compile

mvn package

Usage

1. Training mode:

java -jar target/nando-dql-1.0-SNAPSHOT-jar-with-dependencies.jar train [options]

Options include:

-i <path-input-file> load a previously trained neural network. (optional)

-o <path-output-file> save the neural network being trained to a file. (optional)

-p <opponent> pick the type of agent against which the DQL agent's neural network will be trained. Types of agents include: human, basic, random, and mrmiyagi. (required)

-r <num-rounds> duration of the training, in number of games played. (optional, default = 2500)

Example:

java -jar target/nando-dql-1.0-SNAPSHOT-jar-with-dependencies.jar train -i NandoTest.eg -p basic -r 100

This loads a Nando DQL agent from the file NandoTest.eg and trains it (i.e. updates its neural network) by playing 100 games against a basic agent.

2. Play mode:

java -jar target/nando-dql-1.0-SNAPSHOT-jar-with-dependencies.jar play agentX agentO -r <num-rounds>

Example:

java -jar target/nando-dql-1.0-SNAPSHOT-jar-with-dependencies.jar play NandoTest.eg human -r 5

This allows a human to play 5 Tic-Tac-Toe games against the Nando DQL agent stored in the file NandoTest.eg.
