Nando - A Deep Q-Learning Agent for Tic-Tac-Toe

Nando is a pet project whose goal is to build a deep Q-learning agent that learns how to play Tic-Tac-Toe. In a nutshell, Q-learning is a reinforcement learning technique that infers an action-value function Q(s,a) (where s is a state and a a possible next action) from accumulated experience. Given an accurate function Q, one can then build an agent that follows the optimal strategy in an environment: for a given state, the agent picks as its next action the one that maximizes the expected future reward. Deep Q-learning consists of approximating Q by means of a neural network.
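To make the idea concrete, here is a minimal sketch of the tabular Q-learning update rule that deep Q-learning replaces with a neural network. This is illustrative only (the class, constants, and state encoding are assumptions, not Nando's actual code):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: tabular Q-learning, the rule that deep
// Q-learning approximates with a neural network instead of a table.
public class QLearningSketch {
    static final double ALPHA = 0.1;  // learning rate (assumed value)
    static final double GAMMA = 0.9;  // discount factor (assumed value)

    // Q-table: maps "state|action" keys to estimated action values.
    static Map<String, Double> q = new HashMap<>();

    static double getQ(String state, int action) {
        return q.getOrDefault(state + "|" + action, 0.0);
    }

    // Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    static void update(String s, int a, double reward,
                       String sNext, int[] nextActions) {
        double best = 0.0;
        for (int an : nextActions) best = Math.max(best, getQ(sNext, an));
        double old = getQ(s, a);
        q.put(s + "|" + a, old + ALPHA * (reward + GAMMA * best - old));
    }

    public static void main(String[] args) {
        // One hypothetical transition: playing cell 4 from the empty
        // board ends the game with reward +1 (no further actions).
        update(".........", 4, 1.0, "....X....", new int[]{});
        System.out.println(getQ(".........", 4));  // prints 0.1
    }
}
```

With enough such updates over many games, the table converges toward the true action values; a neural network does the same job when the state space is too large to tabulate.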

Additional background on Q-learning can be found in standard reinforcement learning references.

Quick Start

Nando has two execution modes: train and play. In the former, a DQL agent is trained against another agent by playing a given number of Tic-Tac-Toe games; at the end of the training, the program outputs a plot showing how the obtained reward varies with the number of games played. In the latter, two agents simply play against each other.

The types of opponent agents currently supported are:

  • human, which represents a human player. This agent reads its next move from the console.
  • basic, which implements an agent that plays obvious moves when available and random ones otherwise.
  • random, which implements an agent that makes random moves.
  • mrmiyagi, which implements an agent that follows a hardcoded optimal strategy.
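As an illustration of what an opponent agent boils down to, here is a minimal sketch of a random agent. The class and method names are assumptions for the example, not Nando's actual classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch of a "random" opponent:
// it picks a uniformly random legal move.
public class RandomAgent {
    private final Random rng;

    public RandomAgent(long seed) { this.rng = new Random(seed); }

    // board: 9 chars, 'X', 'O' or '.' for an empty cell;
    // returns the chosen cell index (0-8).
    public int nextMove(char[] board) {
        List<Integer> legal = new ArrayList<>();
        for (int i = 0; i < board.length; i++)
            if (board[i] == '.') legal.add(i);
        return legal.get(rng.nextInt(legal.size()));
    }
}
```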

The Nando DQL agent is implemented with a neural network comprising a 9-neuron input layer, a 27-neuron hidden layer (sigmoid activation), a 9-neuron hidden layer (sigmoid activation), and a 9-neuron output layer (linear activation). The 9 neurons of the input and output layers correspond to the nine cells of the Tic-Tac-Toe board.
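A plausible way to connect the board to those 9 input and 9 output neurons is sketched below. The encoding (+1/-1/0 per cell) and the legal-move masking are assumptions for illustration, not taken from Nando's source:

```java
// Illustrative sketch: mapping the Tic-Tac-Toe board to the network's
// 9 inputs, and picking a move from its 9 outputs.
public class DqlPolicySketch {
    // Encode 'X' as +1, 'O' as -1, empty as 0 — one value per input neuron.
    static double[] encode(char[] board) {
        double[] in = new double[9];
        for (int i = 0; i < 9; i++)
            in[i] = board[i] == 'X' ? 1.0 : board[i] == 'O' ? -1.0 : 0.0;
        return in;
    }

    // Pick the legal move with the highest Q-value among the
    // network's 9 outputs, masking occupied cells.
    static int greedyMove(double[] qValues, char[] board) {
        int best = -1;
        for (int i = 0; i < 9; i++) {
            if (board[i] != '.') continue;  // skip occupied cells
            if (best == -1 || qValues[i] > qValues[best]) best = i;
        }
        return best;
    }
}
```

Masking illegal moves before the argmax keeps the agent from wasting a turn on an occupied cell even when the network assigns it a high value.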

Nando is implemented in Java, using the Encog Machine Learning framework.


To build Nando with Maven, run:

mvn package


1. Training mode:

java -jar target/nando-dql-1.0-SNAPSHOT-jar-with-dependencies.jar train [options]

Options include:

-i <path-input-file> load a previously trained neural network from a file. (optional)

-o <path-output-file> save the trained neural network to a file. (optional)

-p <opponent> pick the type of agent against which the DQL agent is trained. Supported types: human, basic, random, and mrmiyagi. (required)

-r <num-rounds> duration of the training, in number of games played. (optional, default = 2500)

Example: java -jar target/nando-dql-1.0-SNAPSHOT-jar-with-dependencies.jar train -i <path-input-file> -p basic -r 100 (loads a Nando DQL agent from the given file and trains it, i.e. updates its neural network, by playing 100 games against a basic agent)

2. Play mode:

java -jar target/nando-dql-1.0-SNAPSHOT-jar-with-dependencies.jar play agentX agentO -r <num-rounds>

Example: java -jar target/nando-dql-1.0-SNAPSHOT-jar-with-dependencies.jar play human -r 5 (allows a human to play 5 Tic-Tac-Toe games against the Nando DQL agent stored in file)

