Skip to content
A reinforcement learning algorithm with a touch of functional programming
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.

Tic Tac Toe

Sample CLI

This repository contains a reinforcement learning algorithm wherein an agent is crafted with several default rewards (e.g. win, lose, and tie). It is trained by playing itself and then allowing the user to interact using a feature-rich CLI.

This 2-day project was inspired by the classic work by Sutton and Barton (2012) particularly the self-play discussion in exercise 1.1. I also referenced @tansey's repository for clarifying some details of a particular implementation.


Assuming a package manager such as pip, the following dependencies are needed to run the CLI:

pip install urwid urwid_timed_progress tabulate funcy phi pyrsistent numpy

To run, simply call python tic-tac-toe from within the directory.

Demo Recording

Diagram Flow

In case you're a lover of flow charts and sequence diagrams, here's a quick sketch of the program flow from the top-level.


  1. Why does the "AI Brain" section seem to show the initial values?

Because the values are very close the original values, they appear to be the same. However, if you look at the actual values, they are different. This distinction is important for the AI as it currently chooses the highest reward for

  1. Where can I get the values that the algorithm produces?

When you run the project, a file is produced called rewards.tsv that includes the weights for the different scenarios encountered during the training process for the agent playing as the X mark.


Sutton, R. S., & Barto, A. G. (2012). Reinforcement learning: an introduction (2nd ed.). Cambridge, MA: The MIT Press.

You can’t perform that action at this time.