

Temporal difference learning for ultimate tic-tac-toe.

What is ultimate tic-tac-toe?

It's like tic-tac-toe, but each square of the board contains another game of tic-tac-toe! Win small games to claim the corresponding squares in the big game. Simple, right? But there is a catch: whichever small square you pick is the big square your opponent must play in next. Read more...

[GIF: a game of ultimate tic-tac-toe]
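The "your move decides where your opponent plays" rule can be sketched in a few lines. This is an illustrative sketch only: the function name, board representation, and the "play anywhere if the target square is decided" fallback are assumptions, not the repo's actual API or exact rule set.

```python
def legal_big_squares(last_small_square, big_board):
    """Return the indices of the big squares the next player may play in.

    last_small_square: index 0-8 of the small square the previous player
        chose, or None at the start of the game.
    big_board: list of 9 entries, each 'X', 'O', 'D' (drawn), or None (open).
    """
    # At the start of the game, any open big square is allowed.
    if last_small_square is None:
        return [i for i in range(9) if big_board[i] is None]
    # Normally you must play in the big square matching the opponent's
    # last small square...
    if big_board[last_small_square] is None:
        return [last_small_square]
    # ...but if that big square is already decided, a common rule variant
    # (assumed here) lets you play in any open big square.
    return [i for i in range(9) if big_board[i] is None]
```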

What is temporal difference learning?

Temporal difference (TD) learning is a reinforcement learning method; here, the agent is trained entirely through self-play. The algorithm learns by bootstrapping from its current estimate of the value function: the value of a state is updated toward the estimated value of the states that follow it. Read more...
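The bootstrapping idea can be shown with a minimal tabular TD(0) update. This is a sketch of the general technique, not the repo's implementation (which may use function approximation rather than a lookup table); the parameter names `alpha` (step size) and `gamma` (discount) are standard but assumed.

```python
def td0_update(V, state, next_state, reward, alpha=0.1, gamma=1.0):
    """Nudge V[state] toward the bootstrapped target reward + gamma * V[next_state].

    V is a dict mapping states to estimated values; unseen states default to 0.
    Returns the updated value of `state`.
    """
    target = reward + gamma * V.get(next_state, 0.0)       # bootstrapped target
    error = target - V.get(state, 0.0)                     # TD error
    V[state] = V.get(state, 0.0) + alpha * error           # move toward target
    return V[state]
```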

How to use


To begin training, run the training script with its default settings, or set the learning hyperparameters using any of the optional arguments:

python --lr LEARN_RATE --a ALPHA --e EPSILON
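The flags above might be wired up as follows. This is a guess at the interface based only on the flag names in this README: the defaults, and the exact meanings of ALPHA and EPSILON, are assumptions (EPSILON plausibly controls epsilon-greedy exploration during self-play).

```python
import argparse

# Hypothetical argument parser matching the flags shown above.
parser = argparse.ArgumentParser(
    description="Train a TD-learning agent for ultimate tic-tac-toe by self-play."
)
parser.add_argument("--lr", type=float, default=0.1,
                    help="LEARN_RATE: step size for value updates (default assumed)")
parser.add_argument("--a", type=float, default=0.9,
                    help="ALPHA: decay/trace parameter (meaning assumed)")
parser.add_argument("--e", type=float, default=0.1,
                    help="EPSILON: exploration rate for epsilon-greedy self-play (assumed)")

# Example invocation: override the learning rate and exploration rate.
args = parser.parse_args(["--lr", "0.05", "--e", "0.2"])
```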


You can play against a trained model using:

python --params path/to/parameters.params

If no parameters are provided, the opponent will make moves randomly.


Coming soon.



