Reinforcement Learning to Beat Tic Tac Toe in the Browser
See live »
Report Bug :: Request Feature
Okey dokey
Table of Contents
TL;DR
- Install and use the correct version of Node using nvm: `nvm install && nvm use 18`
- Install dependencies: `npm install`
- Start the development server: `npm start`
Intense journey. I was able to implement three different reinforcement learning agents. None of them builds a model (they are all model-free), and none is supervised or unsupervised learning.
The Q-learning agent builds a Q-table mapping each game state to the value of each possible move.
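The Q-table idea can be sketched in a few lines. This is an illustrative sketch, not the repo's actual API: it assumes the board is serialized to a string key, moves are cell indices 0 to 8, and the names and constants are made up for the example.

```typescript
type QTable = Map<string, number[]>; // state key -> one Q-value per cell

const ALPHA = 0.1; // learning rate
const GAMMA = 0.9; // discount factor

// Look up (or lazily create) the Q-values for a state.
function getQ(table: QTable, state: string): number[] {
  let q = table.get(state);
  if (!q) {
    q = new Array(9).fill(0);
    table.set(state, q);
  }
  return q;
}

// The standard tabular Q-learning update:
// Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
function update(
  table: QTable,
  state: string,
  action: number,
  reward: number,
  nextState: string
): void {
  const q = getQ(table, state);
  const maxNext = Math.max(...getQ(table, nextState));
  q[action] += ALPHA * (reward + GAMMA * maxNext - q[action]);
}

// Greedy policy: among the legal cells, pick the one with the highest Q-value.
function bestMove(table: QTable, state: string, legal: number[]): number {
  const q = getQ(table, state);
  return legal.reduce((best, a) => (q[a] > q[best] ? a : best), legal[0]);
}
```

During training, `update` is called after every move with the observed reward (e.g. +1 for a win), and at play time `bestMove` reads the learned table.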
Minmax and Monte Carlo determine the next move in real time: given the current state of the game, both build a tree of possible moves and explore it until a 'good' move is found, on the fly.
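For reference, the minmax tree search can be sketched like this. It is a toy sketch, not the repo's implementation, assuming the board is an array of nine cells holding `'X'`, `'O'`, or `null`:

```typescript
type Cell = 'X' | 'O' | null;

const LINES = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8], // rows
  [0, 3, 6], [1, 4, 7], [2, 5, 8], // columns
  [0, 4, 8], [2, 4, 6],            // diagonals
];

function winner(board: Cell[]): Cell {
  for (const [a, b, c] of LINES) {
    if (board[a] && board[a] === board[b] && board[a] === board[c]) return board[a];
  }
  return null;
}

// Score a position from X's point of view; depth makes earlier wins
// score higher than later ones.
function minmax(board: Cell[], player: Cell, depth = 0): number {
  const w = winner(board);
  if (w === 'X') return 10 - depth;
  if (w === 'O') return depth - 10;
  if (board.every(c => c !== null)) return 0; // draw

  const scores: number[] = [];
  for (let i = 0; i < 9; i++) {
    if (board[i] === null) {
      board[i] = player;
      scores.push(minmax(board, player === 'X' ? 'O' : 'X', depth + 1));
      board[i] = null; // undo the move
    }
  }
  return player === 'X' ? Math.max(...scores) : Math.min(...scores);
}

// Pick the best cell for `player` by scoring every legal move.
function bestCell(board: Cell[], player: Cell): number {
  let best = -1;
  let bestScore = player === 'X' ? -Infinity : Infinity;
  for (let i = 0; i < 9; i++) {
    if (board[i] !== null) continue;
    board[i] = player;
    const s = minmax(board, player === 'X' ? 'O' : 'X', 1);
    board[i] = null;
    const better = player === 'X' ? s > bestScore : s < bestScore;
    if (better) { bestScore = s; best = i; }
  }
  return best;
}
```

With only nine cells the full tree is small enough to search exhaustively every turn, which is why minmax is the strongest of the three agents here.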
After reading the relevant material and getting hands-on, I built some reinforcement learning algorithms, with the longer-term goal of building one to play MTG-style card games. Coding the three agents:
- Spent 1 week reading only
- 1 week coding and reading each day
- Took a week break
- Went back at it with a vengeance to implement Monte Carlo
None of the three delivers an unbeatable Tic Tac Toe player. They should, but I'm giving up for now. They aren't even particularly good, except minmax, which plays decently: it pulls off draws and even wins against an imperfect human player.
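The Monte Carlo approach mentioned above can be sketched as random rollouts: try each legal move, play many random games to completion, and keep the move with the best average outcome. This is a toy sketch under those assumptions, not the repo's implementation:

```typescript
type Cell = 'X' | 'O' | null;

const LINES = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8],
  [0, 3, 6], [1, 4, 7], [2, 5, 8],
  [0, 4, 8], [2, 4, 6],
];

function winner(board: Cell[]): Cell {
  for (const [a, b, c] of LINES) {
    if (board[a] && board[a] === board[b] && board[a] === board[c]) return board[a];
  }
  return null;
}

function emptyCells(board: Cell[]): number[] {
  return board.map((_, i) => i).filter(i => board[i] === null);
}

// Play one game to the end with uniformly random moves; score the result
// from `me`'s point of view: 1 for a win, -1 for a loss, 0 for a draw.
function rollout(board: Cell[], toMove: Cell, me: Cell, rand: () => number): number {
  const b = board.slice();
  let player = toMove;
  for (;;) {
    const w = winner(b);
    if (w) return w === me ? 1 : -1;
    const empty = emptyCells(b);
    if (empty.length === 0) return 0; // draw
    b[empty[Math.floor(rand() * empty.length)]] = player;
    player = player === 'X' ? 'O' : 'X';
  }
}

// Estimate each legal move by averaging many random rollouts after playing it.
function scoreMoves(
  board: Cell[],
  me: Cell,
  rollouts = 200,
  rand: () => number = Math.random
): Map<number, number> {
  const opponent: Cell = me === 'X' ? 'O' : 'X';
  const scores = new Map<number, number>();
  for (const cell of emptyCells(board)) {
    const b = board.slice();
    b[cell] = me;
    let total = 0;
    for (let i = 0; i < rollouts; i++) total += rollout(b, opponent, me, rand);
    scores.set(cell, total / rollouts);
  }
  return scores;
}
```

The real strength of Monte Carlo methods comes from guiding the rollouts (as in MCTS) rather than playing purely at random, which is part of why the plain version struggles.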
I could probably get them to work with weeks of effort. Instead, I will explore:
- Genetic algorithm
- Trust Region Policy Optimization (TRPO)
- Hindsight Experience Replay (HER), etc.
There are better-suited reinforcement learning methods for playing and beating humans at MTG-style card games. Through this hands-on, practical application I've figured out that minmax, Q-learning, and even Monte Carlo may not cut it for a game like MTG.
The complexity is far greater than for Tic Tac Toe, chess, or even Go. I will need to dig further and combine methods once I implement the MTG engine. Not only would those three fail to perform in such complex contexts, with trillions of possibilities, but they would also lead to a very predictable AI. That would make the AI boring for this kind of game, and easily beatable by players who can build custom decks against models trained on a de facto limited set of decks.
- Generate a full static production build: `npm run build`
Reinforcement learning in the browser, without libraries (for now).
Yes, of course I could use Python. But does it run in the browser?
Do I know what I'm doing? Not quite.
- node.js - of course
- typescript - because scripting is great, but it's good to have types
- parceljs - I would have gone without a bundler, but it speeds things up to use a good one with Hot Module Reloading during development
$ git clone https://github.com/hirako2000/tttow-reinforcement-learning.git
Navigate to the repo's root folder, then install the dependencies
$ cd ./tttow-reinforcement-learning && npm install
$ npm run dev # or npm run start
This command will build everything for production deployment:
$ npm run build
It generates the files for the entire page, placing all the assets into the public folder. To host them, this deploy script uploads everything 'somewhere':
$ npm run deploy
You may want to tweak that script in package.json; search for deploy. There are free hosting services out there, e.g. surge.sh or Netlify.
- There is some HTML and a styles.css, but many DOM elements get created via JavaScript
- There is typically a trainer file. Game logic is kept in a separate file. And of course the index
- Since training isn't instant, it's always processed in a web worker, so as not to block the main thread
- Messages are sent from the worker to provide updates (typically to refresh the UI)
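The worker messaging described above can be sketched roughly like this. The message names, fields, and the `trainer.ts` filename are illustrative assumptions, not the repo's actual protocol:

```typescript
// Messages the training worker posts back to the main thread.
type TrainingMessage =
  | { type: 'progress'; episode: number; total: number }
  | { type: 'done'; episodes: number };

// Pure function turning a message into a status line, so the UI-update
// logic can be tested without spinning up a real Worker.
function describeMessage(msg: TrainingMessage): string {
  switch (msg.type) {
    case 'progress':
      return `Training: ${Math.round((msg.episode / msg.total) * 100)}%`;
    case 'done':
      return `Training complete after ${msg.episodes} episodes`;
  }
}

// In the browser, the main thread would wire it up roughly like this:
//
//   const worker = new Worker(new URL('./trainer.ts', import.meta.url), { type: 'module' });
//   worker.onmessage = (e: MessageEvent<TrainingMessage>) => {
//     statusElement.textContent = describeMessage(e.data);
//   };
//
// and the worker would periodically call:
//
//   postMessage({ type: 'progress', episode, total });
```

Keeping the message handling pure makes it easy to refresh the UI from the main thread while the worker grinds through training episodes.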
- Tic Tac Toe game logic
- Human vs Human
- Minmax training
- Human vs AI
- Unbeatable minmax model
- Unbeatable Q-learning
- Render training metadata
- Render AI decision making
- better layout
- Make Q-learning actually work!
- Monte Carlo
- Genetic algorithm
- superb layout
- amazing layout
- Measure AI performance
- more to come I guess
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/some-feature`)
- Commit your Changes (`git commit -m 'Add some feature'`)
- Push to the Branch (`git push origin feature/some-feature`)
- Open a Pull Request
This work is licensed under the MIT license.
You may use and remix this content, but not for commercial use, such as selling the templates and the like.
If you too produce work and publish it out there, it's clearer to choose a license.
Richard S. Sutton and Andrew G. Barto for their great book: Reinforcement Learning: An Introduction
If you decide to re-use this repo, go ahead. There's no need to credit or link back to this repo/site, although it would be much appreciated. Don't republish the UI and logic pretty much as is, though; that's lame and shameless. Tweak the look and feel, customize the training, make it better, make it your own. Make it so that I wouldn't come across your work and think it's mine, and so that nobody comes across your work and finds out it's a lousy copy of someone else's, lacking added value and personalisation.