GitHub - hirako2000/tttow-reinforcement-learning: Libless Reinforcement learning on the browser to win tic tac toe

Reinforcement Learning Tic Tac Toe

Reinforcement Learning to beat Tic Tac Toe on the Browser

OK, DOkey

Table of Contents

About
- Built With
Getting Started
- Prerequisites
- Repo
- Develop
- Build
- Deploy
Codebase
Roadmap
Contributing
License
Acknowledgments

TL;DR

🛠 Installation & Set Up

Install and use the correct version of Node using nvm
```
nvm install 18 && nvm use 18
```
Install dependencies
```
npm install
```
Start the development server
```
npm start
```

Research and practice outcome

It was an intense journey. I mostly managed to implement three different reinforcement learning agents. None of them builds a model, and none is supervised or unsupervised.
The Q-learning agent builds a Q-table mapping each context (the game state) to a good move to make. Minmax and Monte Carlo determine the next good move in real time: both build a tree of possible moves from the current state of the game and explore it until they settle on a 'good' move on the fly.
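To make the Q-table idea concrete, here is a minimal sketch of a tabular Q-learning update for tic tac toe. It is not the code from this repo: every name (`qTable`, `qValues`, `update`, `bestMove`) and both constants (`ALPHA`, `GAMMA`) are assumptions chosen for illustration, and `next` is assumed to be the board the agent sees on its following turn (or `null` when the game is over).

```typescript
// Hypothetical names and values throughout; not the repository's code.
type Cell = "X" | "O" | " ";
type Board = Cell[]; // 9 cells, row by row

const ALPHA = 0.1; // learning rate (assumed value)
const GAMMA = 0.9; // discount factor (assumed value)

// The Q-table maps a board key to one estimated value per cell index.
const qTable = new Map<string, number[]>();

function qValues(board: Board): number[] {
  const key = board.join("");
  let values = qTable.get(key);
  if (!values) {
    values = new Array<number>(9).fill(0);
    qTable.set(key, values);
  }
  return values;
}

function legalMoves(board: Board): number[] {
  return board.flatMap((cell, i) => (cell === " " ? [i] : []));
}

// Greedy choice: the legal move with the highest learned value.
// Assumes the board still has at least one empty cell.
function bestMove(board: Board): number {
  const values = qValues(board);
  return legalMoves(board).reduce((best, m) => (values[m] > values[best] ? m : best));
}

// Standard Q-learning update:
// Q(s, a) += alpha * (reward + gamma * max over a' of Q(s', a') - Q(s, a))
function update(state: Board, move: number, reward: number, next: Board | null): void {
  const values = qValues(state);
  const nextMoves = next ? legalMoves(next) : [];
  const nextValues = next ? qValues(next) : [];
  const future = nextMoves.length ? Math.max(...nextMoves.map((m) => nextValues[m])) : 0;
  values[move] += ALPHA * (reward + GAMMA * future - values[move]);
}
```

During training, `update` would be called after each of the agent's moves, with a non-zero reward only when the game ends. At play time, `bestMove` simply reads the table.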

Dive into Reinforcement learning

After reading relevant material and getting hands-on, I built some reinforcement learning algorithms, with the aim of later building one to play MTG-style card games. Coding the 3 agents went like this:

Spent 1 week reading only
1 week coding and reading each day
Took a week break
Went back at it with a vengeance to implement Monte Carlo

None of the 3 delivers an unbeatable Tic Tac Toe player. They should, but I'm giving up for now. They aren't even particularly good, except minmax, which plays decently and will pull some draws and even win against a human who doesn't play perfectly.
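For reference, tic tac toe is small enough that minmax can search the whole game tree, which is why a full-depth minmax player should never lose. The sketch below shows that exhaustive search. It is an illustration under assumptions, not this repo's implementation: the names (`Square`, `winner`, `minimax`, `bestMinimaxMove`) and the scoring convention (+1 when X wins, -1 when O wins, 0 for a draw) are made up for the example.

```typescript
// Hypothetical names; a full-depth search, not the repository's code.
type Player = "X" | "O";
type Square = Player | null;

const LINES: number[][] = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8], // rows
  [0, 3, 6], [1, 4, 7], [2, 5, 8], // columns
  [0, 4, 8], [2, 4, 6],            // diagonals
];

function winner(b: Square[]): Player | null {
  for (const [x, y, z] of LINES) {
    const p = b[x];
    if (p && p === b[y] && p === b[z]) return p;
  }
  return null;
}

function emptyCells(b: Square[]): number[] {
  return b.flatMap((cell, i) => (cell === null ? [i] : []));
}

const other = (p: Player): Player => (p === "X" ? "O" : "X");

// Score from X's point of view: +1 X wins, -1 O wins, 0 draw.
function minimax(b: Square[], toMove: Player): number {
  const w = winner(b);
  if (w) return w === "X" ? 1 : -1;
  const moves = emptyCells(b);
  if (moves.length === 0) return 0;
  const scores = moves.map((m) => {
    b[m] = toMove; // try the move
    const s = minimax(b, other(toMove));
    b[m] = null; // undo it
    return s;
  });
  return toMove === "X" ? Math.max(...scores) : Math.min(...scores);
}

// Picks the first move with the best score for the player to move.
function bestMinimaxMove(b: Square[], toMove: Player): number {
  let best = -1;
  let bestScore = toMove === "X" ? -Infinity : Infinity;
  for (const m of emptyCells(b)) {
    b[m] = toMove;
    const s = minimax(b, other(toMove));
    b[m] = null;
    if (toMove === "X" ? s > bestScore : s < bestScore) {
      best = m;
      bestScore = s;
    }
  }
  return best;
}
```

An agent that is beatable usually either cuts the search short or scores terminal positions inconsistently, so those are the first two places worth checking.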

I could probably get them to work with more weeks of effort. Instead, I will explore other methods:

Genetic algorithm
Trust Region Policy Optimization (TRPO)
Hindsight Experience Replay (HER), etc.

Lesson learnt and taking it further

There are better-suited reinforcement learning methods to play and beat humans at MTG-style card games. Through this hands-on practical application, I've figured that minmax and Q-learning, and even Monte Carlo, may not cut it for a game like MTG.

The complexity is far greater than for tic tac toe, chess, or even Go. I will need to dig further and combine methods once I implement the MTG engine. Not only would those 3 fail to perform in such complex contexts with trillions of possibilities, they would also lead to a very predictable AI, which would make it boring for this kind of game and easily beatable by players who can build custom decks against models trained with a de facto limited set of decks.

🚀 Building this and Running for Production

Generate a full static production build
```
npm run build
```

About

Reinforcement learning in the browser. Without libraries (for now).

Yes of course I could use Python. But does it run on the browser?

Do I know what I'm doing? Not quite.

Images: MinMax algo, Q-Learning algo

Built With

node.js - of course
typescript - because scripting is great but it's good to have types
parceljs - I would have gone with no bundler, but it speeds things up to just use a good bundler with Hot Module Replacement during dev.

Also using

eslint - to check TS isn't too wonky
prettier - that keeps code well formatted

Getting Started

Prerequisites

You need Git
and Node.js, of course

Repo

$ git clone https://github.com/hirako2000/tttow-reinforcement-learning.git

Navigate to the repo's root folder, then install dependencies

$ cd ./tttow-reinforcement-learning && npm install

Develop

$ npm run dev # or npm run start

Build

This command will build everything for production deployment:

$ npm run build

It generates the files for the entire page.

Deploy

The build places all the assets into the public folder. To host them, this deploy script uploads them 'somewhere':

$ npm run deploy

You may want to tweak that script in package.json (search for deploy). There are free hosting services out there, e.g. surge.sh or Netlify.

Customize

There is some HTML and styles.css, but many elements get created via JavaScript
There is typically a trainer file. Game logic is kept in a separate file. And of course the index
Since training isn't instant, it's always processed via a web worker, so as not to block the main thread
Messages are sent from the worker to report updates (typically to refresh the UI)
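The worker setup described above could look roughly like the sketch below. It is only an outline under assumptions: the message shapes (`TrainingUpdate`), the `train` function, and the file name `trainer.worker.ts` are hypothetical, and the repo's actual protocol may differ. The training loop takes its reporting function as a parameter so the same code runs in a worker or in a plain test.

```typescript
// Hypothetical message shapes; the repository's real protocol may differ.
type TrainingUpdate =
  | { kind: "progress"; episode: number; total: number }
  | { kind: "done"; total: number };

// Runs `total` training episodes and reports through `post`.
// Inside a worker, `post` would wrap the global postMessage.
function train(
  total: number,
  reportEvery: number,
  runEpisode: (episode: number) => void,
  post: (update: TrainingUpdate) => void,
): void {
  for (let episode = 1; episode <= total; episode++) {
    runEpisode(episode);
    if (episode % reportEvery === 0) {
      post({ kind: "progress", episode, total });
    }
  }
  post({ kind: "done", total });
}

// Worker side, shown as comments so the sketch runs outside a browser
// (playOneGame is a hypothetical function that plays one training game):
//   self.onmessage = (e) =>
//     train(e.data.total, 1000, playOneGame, (u) => self.postMessage(u));
//
// Page side, using the URL-based worker syntax that Parcel 2 supports
// (render is a hypothetical function that refreshes the UI):
//   const worker = new Worker(
//     new URL("./trainer.worker.ts", import.meta.url),
//     { type: "module" },
//   );
//   worker.onmessage = (e) => render(e.data);
//   worker.postMessage({ total: 50000 });
```

Reporting only every `reportEvery` episodes keeps message traffic low, so the page refreshes at a steady pace instead of on every game.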

Codebase

Roadmap

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

Fork the project
Create your feature branch (git checkout -b feature/some-feature)
Commit your changes (git commit -m 'Add some feature')
Push to the branch (git push origin feature/some-feature)
Open a pull request

License

This work is licensed under the MIT license.

You may use and remix this content, but not for commercial use, such as selling the templates and stuff like that.

If you too produce work and publish it out there, it's clearer to choose a license.

Acknowledgments

Richard S. Sutton and Andrew G. Barto for their great book: Reinforcement Learning: An Introduction

If you decide to re-use this repo, go ahead. No need to credit or link back to this repo/site, although it would be much appreciated. Don't republish the UI and logic pretty much as is though; it is lame and shameless. Tweak the look and feel, customize the training, make it better, make it your own. Make it so that I wouldn't come across your stuff and think that it is mine, and so that nobody comes across your stuff and somehow finds out it's a lousy copy of someone else's work, lacking added value and personalisation.
