Libless Reinforcement learning on the browser to win tic tac toe

hirako2000/tttow-reinforcement-learning

Reinforcement Learning Tic Tac Toe

screenshot

Reinforcement Learning to beat Tic Tac Toe on the Browser
See live »

Report Bug :: Request Feature

OK, DOkey

Table of Contents
  1. About
  2. Getting Started
  3. Codebase
  4. Roadmap
  5. Contributing
  6. License
  7. Acknowledgments

TL;DR

🛠 Installation & Set Up

  1. Install and use the correct version of Node using nvm

    nvm install && nvm use 18
  2. Install dependencies

    npm install
  3. Start the development server

    npm start

Research and practice outcome

Intense journey. I was able to implement three different reinforcement learning agents. None of them builds a model, and none is supervised or unsupervised.
The Q-learning agent builds a Q-table mapping a context (the board state) to a good move to make. Minmax and Monte Carlo determine the next good move in real time: both build a tree of possible moves from the current state of the game and explore it until they settle on a 'good' move on the fly.
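
To make the Q-table idea concrete, here is a minimal sketch of a tabular Q-learning update with epsilon-greedy move selection. It is not the code from this repo: the board encoding, the function names (`qValues`, `chooseMove`, `update`) and the values of `ALPHA` and `GAMMA` are all assumptions picked for illustration.

```typescript
// Hypothetical names and constants throughout; a sketch, not this repo's implementation.
type Board = string; // 9 characters, each "X", "O" or "-" (assumed encoding)

const ALPHA = 0.1; // learning rate (assumed value)
const GAMMA = 0.9; // discount factor (assumed value)

// Q-table: board state -> one estimated value per cell (index 0..8).
const qTable = new Map<Board, number[]>();

function qValues(state: Board): number[] {
  let row = qTable.get(state);
  if (row === undefined) {
    row = new Array<number>(9).fill(0);
    qTable.set(state, row);
  }
  return row;
}

function legalMoves(state: Board): number[] {
  const moves: number[] = [];
  for (let i = 0; i < state.length; i++) {
    if (state[i] === "-") moves.push(i);
  }
  return moves;
}

// Epsilon-greedy: explore with probability epsilon, otherwise play the best known move.
function chooseMove(state: Board, epsilon: number, random: () => number = Math.random): number {
  const moves = legalMoves(state);
  if (moves.length === 0) throw new Error("no legal moves");
  if (random() < epsilon) {
    return moves[Math.floor(random() * moves.length)]!;
  }
  const row = qValues(state);
  let best = moves[0]!;
  for (const m of moves) {
    if ((row[m] ?? 0) > (row[best] ?? 0)) best = m;
  }
  return best;
}

// Standard Q-learning update: Q(s,a) += alpha * (r + gamma * max Q(s',a') - Q(s,a)).
// `next` is the state the agent sees on its next turn, or null when the game is over.
function update(state: Board, move: number, reward: number, next: Board | null): void {
  const row = qValues(state);
  let bestNext = 0;
  if (next !== null) {
    const nextMoves = legalMoves(next);
    const nextRow = qValues(next);
    if (nextMoves.length > 0) {
      bestNext = Math.max(...nextMoves.map((m) => nextRow[m] ?? 0));
    }
  }
  const old = row[move] ?? 0;
  row[move] = old + ALPHA * (reward + GAMMA * bestNext - old);
}
```

In a two-player game, `next` would be the board after the opponent has replied, and the reward would typically only be non-zero at the end of a game. Both are design choices of this sketch.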

Dive into Reinforcement learning

After reading relevant material and getting hands-on, I built some reinforcement learning algorithms, with the aim of later building one to play MTG-style card games. Coding the 3 agents went like this:

  • Spent 1 week reading only
  • 1 week coding and reading each day
  • Took a week break
  • Went back at it with a vengeance to implement Monte Carlo

None of the 3 delivers an unbeatable Tic Tac Toe player. They should, but I'm giving up for now. They aren't even particularly good, except minmax, which plays decently and will pull some draws and even win against a human who doesn't play perfectly.
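
For reference, tic tac toe is small enough that a full-depth minmax search can explore every line of play, which is why an unbeatable player is possible in principle. The sketch below is a plain exhaustive minmax, not the agent from this repo. The `Cell` type and the `winner`, `minimax` and `bestMove` names are assumptions made for the example.

```typescript
// Hypothetical sketch of exhaustive minmax; not this repo's agent.
type Cell = "X" | "O" | null;

const LINES: ReadonlyArray<readonly [number, number, number]> = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8], // rows
  [0, 3, 6], [1, 4, 7], [2, 5, 8], // columns
  [0, 4, 8], [2, 4, 6],            // diagonals
];

function winner(board: Cell[]): Cell {
  for (const [a, b, c] of LINES) {
    const v = board[a];
    if (v && v === board[b] && v === board[c]) return v;
  }
  return null;
}

// Score from X's point of view: +1 if X wins, -1 if O wins, 0 for a draw.
function minimax(board: Cell[], player: "X" | "O"): number {
  const w = winner(board);
  if (w === "X") return 1;
  if (w === "O") return -1;
  if (board.every((c) => c !== null)) return 0;

  let best = player === "X" ? -Infinity : Infinity;
  for (let i = 0; i < 9; i++) {
    if (board[i] !== null) continue;
    board[i] = player; // try the move
    const score = minimax(board, player === "X" ? "O" : "X");
    board[i] = null; // undo it
    best = player === "X" ? Math.max(best, score) : Math.min(best, score);
  }
  return best;
}

// Returns the index of the first move with the best minmax score, or -1 on a full board.
function bestMove(board: Cell[], player: "X" | "O"): number {
  let move = -1;
  let bestScore = player === "X" ? -Infinity : Infinity;
  for (let i = 0; i < 9; i++) {
    if (board[i] !== null) continue;
    board[i] = player;
    const score = minimax(board, player === "X" ? "O" : "X");
    board[i] = null;
    const better = player === "X" ? score > bestScore : score < bestScore;
    if (better) {
      bestScore = score;
      move = i;
    }
  }
  return move;
}
```

This version has no depth limit and no pruning. That is fine for a 3x3 board, but it is exactly the part that stops scaling for bigger games.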

I could probably get them to work with more weeks of effort. Instead, I will explore further:

  • Genetic algorithm
  • Trust Region Policy Optimization (TRPO)
  • Hindsight Experience Replay (HER), etc.

Lesson learnt and taking it further

There are better-suited reinforcement learning methods to play and beat humans at MTG-style card games. What I've figured out through this hands-on, practical application is that minmax, Q-learning, and even Monte Carlo may not cut it for a game like MTG.

The complexity is far greater than for tic tac toe, chess, or even Go. I will need to dig further and combine methods once I implement the MTG engine. Not only would those 3 fail to perform in such complex contexts with trillions of possibilities, they would also lead to a very predictable AI. That would make the AI boring for this kind of game, and easily beatable by players who can build custom decks against models trained with a de facto limited set of decks.

🚀 Building this and Running for Production

  1. Generate a full static production build

    npm run build

About

Reinforcement learning in the browser. Without libraries (for now).

Yes of course I could use Python. But does it run on the browser?

Do I know what I'm doing? Not quite.

MinMax algo Q-Learning algo

(back to top)

Built With

  • node.js - of course
  • typescript - because scripting is great but it's good to have types
  • parceljs - I would have gone with nothing to bundle, but it speeds things up to just use a good bundler with Hot Module Replacement during dev.

Also using

  • eslint - to check TS isn't too wonky
  • prettier - that keeps code well formatted

Getting Started

Prerequisites

Repo

$ git clone https://github.com/hirako2000/tttow-reinforcement-learning.git

Navigate to the repo root's folder then install dependencies

$ cd ./tttow-reinforcement-learning && npm install

Develop

$ npm run dev # or npm run start

Build

This command will build everything for production deployment:

$ npm run build

It generates the files for the entire page.

Deploy

The build places all the assets into the public folder. To host them, this deploy script uploads them 'somewhere':

$ npm run deploy

You may want to tweak that script in package.json (search for deploy). There are free hosting services out there, e.g. surge.sh or Netlify.

(back to top)

Customize

  • There is some html and styles.css, but many node elements get created via JavaScript
  • There is typically a trainer file. Game logic is kept in a separate file. And of course there is the index
  • Since training isn't instant, it's always processed via a web worker, so that it doesn't hold the main thread
  • Messages are sent from the worker to provide updates (typically to refresh the page)
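
The worker setup described above could look roughly like the sketch below. The message shapes, the `runTraining` function and the file name `trainer.worker.ts` are assumptions for illustration, so the repo's actual protocol may differ. The loop takes its `post` callback as a parameter so it can be run and tested outside a worker.

```typescript
// Hypothetical message protocol; the repo's real messages may look different.
type WorkerMessage =
  | { type: "progress"; episode: number; total: number }
  | { type: "done"; episodes: number };

// Runs `total` training episodes and reports back every `reportEvery` episodes.
// `trainEpisode` and `post` are injected so the loop does not depend on worker globals.
function runTraining(
  total: number,
  reportEvery: number,
  trainEpisode: (episode: number) => void,
  post: (msg: WorkerMessage) => void,
): void {
  for (let episode = 1; episode <= total; episode++) {
    trainEpisode(episode);
    if (episode % reportEvery === 0) {
      post({ type: "progress", episode, total });
    }
  }
  post({ type: "done", episodes: total });
}

// Inside the worker file, the wiring would be roughly (trainOneGame is a placeholder):
//   self.onmessage = (e) =>
//     runTraining(e.data.episodes, 100, trainOneGame, (m) => self.postMessage(m));
//
// On the main thread, with a bundler that supports module workers (render is a placeholder):
//   const worker = new Worker(new URL("./trainer.worker.ts", import.meta.url), { type: "module" });
//   worker.onmessage = (e) => render(e.data);
//   worker.postMessage({ episodes: 10000 });
```

Reporting every N episodes rather than on every episode keeps the message traffic low, so the main thread only re-renders occasionally.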

Codebase

Visualization of the codebase

Roadmap

  • Tic Tac Toe game logic
  • Human vs Human
  • Minmax training
  • Human vs AI
  • Unbeatable minmax model
  • Unbeatable Q-learning
  • Render training metadata
  • Render AI decision making
  • better layout
  • Make q learning actually work!
  • Monte Carlo
  • Genetic algorithm
  • superb layout
  • amazing layout
  • Measure AI performance
  • more to come I guess

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. fork the Project
  2. create your Feature Branch (git checkout -b feature/some-feature)
  3. commit your Changes (git commit -m 'Add some feature')
  4. push to the Branch (git push origin feature/some-feature)
  5. open a Pull Request

License

license

This work is licensed under the MIT license.

You may use and remix this content, but not for commercial use, such as selling the templates and stuff like that.

If you too produce work and publish it out there, it's clearer to choose a license.

(back to top)


Acknowledgments

Richard S. Sutton and Andrew G. Barto for their great book: Reinforcement Learning: An Introduction

If you decide to re-use this repo, go ahead. No need to credit or link back to this repo/site, although it would be much appreciated. Don't republish the UI and logic pretty much as is though; it is lame and shameless. Tweak the look and feel, customize the training, make it better, make it your own. Make it so that I wouldn't come across your stuff and think that it is mine. So that nobody comes across your stuff and somehow finds out it's a lousy copy of someone else's work, lacking added value and personalisation.
