This is a project repo for the game Chexers, which was introduced in the course COMP30024, Semester 1.
- AlphaGo-related papers
	- Monte Carlo Tree Search
	- Basic reinforcement learning
	- Use a residual CNN together with MCTS
	- Problems:
		- Need an offline game-play engine
		- Need a large amount of computational power
		- Don't know how to update the hyper-parameters using the reinforcement learning policy
	- Solution:
		- ==Not found yet==
	- Advantages:
		- No need for human knowledge
		- Makes decisions very quickly
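For reference, here is a minimal sketch of vanilla MCTS with UCB1 selection and random playouts. The `GameState` interface used here (`legal_actions`, `apply`, `is_terminal`, `result_for`) is a hypothetical placeholder, not part of the actual project code.

```python
import math
import random

# Minimal MCTS with UCB1 selection and random playouts.
# GameState is a hypothetical interface: legal_actions(),
# apply(action) -> new state, is_terminal(), and
# result_for(player) -> payoff in [0, 1].

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action          # action that led to this node
        self.children = []
        self.untried = list(state.legal_actions())
        self.visits = 0
        self.value = 0.0              # total payoff accumulated here

    def ucb1(self, c=1.4):
        # Exploitation term plus exploration bonus.
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(root_state, player, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCB1 until an expandable node.
        while not node.untried and node.children:
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: try one untried action.
        if node.untried:
            action = node.untried.pop()
            child = Node(node.state.apply(action), parent=node, action=action)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout to a terminal state.
        state = node.state
        while not state.is_terminal():
            state = state.apply(random.choice(state.legal_actions()))
        payoff = state.result_for(player)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += payoff
            node = node.parent
    # Play the most-visited action at the root.
    return max(root.children, key=lambda n: n.visits).action
```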
- Min-max algorithms
	- What we learnt in class
	- Problems:
		- How to extend this to a multi-agent (three-player) game
			- Use 3-tuples of utilities, one value per player (see the max^n sketch after this list)
			- But the decision-making process might become really long, so we need some cut-down (pruning) rules
		- Slow decision making, since we only have 60s of computation time overall
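To make the "3-tuples of utilities" idea concrete, below is a minimal sketch of depth-limited max^n search for a three-player game: each position is scored with a 3-tuple and the player to move maximizes their own component. The `GameState` interface and `evaluate` function are hypothetical placeholders.

```python
# Minimal depth-limited max^n sketch for a 3-player game.
# Each position is scored with a 3-tuple (u0, u1, u2); the player
# to move picks the child whose tuple maximizes their own entry.
# GameState and its methods are hypothetical placeholders.

NUM_PLAYERS = 3

def evaluate(state):
    """Hypothetical static evaluation: returns a 3-tuple of utilities."""
    return tuple(state.heuristic_for(p) for p in range(NUM_PLAYERS))

def maxn(state, depth):
    """Return (utility_tuple, best_action) for the player to move."""
    if depth == 0 or state.is_terminal():
        return evaluate(state), None
    player = state.player_to_move()      # 0, 1 or 2
    best_tuple, best_action = None, None
    for action in state.legal_actions():
        child_tuple, _ = maxn(state.apply(action), depth - 1)
        # The cut-down rule here is simply the depth limit; stronger
        # pruning (e.g. shallow pruning) could be layered on top.
        if best_tuple is None or child_tuple[player] > best_tuple[player]:
            best_tuple, best_action = child_tuple, action
    return best_tuple, best_action
```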
- DQN
	- ==Haven't tried it yet==
- TDLeaf
- Find a Nash equilibrium
	- Also very hard
- Evolutionary algorithm
	- Randomly assign parameters to the evaluation function and start playing games
	- Each round, keep the winner's parameters as the result
	- At the beginning of each round, distribute 3 versions of the parameters, each containing a slight change
	- We can assume that, over a long enough time, we will find good parameters for the evaluation function (a sketch of this loop follows the list)
	- Problems:
		- Needs a really long time to learn the parameters, since there may be a lot of them
		- How to decide which features should be included in the evaluation function is a big problem
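A minimal sketch of this mutate-play-select loop. The `play_match` helper is a hypothetical placeholder (here stubbed with a random winner so the sketch runs end-to-end); a real version would play one full game of Chexers between three agents using the candidate weights.

```python
import random

def mutate(weights, scale=0.1):
    """Return a slightly perturbed copy of the weight vector."""
    return [w + random.gauss(0.0, scale) for w in weights]

def play_match(w0, w1, w2):
    """Placeholder: a real implementation would play one full game
    between three agents using these evaluation weights and return
    the winning player's index (0, 1 or 2)."""
    return random.randrange(3)

def evolve(num_features, rounds=1000):
    # Start from random evaluation-function parameters.
    best = [random.uniform(-1.0, 1.0) for _ in range(num_features)]
    for _ in range(rounds):
        # Distribute three slightly different versions of the weights,
        # one per player in the three-player game.
        candidates = [mutate(best) for _ in range(3)]
        winner = play_match(*candidates)
        # Keep the winner's parameters for the next round.
        best = candidates[winner]
    return best
```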
- Generate a play log for updating our policy later. The format is a tuple containing the following (two helper functions in VanGame.utils might help); a sketch of one record follows this list:
	- Current state: a tuple of 37 numbers
	- Environment reward for moving from the previous state to this state
	- Our predicted V (based on our utility function)
	- Action (MOVE, JUMP or EXIT)
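A minimal sketch of what one log record could look like; the field names and the JSON-lines file format here are assumptions for illustration, not the format used by the VanGame.utils helpers mentioned above.

```python
import json
from typing import NamedTuple, Tuple

class PlayLogEntry(NamedTuple):
    """One step of self-play, matching the tuple format described above.
    Field names are illustrative assumptions."""
    state: Tuple[float, ...]   # current state: a tuple of 37 numbers
    reward: float              # environment reward for reaching this state
    predicted_v: float         # V predicted by our utility function
    action: str                # "MOVE", "JUMP" or "EXIT"

def append_log(path, entry: PlayLogEntry):
    # Append one record per line as JSON so the log can be replayed
    # later when updating the policy.
    with open(path, "a") as f:
        f.write(json.dumps(entry._asdict()) + "\n")

# Example usage:
# append_log("play_log.jsonl", PlayLogEntry(state=(0.0,) * 37,
#            reward=1.0, predicted_v=0.42, action="JUMP"))
```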
- Try both a linear and a feed-forward model for the game (sketched below)
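A sketch of the two candidate value models over the 37-number state vector, using NumPy; the hidden-layer size and weight initialisation are arbitrary assumptions.

```python
import numpy as np

STATE_SIZE = 37  # the state is a tuple of 37 numbers (see the log format)

def linear_value(state, weights, bias=0.0):
    """Linear model: V(s) = w . s + b."""
    return float(np.dot(weights, state) + bias)

class FeedForwardValue:
    """One-hidden-layer feed-forward model:
    V(s) = w2 . relu(W1 s + b1) + b2. Hidden size 32 is an assumption."""
    def __init__(self, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, size=(hidden, STATE_SIZE))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, size=hidden)
        self.b2 = 0.0

    def __call__(self, state):
        h = np.maximum(0.0, self.W1 @ np.asarray(state) + self.b1)  # ReLU
        return float(self.w2 @ h + self.b2)

# Example usage:
# v_lin = linear_value((0.0,) * STATE_SIZE, np.zeros(STATE_SIZE))
# v_ff = FeedForwardValue()((0.0,) * STATE_SIZE)
```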