This is a project repo for the game Chexers, which was introduced in the course COMP30024, Semester 1.
- AlphaGo-related papers
	- Monte Carlo Tree Search
	- Basic reinforcement learning
	- Use a residual CNN together with MCTS
	- Problems:
		- Need an offline game-play engine
		- Need a large amount of computational power
		- Don't know how to update the hyper-parameters using the reinforcement learning policy
	- Solution:
		- ==Not found yet==
	- Advantages:
		- No need for human knowledge
		- Makes decisions very quickly
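For reference, here is a minimal sketch of vanilla MCTS with UCB1 selection and random playouts. The `GameState` interface used here (`legal_actions`, `apply`, `is_terminal`, `result_for`) is a hypothetical placeholder, not part of the actual project code.

```python
import math
import random

# Minimal MCTS with UCB1 selection and random playouts.
# GameState is a hypothetical interface: legal_actions(),
# apply(action) -> new state, is_terminal(), and
# result_for(player) -> payoff in [0, 1].

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action          # action that led to this node
        self.children = []
        self.untried = list(state.legal_actions())
        self.visits = 0
        self.value = 0.0              # total payoff accumulated here

    def ucb1(self, c=1.4):
        # Exploitation term plus exploration bonus.
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(root_state, player, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCB1 until an expandable node.
        while not node.untried and node.children:
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: try one untried action.
        if node.untried:
            action = node.untried.pop()
            child = Node(node.state.apply(action), parent=node, action=action)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout to a terminal state.
        state = node.state
        while not state.is_terminal():
            state = state.apply(random.choice(state.legal_actions()))
        payoff = state.result_for(player)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += payoff
            node = node.parent
    # Play the most-visited action at the root.
    return max(root.children, key=lambda n: n.visits).action
```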
- Min-max algorithms
	- What we learnt in class
	- Problems:
		- How to extend this to a multi-agent (three-player) game
			- Use 3-tuples of utilities, one value per player (see the max^n sketch after this list)
			- But the decision-making process might become really long, so we need some cut-down (pruning) rules
		- Slow decision making, since we only have 60s of computation time overall
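To make the "3-tuples of utilities" idea concrete, below is a minimal sketch of depth-limited max^n search for a three-player game: each position is scored with a 3-tuple and the player to move maximizes their own component. The `GameState` interface and `evaluate` function are hypothetical placeholders.

```python
# Minimal depth-limited max^n sketch for a 3-player game.
# Each position is scored with a 3-tuple (u0, u1, u2); the player
# to move picks the child whose tuple maximizes their own entry.
# GameState and its methods are hypothetical placeholders.

NUM_PLAYERS = 3

def evaluate(state):
    """Hypothetical static evaluation: returns a 3-tuple of utilities."""
    return tuple(state.heuristic_for(p) for p in range(NUM_PLAYERS))

def maxn(state, depth):
    """Return (utility_tuple, best_action) for the player to move."""
    if depth == 0 or state.is_terminal():
        return evaluate(state), None
    player = state.player_to_move()      # 0, 1 or 2
    best_tuple, best_action = None, None
    for action in state.legal_actions():
        child_tuple, _ = maxn(state.apply(action), depth - 1)
        # The cut-down rule here is simply the depth limit; stronger
        # pruning (e.g. shallow pruning) could be layered on top.
        if best_tuple is None or child_tuple[player] > best_tuple[player]:
            best_tuple, best_action = child_tuple, action
    return best_tuple, best_action
```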
- DQN
	- ==Haven't tried it yet==
- TDLeaf
- Find a Nash equilibrium
	- Also very hard
- Evolutionary algorithm
	- Randomly assign parameters to the evaluation function and start playing games
	- Each round, keep the winner's parameters as the result
	- At the beginning of each round, distribute 3 versions of the parameters, each containing a slight change
	- We can assume that, over a long enough time, we will find good parameters for the evaluation function (a sketch of this loop follows the list)
	- Problems:
		- Needs a really long time to learn the parameters, since there may be a lot of them
		- How to decide which features should be included in the evaluation function is a big problem
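A minimal sketch of this mutate-play-select loop. The `play_match` helper is a hypothetical placeholder (here stubbed with a random winner so the sketch runs end-to-end); a real version would play one full game of Chexers between three agents using the candidate weights.

```python
import random

def mutate(weights, scale=0.1):
    """Return a slightly perturbed copy of the weight vector."""
    return [w + random.gauss(0.0, scale) for w in weights]

def play_match(w0, w1, w2):
    """Placeholder: a real implementation would play one full game
    between three agents using these evaluation weights and return
    the winning player's index (0, 1 or 2)."""
    return random.randrange(3)

def evolve(num_features, rounds=1000):
    # Start from random evaluation-function parameters.
    best = [random.uniform(-1.0, 1.0) for _ in range(num_features)]
    for _ in range(rounds):
        # Distribute three slightly different versions of the weights,
        # one per player in the three-player game.
        candidates = [mutate(best) for _ in range(3)]
        winner = play_match(*candidates)
        # Keep the winner's parameters for the next round.
        best = candidates[winner]
    return best
```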
- Generate a play log for updating our policy later. The format is a tuple containing the following (two helper functions in VanGame.utils might help); a sketch of one record follows this list:
	- Current state: a tuple of 37 numbers
	- Environment reward for moving from the previous state to this state
	- Our predicted V (based on our utility function)
	- Action (MOVE, JUMP or EXIT)
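A minimal sketch of what one log record could look like; the field names and the JSON-lines file format here are assumptions for illustration, not the format used by the VanGame.utils helpers mentioned above.

```python
import json
from typing import NamedTuple, Tuple

class PlayLogEntry(NamedTuple):
    """One step of self-play, matching the tuple format described above.
    Field names are illustrative assumptions."""
    state: Tuple[float, ...]   # current state: a tuple of 37 numbers
    reward: float              # environment reward for reaching this state
    predicted_v: float         # V predicted by our utility function
    action: str                # "MOVE", "JUMP" or "EXIT"

def append_log(path, entry: PlayLogEntry):
    # Append one record per line as JSON so the log can be replayed
    # later when updating the policy.
    with open(path, "a") as f:
        f.write(json.dumps(entry._asdict()) + "\n")

# Example usage:
# append_log("play_log.jsonl", PlayLogEntry(state=(0.0,) * 37,
#            reward=1.0, predicted_v=0.42, action="JUMP"))
```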
- Try both a linear and a feed-forward model for the game (sketched below)
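A sketch of the two candidate value models over the 37-number state vector, using NumPy; the hidden-layer size and weight initialisation are arbitrary assumptions.

```python
import numpy as np

STATE_SIZE = 37  # the state is a tuple of 37 numbers (see the log format)

def linear_value(state, weights, bias=0.0):
    """Linear model: V(s) = w . s + b."""
    return float(np.dot(weights, state) + bias)

class FeedForwardValue:
    """One-hidden-layer feed-forward model:
    V(s) = w2 . relu(W1 s + b1) + b2. Hidden size 32 is an assumption."""
    def __init__(self, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, size=(hidden, STATE_SIZE))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, size=hidden)
        self.b2 = 0.0

    def __call__(self, state):
        h = np.maximum(0.0, self.W1 @ np.asarray(state) + self.b1)  # ReLU
        return float(self.w2 @ h + self.b2)

# Example usage:
# v_lin = linear_value((0.0,) * STATE_SIZE, np.zeros(STATE_SIZE))
# v_ff = FeedForwardValue()((0.0,) * STATE_SIZE)
```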