Reinforcement learning on Puzzle & Dragons.
The project is still evolving, but we are currently trying:
- Boltzmann policy and annealing.
- Experience replay.
- A frozen second model (model B) for predicting action rewards.
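
As a rough illustration of the first two items above, here is a minimal sketch of Boltzmann (softmax) action sampling with a linearly annealed temperature, plus a small experience replay buffer. It is not the project's actual code: the function and class names, the linear schedule, and the default values (`start=1.0`, `end=0.1`, `decay_steps=10_000`) are all assumptions made for the example.

```python
import math
import random
from collections import deque


def boltzmann_probs(q_values, temperature):
    """Softmax over action values; a lower temperature is greedier."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, temperature, rng=random):
    """Draw one action index according to the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def annealed_temperature(step, start=1.0, end=0.1, decay_steps=10_000):
    """Hypothetical schedule: linear from start to end, then held at end."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement, capped at the buffer size.
        items = list(self.buffer)
        return rng.sample(items, min(batch_size, len(items)))

    def __len__(self):
        return len(self.buffer)
```

In a training loop, one would typically call `sample_action(q_values, annealed_temperature(step))` to act, `add` each transition to the buffer, and train on batches from `sample` rather than on the most recent transition only.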
Feb 2nd, 2019: Our agent now makes one combo and then passes.