Commit ae11f71 (parent f7f7e6c): notes on some pkmn papers
scheibo committed Oct 18, 2022
docs/NOTES.md: 35 additions, 0 deletions

- **Poke2Vec: Vector Embeddings of Pokemon**
- **yuzeh/metagrok**
- **A Self-Play Policy Optimization Approach to Battling Pokemon**
- self-play Reinforcement Learning (actor-critic two-headed neural network with > 1B params)
- reward of +1 for a win and -1 for a loss, but uses fractional auxiliary rewards to speed up
training
- uses Proximal Policy Optimization + Generalized Advantage Estimation
- ~4M self-play matches over 6 days = $91 on Google Compute Engine
- 99.5% win rate vs. purely random, 61.2% win rate vs. pmariglia, 1677 Glicko-1 in
`gen7randombattle` after thousands of battles
- able to achieve high win rates vs. pmariglia in a restricted custom non-random format, but
required retraining on the specific format (the approach generalizes, but the training doesn't)
- suggests experimenting with an LSTM to better model human memory
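
A minimal sketch of the PPO + GAE combination mentioned above, in TypeScript over plain arrays; the `gae` and `ppoClipLoss` helpers and their hyperparameter defaults are illustrative, not the paper's actual network or training code:

```ts
// Sketch only: GAE advantages plus PPO's clipped surrogate loss.

/** Generalized Advantage Estimation:
 *  delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
 *  A_t = sum_{l >= 0} (gamma * lambda)^l * delta_{t+l}, computed backwards. */
function gae(rewards: number[], values: number[], gamma = 0.99, lambda = 0.95): number[] {
  // `values` holds V(s_0..s_T) and is one element longer than `rewards`.
  const advantages = new Array<number>(rewards.length).fill(0);
  let acc = 0;
  for (let t = rewards.length - 1; t >= 0; t--) {
    const delta = rewards[t] + gamma * values[t + 1] - values[t];
    acc = delta + gamma * lambda * acc;
    advantages[t] = acc;
  }
  return advantages;
}

/** PPO clipped surrogate for a single (state, action) sample: maximize
 *  min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A); returned negated as a loss. */
function ppoClipLoss(logProbNew: number, logProbOld: number, advantage: number, eps = 0.2): number {
  const ratio = Math.exp(logProbNew - logProbOld);
  const clipped = Math.min(Math.max(ratio, 1 - eps), 1 + eps);
  return -Math.min(ratio * advantage, clipped * advantage);
}
```
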
- **pmariglia/showdown**
- **davidstone/technical-machine**
- **Technical Machine**
- Expectiminimax with depth 3 + transposition tables to avoid reevaluation of identical states
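
A minimal sketch of expectiminimax with a transposition table in the spirit of the note above; the `State` interface, `evaluate`, and the string-keyed cache are hypothetical stand-ins, not Technical Machine's actual implementation:

```ts
// Sketch only: expectiminimax caching values keyed on state hash + remaining depth.

type NodeKind = 'max' | 'min' | 'chance';

interface State {
  key(): string;                                  // canonical hash of the battle state
  kind(): NodeKind;                               // which kind of node this state represents
  isTerminal(): boolean;
  children(): {state: State; prob?: number}[];    // prob is only set for chance nodes
}

declare function evaluate(state: State): number;  // heuristic score for the maximizer

const table = new Map<string, number>();          // transposition table

function expectiminimax(state: State, depth: number): number {
  const key = `${state.key()}|${depth}`;
  const cached = table.get(key);
  if (cached !== undefined) return cached;        // identical state already evaluated
  if (depth === 0 || state.isTerminal()) return evaluate(state);

  const children = state.children();
  const values = children.map(c => expectiminimax(c.state, depth - 1));

  let value: number;
  if (state.kind() === 'max') value = Math.max(...values);
  else if (state.kind() === 'min') value = Math.min(...values);
  else value = children.reduce((sum, c, i) => sum + (c.prob ?? 0) * values[i], 0);

  table.set(key, value);
  return value;
}
```
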
- **Optimal Battle Strategy in Pokemon using Reinforcement Learning**
- model-free Reinforcement Learning (softmax exploration strategy with Q-Learning)
- handwritten deterministic Generation I simulator
- epsilon-greedy strategy to begin with (10% chance of choosing a random move)
- discretizes Pokémon HP into 10 buckets of 10% each to reduce the state space
- only 4 action states? (no switching?)
- 5000 training games resulted in only a 60% win rate vs. a random opponent; after changing to
softmax and performing 20K training games, achieved a 70% win rate vs. random in a format
approximating `gen1challengecup`
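
A minimal sketch of tabular Q-learning with softmax (Boltzmann) exploration over a discretized state, assuming the setup described above (HP in 10% buckets, 4 move slots, no switching); `hpBucket`, the temperature, and the learning rate are illustrative, not the paper's values:

```ts
// Sketch only: tabular Q-learning with softmax action selection.

const ACTIONS = 4;                                 // 4 move slots, no switching
const Q = new Map<string, number[]>();             // state key -> Q-value per action

function qValues(state: string): number[] {
  let q = Q.get(state);
  if (!q) Q.set(state, (q = new Array<number>(ACTIONS).fill(0)));
  return q;
}

/** Discretize an HP fraction in [0, 1] into one of 10 buckets of 10% each
 *  (used when building the state key, to shrink the state space). */
function hpBucket(fraction: number): number {
  return Math.min(9, Math.floor(fraction * 10));
}

/** Softmax exploration: sample action a with probability proportional to exp(Q(s,a) / tau). */
function softmaxAction(state: string, tau = 0.5): number {
  const q = qValues(state);
  const max = Math.max(...q);                      // subtract max for numerical stability
  const weights = q.map(v => Math.exp((v - max) / tau));
  const total = weights.reduce((a, b) => a + b, 0);
  let r = Math.random() * total;
  for (let a = 0; a < ACTIONS; a++) {
    r -= weights[a];
    if (r <= 0) return a;
  }
  return ACTIONS - 1;
}

/** Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). */
function update(s: string, a: number, reward: number, next: string, alpha = 0.1, gamma = 0.9): void {
  const q = qValues(s);
  q[a] += alpha * (reward + gamma * Math.max(...qValues(next)) - q[a]);
}
```
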
- **hsahovic/poke-env**
- **dramamine/leftovers-again**
- **taylorhansen/pokemonshowdown-ai**
- **blue-sky-sea/Pokemon-MCTS-AI-Master**
- **rameshvarun/showdownbot**
- **Percymon: A Pokemon Showdown Artificial Intelligence**
- Minimax with depth 2, Alpha-Beta pruning, Move Ordering (e.g. super effective before not very
effective), ~5s to move (though with a 40s long tail), adjusted Technical Machine weights
- Generation VI w/ no team building
- mega-evolves greedily to prune state space as much as possible
- sequentializes the game (player A maximizes, player B minimizes, etc.)
- assumes unrevealed Pokémon are non-existent for simplicity, and assumes 7 most common moves
(pruning to 4 once all are revealed)
- 1540±31 Glicko-1 in `gen6randombattle` after 134 battles (minor improvement over greedy
ranking heuristic algorithm)
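
A minimal sketch of depth-limited minimax with alpha-beta pruning and move ordering as described for Percymon; the `GameState` interface and `orderingScore` helper are hypothetical stand-ins, and the simultaneous turn is assumed to have already been sequentialized as noted above:

```ts
// Sketch only: alpha-beta minimax that orders moves before searching them.

interface Move { id: string }

interface GameState {
  isTerminal(): boolean;
  legalMoves(maximizing: boolean): Move[];
  apply(move: Move, maximizing: boolean): GameState;
  evaluate(): number;                        // heuristic score from the maximizer's view
  orderingScore(move: Move): number;         // e.g. super effective > neutral > not very effective
}

function alphabeta(state: GameState, depth: number, alpha: number, beta: number, maximizing: boolean): number {
  if (depth === 0 || state.isTerminal()) return state.evaluate();

  // Move ordering: search the most promising moves first so cutoffs happen earlier.
  const moves = state.legalMoves(maximizing)
    .sort((a, b) => state.orderingScore(b) - state.orderingScore(a));

  let best = maximizing ? -Infinity : Infinity;
  for (const move of moves) {
    const value = alphabeta(state.apply(move, maximizing), depth - 1, alpha, beta, !maximizing);
    if (maximizing) {
      best = Math.max(best, value);
      alpha = Math.max(alpha, best);
    } else {
      best = Math.min(best, value);
      beta = Math.min(beta, best);
    }
    if (beta <= alpha) break;                // alpha-beta cutoff
  }
  return best;
}
```
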
- **Implementation and Evaluation of Information Set Monte Carlo Tree Search for Pokemon**
- `gen6battlestadiumsingles` (6 choose 3, level 50, 1 minute turns)
- uses Pokémon Global Link usage statistics
- custom simulator, "node class" vs. "simulator class" (actual state, used to determine next
state), some sort of pruning (?)
- used UCB for all agents except the DMCTS test agent, which used EXP3
- ISMCTS wins only ~25% of the time against "cheating" (omniscient) MCTS test agents, but 57.5% of
the time against determinized MCTS
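
A minimal sketch of the ISMCTS loop the paper evaluates: each iteration samples a determinization of the hidden information (e.g. the opponent's unrevealed team) and then runs one UCB1-guided select/expand/simulate/backpropagate pass; all types and helpers here are hypothetical placeholders, not the paper's simulator classes:

```ts
// Sketch only: outer ISMCTS loop with UCB1 selection.

interface ISNode {
  visits: number;
  wins: number;
  children: Map<string, ISNode>;             // keyed by action id
}

/** UCB1: pick the child action maximizing win rate plus an exploration bonus. */
function ucb1(parent: ISNode, c = Math.SQRT2): string {
  let bestAction = '';
  let bestScore = -Infinity;
  for (const [action, child] of parent.children) {
    const score = child.visits === 0
      ? Infinity                             // always try unvisited children first
      : child.wins / child.visits + c * Math.sqrt(Math.log(parent.visits) / child.visits);
    if (score > bestScore) { bestScore = score; bestAction = action; }
  }
  return bestAction;
}

declare function sampleDeterminization(infoSet: unknown): unknown;         // guess the hidden state
declare function runIteration(root: ISNode, determinized: unknown): void;  // select/expand/rollout/backup

function ismcts(root: ISNode, infoSet: unknown, iterations: number): string {
  for (let i = 0; i < iterations; i++) {
    const determinized = sampleDeterminization(infoSet); // fresh determinization per iteration
    runIteration(root, determinized);                    // selection uses ucb1 on shared nodes
  }
  // Final decision: the most-visited root child is the robust choice.
  let best = '';
  let visits = -1;
  for (const [action, child] of root.children) {
    if (child.visits > visits) { visits = child.visits; best = action; }
  }
  return best;
}
```
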
