Commit ae11f71 (parent f7f7e6c): notes on some pkmn papers
scheibo committed Oct 18, 2022
docs/NOTES.md: 35 additions, 0 deletions

- **Poke2Vec: Vector Embeddings of Pokemon**
- **yuzeh/metagrok**
- **A Self-Play Policy Optimization Approach to Battling Pokemon**
- self-play Reinforcement Learning (actor-critic two-headed neural network with > 1B params)
- reward of +1 for a win and -1 for a loss, but uses fractional auxiliary rewards to speed up
training
- uses Proximal Policy Optimization + Generalized Advantage Estimation
- ~4M self-play matches over 6 days = $91 on Google Compute Engine
- 99.5% win rate vs. purely random, 61.2% win rate vs. pmariglia, 1677 Glicko-1 in
`gen7randombattle` after thousands of battles
- able to achieve high win rates vs. pmariglia in a restricted custom non-random format, but
required retraining on the specific format (the approach generalizes, but the training doesn't)
- suggests experimenting with an LSTM to better model human memory
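
A minimal sketch of the PPO + GAE combination mentioned above, in TypeScript over plain arrays; the `gae` and `ppoClipLoss` helpers and their hyperparameter defaults are illustrative, not the paper's actual network or training code:

```ts
// Sketch only: GAE advantages plus PPO's clipped surrogate loss.

/** Generalized Advantage Estimation:
 *  delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
 *  A_t = sum_{l >= 0} (gamma * lambda)^l * delta_{t+l}, computed backwards. */
function gae(rewards: number[], values: number[], gamma = 0.99, lambda = 0.95): number[] {
  // `values` holds V(s_0..s_T) and is one element longer than `rewards`.
  const advantages = new Array<number>(rewards.length).fill(0);
  let acc = 0;
  for (let t = rewards.length - 1; t >= 0; t--) {
    const delta = rewards[t] + gamma * values[t + 1] - values[t];
    acc = delta + gamma * lambda * acc;
    advantages[t] = acc;
  }
  return advantages;
}

/** PPO clipped surrogate for a single (state, action) sample: maximize
 *  min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A); returned negated as a loss. */
function ppoClipLoss(logProbNew: number, logProbOld: number, advantage: number, eps = 0.2): number {
  const ratio = Math.exp(logProbNew - logProbOld);
  const clipped = Math.min(Math.max(ratio, 1 - eps), 1 + eps);
  return -Math.min(ratio * advantage, clipped * advantage);
}
```
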
- **pmariglia/showdown**
- **davidstone/technical-machine**
- **Technical Machine**
- Expectiminimax with depth 3 + transposition tables to avoid reevaluation of identical states
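
A minimal sketch of expectiminimax with a transposition table in the spirit of the note above; the `State` interface, `evaluate`, and the string-keyed cache are hypothetical stand-ins, not Technical Machine's actual implementation:

```ts
// Sketch only: expectiminimax caching values keyed on state hash + remaining depth.

type NodeKind = 'max' | 'min' | 'chance';

interface State {
  key(): string;                                  // canonical hash of the battle state
  kind(): NodeKind;                               // which kind of node this state represents
  isTerminal(): boolean;
  children(): {state: State; prob?: number}[];    // prob is only set for chance nodes
}

declare function evaluate(state: State): number;  // heuristic score for the maximizer

const table = new Map<string, number>();          // transposition table

function expectiminimax(state: State, depth: number): number {
  const key = `${state.key()}|${depth}`;
  const cached = table.get(key);
  if (cached !== undefined) return cached;        // identical state already evaluated
  if (depth === 0 || state.isTerminal()) return evaluate(state);

  const children = state.children();
  const values = children.map(c => expectiminimax(c.state, depth - 1));

  let value: number;
  if (state.kind() === 'max') value = Math.max(...values);
  else if (state.kind() === 'min') value = Math.min(...values);
  else value = children.reduce((sum, c, i) => sum + (c.prob ?? 0) * values[i], 0);

  table.set(key, value);
  return value;
}
```
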
- **Optimal Battle Strategy in Pokemon using Reinforcement Learning**
- model-free Reinforcement Learning (softmax exploration strategy with Q-Learning)
- handwritten deterministic Generation I simulator
- epsilon-greedy strategy to begin with (10% chance of choosing a random move)
- discretizes Pokémon HP into 10 buckets of 10% each to reduce the state space
- only 4 action states? (no switching?)
- 5000 training games resulted in only a 60% win rate vs. a random opponent; after changing to
softmax and performing 20K training games, achieved a 70% win rate vs. random in a format
approximating `gen1challengecup`
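
A minimal sketch of tabular Q-learning with softmax (Boltzmann) exploration over a discretized state, assuming the setup described above (HP in 10% buckets, 4 move slots, no switching); `hpBucket`, the temperature, and the learning rate are illustrative, not the paper's values:

```ts
// Sketch only: tabular Q-learning with softmax action selection.

const ACTIONS = 4;                                 // 4 move slots, no switching
const Q = new Map<string, number[]>();             // state key -> Q-value per action

function qValues(state: string): number[] {
  let q = Q.get(state);
  if (!q) Q.set(state, (q = new Array<number>(ACTIONS).fill(0)));
  return q;
}

/** Discretize an HP fraction in [0, 1] into one of 10 buckets of 10% each
 *  (used when building the state key, to shrink the state space). */
function hpBucket(fraction: number): number {
  return Math.min(9, Math.floor(fraction * 10));
}

/** Softmax exploration: sample action a with probability proportional to exp(Q(s,a) / tau). */
function softmaxAction(state: string, tau = 0.5): number {
  const q = qValues(state);
  const max = Math.max(...q);                      // subtract max for numerical stability
  const weights = q.map(v => Math.exp((v - max) / tau));
  const total = weights.reduce((a, b) => a + b, 0);
  let r = Math.random() * total;
  for (let a = 0; a < ACTIONS; a++) {
    r -= weights[a];
    if (r <= 0) return a;
  }
  return ACTIONS - 1;
}

/** Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). */
function update(s: string, a: number, reward: number, next: string, alpha = 0.1, gamma = 0.9): void {
  const q = qValues(s);
  q[a] += alpha * (reward + gamma * Math.max(...qValues(next)) - q[a]);
}
```
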
- **hsahovic/poke-env**
- **dramamine/leftovers-again**
- **taylorhansen/pokemonshowdown-ai**
- **blue-sky-sea/Pokemon-MCTS-AI-Master**
- **rameshvarun/showdownbot**
- **Percymon: A Pokemon Showdown Artificial Intelligence**
- Minimax with depth 2, Alpha-Beta pruning, Move Ordering (e.g. super effective before not very
effective), ~5s to move (though with a 40s long tail), adjusted Technical Machine weights
- Generation VI w/ no team building
- mega-evolves greedily to prune state space as much as possible
- sequentializes the game (player A maximizes, player B minimizes, etc.)
- assumes unrevealed Pokémon are non-existent for simplicity, and assumes 7 most common moves
(pruning to 4 once all are revealed)
- 1540±31 Glicko-1 in `gen6randombattle` after 134 battles (minor improvement over greedy
ranking heuristic algorithm)
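
A minimal sketch of depth-limited minimax with alpha-beta pruning and move ordering as described for Percymon; the `GameState` interface and `orderingScore` helper are hypothetical stand-ins, and the simultaneous turn is assumed to have already been sequentialized as noted above:

```ts
// Sketch only: alpha-beta minimax that orders moves before searching them.

interface Move { id: string }

interface GameState {
  isTerminal(): boolean;
  legalMoves(maximizing: boolean): Move[];
  apply(move: Move, maximizing: boolean): GameState;
  evaluate(): number;                        // heuristic score from the maximizer's view
  orderingScore(move: Move): number;         // e.g. super effective > neutral > not very effective
}

function alphabeta(state: GameState, depth: number, alpha: number, beta: number, maximizing: boolean): number {
  if (depth === 0 || state.isTerminal()) return state.evaluate();

  // Move ordering: search the most promising moves first so cutoffs happen earlier.
  const moves = state.legalMoves(maximizing)
    .sort((a, b) => state.orderingScore(b) - state.orderingScore(a));

  let best = maximizing ? -Infinity : Infinity;
  for (const move of moves) {
    const value = alphabeta(state.apply(move, maximizing), depth - 1, alpha, beta, !maximizing);
    if (maximizing) {
      best = Math.max(best, value);
      alpha = Math.max(alpha, best);
    } else {
      best = Math.min(best, value);
      beta = Math.min(beta, best);
    }
    if (beta <= alpha) break;                // alpha-beta cutoff
  }
  return best;
}
```
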
- **Implementation and Evaluation of Information Set Monte Carlo Tree Search for Pokemon**
- `gen6battlestadiumsingles` (6 choose 3, level 50, 1 minute turns)
- uses Pokémon Global Link usage statistics
- custom simulator, "node class" vs. "simulator class" (actual state, used to determine next
state), some sort of pruning (?)
- used UCB for all agents except the DMCTS test agent, which used EXP3
- ISMCTS wins only ~25% of the time against "cheating" (omniscient) MCTS test agents, but 57.5% of
the time against determinized MCTS
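
A minimal sketch of the ISMCTS loop the paper evaluates: each iteration samples a determinization of the hidden information (e.g. the opponent's unrevealed team) and then runs one UCB1-guided select/expand/simulate/backpropagate pass; all types and helpers here are hypothetical placeholders, not the paper's simulator classes:

```ts
// Sketch only: outer ISMCTS loop with UCB1 selection.

interface ISNode {
  visits: number;
  wins: number;
  children: Map<string, ISNode>;             // keyed by action id
}

/** UCB1: pick the child action maximizing win rate plus an exploration bonus. */
function ucb1(parent: ISNode, c = Math.SQRT2): string {
  let bestAction = '';
  let bestScore = -Infinity;
  for (const [action, child] of parent.children) {
    const score = child.visits === 0
      ? Infinity                             // always try unvisited children first
      : child.wins / child.visits + c * Math.sqrt(Math.log(parent.visits) / child.visits);
    if (score > bestScore) { bestScore = score; bestAction = action; }
  }
  return bestAction;
}

declare function sampleDeterminization(infoSet: unknown): unknown;         // guess the hidden state
declare function runIteration(root: ISNode, determinized: unknown): void;  // select/expand/rollout/backup

function ismcts(root: ISNode, infoSet: unknown, iterations: number): string {
  for (let i = 0; i < iterations; i++) {
    const determinized = sampleDeterminization(infoSet); // fresh determinization per iteration
    runIteration(root, determinized);                    // selection uses ucb1 on shared nodes
  }
  // Final decision: the most-visited root child is the robust choice.
  let best = '';
  let visits = -1;
  for (const [action, child] of root.children) {
    if (child.visits > visits) { visits = child.visits; best = action; }
  }
  return best;
}
```
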
