# Testing the poke-env API

Install the poke-env API with:
```{commandline}
$ pip install poke-env
```
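poke-env's requirements do not include the `dataclasses` module, so if importing `poke_env` fails because it is missing, install it manually. (`dataclasses` is part of the standard library from Python 3.7 on; the pip package is a backport for Python 3.6.)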

```{commandline}
$ pip install dataclasses
```

Once that is complete, clone the pokemon-showdown server implementation:
```{commandline}
$ git clone https://github.com/hsahovic/Pokemon-Showdown.git
```


### Random Player

Create random players.
Before running the code below, start your pokemon-showdown server on localhost.
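You can check that the server is actually up before creating players with a small standard-library probe. This sketch assumes the server listens on `localhost:8000`, which is the address poke-env's default localhost server configuration connects to. `showdown_server_is_up` is a hypothetical helper and is not part of poke-env.

```python
import socket


def showdown_server_is_up(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Assumption: the local Showdown server listens on port 8000 (poke-env's
    default localhost configuration). This only checks that the port is open,
    not that the process behind it is really Showdown.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing listens
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `print(showdown_server_is_up())` prints `False` when the server has not been started yet.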

```python
import asyncio

from poke_env.player.random_player import RandomPlayer
from poke_env.player.utils import cross_evaluate
# tabulate is a separate package (pip install tabulate)
from tabulate import tabulate
```

Create three RandomPlayer agents and have every pair play 20 battles against each other.

```python
# Three random agents, each allowed to play up to 10 battles at once
players = [RandomPlayer(max_concurrent_battles=10, battle_format="gen4randombattle") for _ in range(3)]
```

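```python
# Top-level `await` works in Jupyter/IPython; in a plain script, call this
# from an async function run with asyncio.
cross_evaluation = await cross_evaluate(players, n_challenges=20)

# Header row: "-" followed by every player's username
table = [["-"] + [p.username for p in players]]

# One row per player: its win rate against each opponent (None against itself)
for p_1, results in cross_evaluation.items():
    table.append([p_1] + [cross_evaluation[p_1][p_2] for p_2 in results])

print(tabulate(table))
```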
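You can condense the table into one number per player by averaging the nested dictionary returned by `cross_evaluate` row by row. This sketch assumes the result maps each username to a dictionary of opponent usernames and win rates, with `None` where a player meets itself. `mean_win_rates` is a hypothetical helper. The numbers in `example` are invented for illustration and are not results from this notebook.

```python
from typing import Dict, Optional

# Assumed shape of cross_evaluate's result:
# username -> {opponent username -> win rate}, with None on the diagonal.
CrossEval = Dict[str, Dict[str, Optional[float]]]


def mean_win_rates(results: CrossEval) -> Dict[str, float]:
    """Average each player's win rate over all opponents, skipping None entries."""
    summary = {}
    for player, row in results.items():
        rates = [rate for rate in row.values() if rate is not None]
        summary[player] = sum(rates) / len(rates) if rates else 0.0
    return summary


# Invented numbers, for illustration only
example = {
    "RandomPlayer 1": {"RandomPlayer 1": None, "RandomPlayer 2": 0.55, "RandomPlayer 3": 0.4},
    "RandomPlayer 2": {"RandomPlayer 1": 0.45, "RandomPlayer 2": None, "RandomPlayer 3": 0.5},
    "RandomPlayer 3": {"RandomPlayer 1": 0.6, "RandomPlayer 2": 0.5, "RandomPlayer 3": None},
}
print(mean_win_rates(example))
```

In the notebook you would call `mean_win_rates(cross_evaluation)` instead.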

### Max-damage player (Heuristic)

```python
from poke_env.player.player import Player
from poke_env.environment.battle import Battle
```

Create a max-damage player that always chooses the available move with the highest base power:

```python
class MaxDamagePlayer(Player):
    def choose_move(self, battle: Battle) -> str:
        # If the player can attack, it attacks
        if battle.available_moves:
            # Pick the available move with the highest base power
            best_move = max(battle.available_moves, key=lambda move: move.base_power)
            return self.create_order(best_move)
        # If no attack is available, make a random switch
        else:
            return self.choose_random_move(battle)
```
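You can try the selection rule without a server. This sketch uses a hypothetical `FakeMove` dataclass in place of poke-env's `Move` and keeps only `base_power`. Base power ignores STAB, type effectiveness and accuracy, so "max damage" here is only an approximation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeMove:
    """Hypothetical stand-in for poke-env's Move, with only the field the heuristic reads."""
    id: str
    base_power: int


def pick_max_base_power(available_moves: List[FakeMove]) -> Optional[FakeMove]:
    """Return the move with the highest base power, or None if no move is available.

    In MaxDamagePlayer the None case corresponds to falling back to choose_random_move.
    Ties go to the first move in the list, because max() keeps the first maximum.
    """
    if not available_moves:
        return None
    return max(available_moves, key=lambda move: move.base_power)


moves = [FakeMove("thunderbolt", 90), FakeMove("thunderwave", 0), FakeMove("thunder", 110)]
print(pick_max_base_power(moves).id)  # thunder
```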

Evaluate it against a RandomPlayer:

```python
random_player = RandomPlayer(battle_format="gen4randombattle")
max_damage_player = MaxDamagePlayer(battle_format="gen4randombattle")
```

```python
n_battles = 100
await max_damage_player.battle_against(random_player, n_battles=n_battles)

print(f"Max damage player won {max_damage_player.n_won_battles} out of {n_battles} battles")
```

### OpenAI Gym Wrapper

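State space: a 1D vector of length 10. It holds the base power of each of the 4 available moves (divided by 100, -1 for an empty slot), the damage multiplier of each move against the opponent's active Pokémon, the fraction of our team that has fainted, and the fraction of the opponent's team that has fainted.

Action space: defined by the environment player class (`Gen8EnvSinglePlayer`).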
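You can sketch the 10-component layout on plain numbers, without a battle object. `build_state_vector` is a hypothetical helper that mirrors the `embed_battle` method defined in the next cell. The vector holds four rescaled base powers (padded with -1), then four damage multipliers (padded with 1), then the fainted fraction of each team. The sketch assumes 6 Pokémon per side.

```python
from typing import List, Sequence


def build_state_vector(
    base_powers: Sequence[float],
    damage_multipliers: Sequence[float],
    n_fainted_team: int,
    n_fainted_opponent: int,
    team_size: int = 6,
) -> List[float]:
    """Hypothetical helper mirroring SimpleRLPlayer.embed_battle on plain numbers."""
    powers = [-1.0] * 4       # -1 marks an empty move slot
    multipliers = [1.0] * 4   # neutral (1x) effectiveness for empty slots
    for i, (bp, mult) in enumerate(zip(base_powers[:4], damage_multipliers[:4])):
        powers[i] = bp / 100  # rescaled base power (above 1 for moves over 100 BP)
        multipliers[i] = mult
    # 4 + 4 + 2 = 10 components
    return powers + multipliers + [n_fainted_team / team_size, n_fainted_opponent / team_size]


print(build_state_vector([90, 110], [2.0, 0.5], n_fainted_team=1, n_fainted_opponent=3))
```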

```python
from poke_env.player.env_player import Gen8EnvSinglePlayer
import numpy as np


class SimpleRLPlayer(Gen8EnvSinglePlayer):
    def embed_battle(self, battle):
        # -1 marks move slots with no available move
        moves_base_power = -np.ones(4)
        # Neutral (1x) multiplier by default
        moves_dmg_multiplier = np.ones(4)
        for i, move in enumerate(battle.available_moves):
            # Rescale base power by 100 (moves above 100 BP give values above 1)
            moves_base_power[i] = move.base_power / 100
            if move.type:
                # Type effectiveness against the opponent's active Pokémon
                moves_dmg_multiplier[i] = move.type.damage_multiplier(
                    battle.opponent_active_pokemon.type_1,
                    battle.opponent_active_pokemon.type_2,
                )

        # Despite their names, these count *fainted* Pokémon (as a fraction of 6)
        remaining_mon_team = (
            len([mon for mon in battle.team.values() if mon.fainted]) / 6
        )
        remaining_mon_opponent = (
            len([mon for mon in battle.opponent_team.values() if mon.fainted]) / 6
        )

        # Final vector with 10 components
        return np.concatenate(
            [
                moves_base_power,
                moves_dmg_multiplier,
                [remaining_mon_team, remaining_mon_opponent],
            ]
        )

    def compute_reward(self, battle) -> float:
        # Shaped reward built from fainted Pokémon, remaining HP and victory
        return self.reward_computing_helper(
            battle, fainted_value=2, hp_value=1, victory_value=30
        )
```
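`reward_computing_helper` turns the battle state into a shaped reward. The sketch below is a simplified version of that idea under stated assumptions. It is not poke-env's exact implementation, which also handles details such as status conditions. The sketch works as follows:

- Each side is scored as the sum of its HP fractions times `hp_value`, minus `fainted_value` per fainted Pokémon.
- Our score minus the opponent's score is offset by `victory_value` once the battle is won or lost.
- The reward at each step is the change in that score since the previous step.

`SideState` and `ShapedReward` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SideState:
    """Hypothetical snapshot of one side: HP fraction of each Pokemon (0.0 = fainted)."""
    hp_fractions: List[float]


def side_value(side: SideState, fainted_value: float, hp_value: float) -> float:
    """Score one side: HP contributes positively, each fainted Pokemon costs fainted_value."""
    value = 0.0
    for hp in side.hp_fractions:
        value += hp * hp_value
        if hp == 0.0:
            value -= fainted_value
    return value


@dataclass
class ShapedReward:
    """Simplified sketch in the spirit of reward_computing_helper (NOT poke-env's exact code)."""
    fainted_value: float = 2.0
    hp_value: float = 1.0
    victory_value: float = 30.0
    previous: float = 0.0

    def __call__(self, me: SideState, opponent: SideState, won: Optional[bool] = None) -> float:
        current = side_value(me, self.fainted_value, self.hp_value)
        current -= side_value(opponent, self.fainted_value, self.hp_value)
        if won is True:
            current += self.victory_value
        elif won is False:
            current -= self.victory_value
        # The reward is the change in score since the previous call
        reward = current - self.previous
        self.previous = current
        return reward


reward = ShapedReward()
full_team = SideState([1.0] * 6)
print(reward(full_team, SideState([1.0] * 6)))           # no progress yet
print(reward(full_team, SideState([0.0] + [1.0] * 5)))   # one opponent Pokemon fainted
```

Because the reward is the change in score rather than the score itself, the agent is rewarded once per event, such as a knockout. It is not rewarded again on every turn that the advantage persists.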

Instantiate the player

```python
env_player = SimpleRLPlayer(battle_format="gen8randombattle")
```

```python
env_player.action_space
```

```
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
```
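The 22 actions printed above are flat indices. The sketch below assumes the layout documented for `Gen8EnvSinglePlayer` in the poke-env versions of this period: 4 moves, 4 Z-moves, 4 moves with Mega Evolution, 4 moves with Dynamax, then 6 switches. Check the `_action_to_move` docstring of your installed version before relying on it. `describe_action` is a hypothetical decoder for that layout.

```python
from typing import Tuple

# Assumed action layout for Gen8EnvSinglePlayer (4 * 4 + 6 = 22 actions);
# verify against your poke-env version.
ACTION_GROUPS = [
    ("move", 4),
    ("z-move", 4),
    ("mega-evolve + move", 4),
    ("dynamax + move", 4),
    ("switch", 6),
]


def describe_action(action: int) -> Tuple[str, int]:
    """Map a flat action index to (kind, slot), where slot indexes available moves or switches."""
    if action < 0:
        raise ValueError(f"action {action} is negative")
    offset = action
    for kind, size in ACTION_GROUPS:
        if offset < size:
            return kind, offset
        offset -= size
    raise ValueError(f"action {action} is outside the 22-action space")


for a in (0, 5, 13, 21):
    print(a, describe_action(a))
```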

Now build a deep Q-network:

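```python
from keras.layers import Dense, Flatten
from keras.models import Sequential

# One Q-value output per action
n_action = len(env_player.action_space)

model = Sequential()
# Input shape (1, 10): a window of 1 observation with 10 features
model.add(Dense(128, activation="elu", input_shape=(1, 10)))
# Flatten (1, 128) to (128,)
model.add(Flatten())
model.add(Dense(64, activation="elu"))
# Linear output layer: one Q-value estimate per action
model.add(Dense(n_action, activation="linear"))
```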
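As a sanity check on the architecture, you can compute the number of trainable parameters by hand. This sketch assumes 22 actions, the length of the action space printed above. It relies on the fact that a Keras `Dense` layer applied to a `(1, 10)` input acts on the last axis only. The leading 1 matches `window_length=1` of the replay memory defined in the next cell. `dense_params` and `dqn_model_params` are hypothetical helpers; compare their result with `model.summary()`.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def dqn_model_params(n_features: int = 10, n_actions: int = 22) -> int:
    # Dense(128) on input shape (1, 10) acts on the last axis: output shape (1, 128)
    first = dense_params(n_features, 128)
    # Flatten turns (1, 128) into (128,) and adds no parameters
    hidden = dense_params(128, 64)
    # Linear head: one Q-value per action
    head = dense_params(64, n_actions)
    return first + hidden + head


print(dqn_model_params())
```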

Define the DQN agent:

```python
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from keras.optimizers import Adam

memory = SequentialMemory(limit=10000, window_length=1)

policy = LinearAnnealedPolicy(
    EpsGreedyQPolicy(),
    attr="eps",
    value_max=1.0,
    value_min=0.05,
    value_test=0,
    nb_steps=10000,
)

# Defining our DQN
dqn = DQNAgent(
    model=model,
    nb_actions=len(env_player.action_space),
    policy=policy,
    memory=memory,
    nb_steps_warmup=1000,
    gamma=0.5,
    target_model_update=1,
    delta_clip=0.01,
    enable_double_dqn=True,
)

dqn.compile(Adam(lr=0.00025), metrics=["mae"])
```
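Two settings in this cell are worth unpacking:

- `LinearAnnealedPolicy` decays epsilon linearly from `value_max` to `value_min` over `nb_steps` training steps, then holds it.
- `enable_double_dqn=True` builds the bootstrap target in two steps: the online network chooses the next action, and the target network scores it.

The pure-Python sketch below illustrates both under those assumptions. `linear_epsilon` and `double_dqn_target` are hypothetical helpers, not keras-rl functions.

```python
from typing import Sequence


def linear_epsilon(step: int, value_max: float = 1.0, value_min: float = 0.05,
                   nb_steps: int = 10000) -> float:
    """Linearly anneal epsilon from value_max to value_min over nb_steps, then hold it."""
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, value_max + slope * step)


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.5,
) -> float:
    """Double DQN target: the online net picks the next action, the target net scores it."""
    if done:
        # No bootstrapping past the end of an episode
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


print([round(linear_epsilon(s), 3) for s in (0, 5000, 10000, 20000)])
print(double_dqn_target(1.0, False, [0.2, 0.9, 0.1], [5.0, 2.0, 8.0]))
```

With `gamma=0.5` the discount halves each step, so rewards more than a few steps ahead contribute little. As far as we know, keras-rl treats `target_model_update=1` as a hard copy of the online weights into the target network every step, so the two networks stay very close.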

Set the hyperparameters:

```python
NB_TRAINING_STEPS = 10000
NB_EVALUATION_EPISODES = 100
```

Start training

```python
# Note: this cell does not run inside IPython/Jupyter; run it, together with
# the cells above, as a standalone Python script.
def dqn_training(player, dqn, nb_steps):
    dqn.fit(player, nb_steps=nb_steps)
    # Finish any battle that is still in progress before returning
    player.complete_current_battle()


opponent = RandomPlayer(battle_format="gen8randombattle")

# Training
env_player.play_against(
    env_algorithm=dqn_training,
    opponent=opponent,
    env_algorithm_kwargs={"dqn": dqn, "nb_steps": NB_TRAINING_STEPS},
)
model.save(f"model_{NB_TRAINING_STEPS}")
```

### Connecting to the official Showdown server


```python
from poke_env.player_configuration import PlayerConfiguration
from poke_env.server_configuration import ShowdownServerConfiguration
```

We created a Showdown account with the following credentials, which you can use to connect:

ID: GokemonRox

PW: gokemon

```python
player = MaxDamagePlayer(
    player_configuration=PlayerConfiguration("GokemonRox", "gokemon"),
    server_configuration=ShowdownServerConfiguration
)
```

If you want to battle this agent in Korean, use Podown (포다운), a private Korean server.

[Podown direct server](https://play.podown.pro/?p)

Go to the address above, log in with your own account, and challenge the GokemonRox account. (Unless configured otherwise, the format is Gen 8 Random Battle.)

Then accept the challenge with the code below:

```python
# Accept 1 challenge from any user (None = no specific opponent)
await player.accept_challenges(None, 1)
```