# Create a TorchPlayer


The player receives the path of the model to load and plays with that model.

- Work out how to make the player sample only valid moves
- Add an optional Monte Carlo tree search (MCTS) mode, configured through the iterationLimit and timeLimit options

If you add MCTS, see the notebook 007_MCTS.ipnb
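
One possible way to restrict the player to valid moves is to mask the policy's action probabilities before choosing. The sketch below is not the notebook's implementation: `choose_valid_action` is a hypothetical helper, and it assumes you already have a flat probability vector from the policy and a 0/1 mask of the same size (for example, the flattened output of `env.get_valid`). How the probabilities are extracted from the PPO model is left out.

```python
import numpy as np


def choose_valid_action(probs, valid, deterministic=True, rng=None):
    """Pick a flat action index, ignoring moves marked invalid (hypothetical helper)."""
    probs = np.asarray(probs, dtype=float).ravel()
    valid = np.asarray(valid, dtype=bool).ravel()
    if not valid.any():
        raise ValueError("there are no valid moves to choose from")

    # Zero out the probability mass assigned to invalid moves.
    masked = np.where(valid, probs, 0.0)
    total = masked.sum()
    if total <= 0:
        # The policy gave zero mass to every valid move: fall back to uniform.
        masked = valid.astype(float)
        total = masked.sum()
    masked = masked / total

    if deterministic:
        return int(np.argmax(masked))
    if rng is None:
        rng = np.random.default_rng()
    return int(rng.choice(masked.size, p=masked))


# Toy example with 4 possible moves, of which only moves 1 and 2 are valid.
probs = [0.5, 0.3, 0.2, 0.0]
valid = [0, 1, 1, 0]
best = choose_valid_action(probs, valid)
sampled = choose_valid_action(probs, valid, deterministic=False,
                              rng=np.random.default_rng(0))
```

Here `best` is the most probable valid move (index 1), even though the invalid move 0 has the highest raw probability, and `sampled` is always one of the valid indices.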

In [1]:
%load_ext autoreload
%autoreload 2

In [4]:
from players import BasePlayer
import numpy as np
from typing import Union, Optional, Dict, Any
from boardgame2 import BoardGameEnv
from stable_baselines3 import PPO

In [5]:
class TorchPlayer(BasePlayer):
    def __init__(self,
                 player: int = 1,
                 env: Optional[BoardGameEnv] = None,
                 flatten_action: bool = False,
                 model_path: Optional[str] = None,
                 deterministic: bool = True,
                 only_valid: bool = True,
                 mcts: bool = False,
                 iterationLimit: Optional[int] = None,
                 timeLimit: Optional[int] = None,
                 **custom_kwargs: Any  # Make subclass constructor generic
                 ):
        super().__init__(player, env, flatten_action)

        if model_path is None:
            raise ValueError("model_path cannot be None")

        # Load the trained PPO model from disk.
        self.model = PPO.load(model_path)
        self.deterministic = deterministic
        self.only_valid = only_valid
        # The MCTS options are stored here but predict() does not use them yet.
        self.mcts = mcts
        self.iterationLimit = iterationLimit
        self.timeLimit = timeLimit

    def predict(self, board: np.ndarray) -> Union[int, np.ndarray]:
        # Flip the board for player -1 so the model always sees
        # the position from player 1's point of view.
        obs = board if (self.player == 1) else -board
        if self.only_valid:
            # Pair the board with the mask of valid moves.
            obs = [obs, self.env.get_valid((obs, 1))]
        # The model expects a batch of observations, so make a batch of one.
        obs = [obs]
        action = self.model.predict(obs, deterministic=self.deterministic)[0]
        if self.flatten_action:
            return action
        # Convert the flat action index into (row, column).
        return np.array([action // self.board_shape, action % self.board_shape])
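
The last step of `predict` turns a flat action index into board coordinates. As a hedged illustration, the sketch below assumes that the model returns a length-1 array for a batch of one observation and that the board size is a plain integer (8 here). Under those assumptions the `//` and `%` expression yields an array of shape (2, 1). If a caller needs a flat (row, column) pair, a hypothetical helper such as `decode_action` can unwrap the batch first.

```python
import numpy as np


def decode_action(action, board_size):
    """Convert a flat action (scalar or length-1 array) into [row, col].

    Hypothetical helper: it only illustrates the index arithmetic.
    """
    flat = int(np.asarray(action).reshape(-1)[0])
    row, col = divmod(flat, board_size)
    return np.array([row, col])


# A batch of one action, as assumed above.
batched = np.array([19])

# Same arithmetic as in predict(): keeps the batch dimension.
naive = np.array([batched // 8, batched % 8])

# Unwrapping first gives a flat pair: 19 = 2 * 8 + 3.
decoded = decode_action(batched, 8)
```

Both forms carry the same row and column. They differ only in shape, which matters if the code consuming the move expects exactly two scalars.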


# Arena

Test the player against the other player types.

In [16]:
from arena import Arena
from players import RandomPlayer, GreedyPlayer, TorchPlayer
from boardgame2 import ReversiEnv

In [17]:
env = ReversiEnv(board_shape=8)

## Torch vs Random:

In [21]:
player_1 = TorchPlayer(player=1, env=env, model_path="./torch.zip")
player_2 = RandomPlayer(player=-1, env=env)
arena = Arena(player_1, player_2, env, verbose=True)

In [22]:
arena.play(n_games=100)


MATCH: TorchPlayer vs RandomPlayer

Playing n:100/100 Wins(player 1/ player 2):78.0%/19.0%%

AND THE WINNER IS... PLAYER 1!!, of type TorchPlayer



In [23]:
arena.print_players_stats()


####### STATS FOR PLAYER: 1 OF TYPE: TorchPlayer #######

Wins as first: 0.6888888888888889
Wins as second: 0.8727272727272727
Ties: 0.02
Plays as first: 45
Plays as second: 55
Avg game duration: 60.0

########################################################
            
        

####### STATS FOR PLAYER: 2 OF TYPE: RandomPlayer #######

Wins as first: 0.09090909090909091
Wins as second: 0.3111111111111111
Ties: 0.02
Plays as first: 55
Plays as second: 45
Avg game duration: 60.0

#########################################################
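
The per-role rates printed above can be turned back into game counts as a sanity check. This sketch assumes that "Wins as first" and "Wins as second" are fractions of the games played in that role, and that "Ties" is a fraction of all 100 games. That reading is an assumption, although it does produce whole numbers here. `total_wins` is a hypothetical helper.

```python
def total_wins(rate_first, n_first, rate_second, n_second):
    """Recover the number of wins from per-role win rates (hypothetical helper)."""
    return round(rate_first * n_first) + round(rate_second * n_second)


# Numbers copied from the stats printed above.
torch_wins = total_wins(0.6888888888888889, 45, 0.8727272727272727, 55)
random_wins = total_wins(0.09090909090909091, 55, 0.3111111111111111, 45)
ties = round(0.02 * 100)

games_accounted_for = torch_wins + random_wins + ties
```

Under these assumptions the counts are 79 wins for TorchPlayer, 19 for RandomPlayer and 2 ties, which adds up to the 100 games played. The progress line reports 78.0% for player 1, one game fewer. A possible explanation, not confirmed by the source, is that the line is printed before the last game is tallied.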
            
        


## Torch vs Greedy:

In [30]:
player_1 = TorchPlayer(player=1, env=env, model_path="./torch.zip")
player_2 = GreedyPlayer(player=-1, env=env)
arena = Arena(player_2, player_1, env, verbose=True)

In [31]:
arena.play(n_games=100)


MATCH: GreedyPlayer vs TorchPlayer

Playing n:100/100 Wins(player 1/ player 2):23.0%/70.0%%

AND THE WINNER IS... PLAYER 2!!, of type TorchPlayer



In [32]:
arena.print_players_stats()


####### STATS FOR PLAYER: 1 OF TYPE: GreedyPlayer #######

Wins as first: 0.24489795918367346
Wins as second: 0.21568627450980393
Ties: 0.06
Plays as first: 49
Plays as second: 51
Avg game duration: 59.1

#########################################################
            
        

####### STATS FOR PLAYER: 2 OF TYPE: TorchPlayer #######

Wins as first: 0.7450980392156863
Wins as second: 0.673469387755102
Ties: 0.06
Plays as first: 51
Plays as second: 49
Avg game duration: 59.1

########################################################
            
        


## Torch vs Torch:

In [27]:
player_1 = TorchPlayer(player=1, env=env, model_path="./torch.zip")
player_2 = TorchPlayer(player=-1, env=env, model_path="./torch.zip")
arena = Arena(player_1, player_2, env, verbose=True)

In [28]:
arena.play(n_games=100)


MATCH: TorchPlayer vs TorchPlayer

Playing n:100/100 Wins(player 1/ player 2):52.0%/47.0%%

AND THE WINNER IS... PLAYER 1!!, of type TorchPlayer



In [29]:
arena.print_players_stats()


####### STATS FOR PLAYER: 1 OF TYPE: TorchPlayer #######

Wins as first: 1.0
Wins as second: 0.0
Ties: 0.0
Plays as first: 52
Plays as second: 48
Avg game duration: 60.0

########################################################
            
        

####### STATS FOR PLAYER: 2 OF TYPE: TorchPlayer #######

Wins as first: 1.0
Wins as second: 0.0
Ties: 0.0
Plays as first: 48
Plays as second: 52
Avg game duration: 60.0

########################################################
            
        
