# MLP based DQN agent against fixed opponent

In the previous notebook, `7-cnn-dqn-fixed-oponent.ipynb`, we trained the CNN based model by repeatedly alternating which of the two agents was frozen.
This gave interesting but not fully satisfactory results.
We now apply the same technique to the custom MLP based approach designed in `5-improving-dqn-architecture.ipynb`, so that the performance of both architectures can be compared properly.

<hr><hr>

## Table of Contents

- Contact information
- Checking requirements
  - Correct Anaconda environment
  - Correct module access
  - Correct CUDA access
- Training two DQN agents on connect four Gym
  - Building the environment
  - Implementing the DQN policy
  - Building agents
  - Function for letting agents learn
  - Function for watching learned agent
  - Doing the experiment
- Discussion

<hr><hr>

## Contact information

| Name             | Student ID | VUB mail                                                  | Personal mail                                               |
| ---------------- | ---------- | --------------------------------------------------------- | ----------------------------------------------------------- |
| Lennert Bontinck | 0568702    | [lennert.bontinck@vub.be](mailto:lennert.bontinck@vub.be) | [info@lennertbontinck.com](mailto:info@lennertbontinck.com) |



<hr><hr>

## Checking requirements

### Correct Anaconda environment

The `rl-project` anaconda environment should be active to ensure proper support. Installation instructions are available on [the GitHub repository of the RL course project and homeworks](https://github.com/pikawika/vub-rl).

In [1]:
####################################################
# CHECKING FOR RIGHT ANACONDA ENVIRONMENT
####################################################

import os
from platform import python_version

# .get avoids a KeyError when the notebook is run outside a conda environment
active_env = os.environ.get('CONDA_DEFAULT_ENV', '<none>')
print(f"Active environment: {active_env}")
print(f"Correct environment: {active_env == 'rl-project'}")
print(f"\nPython version: {python_version()}")
print(f"Correct Python version: {python_version() == '3.8.10'}")

Active environment: rl-project
Correct environment: True

Python version: 3.8.10
Correct Python version: True


<hr>

### Correct module access

The following code block will load in all required modules and show if the versions match those that are recommended.

In [3]:
####################################################
# LOADING MODULES
####################################################

# Allow reloading of libraries
import importlib

# Plotting
import matplotlib; print(f"Matplotlib version (3.5.1 recommended): {matplotlib.__version__}")
import matplotlib.pyplot as plt

# Argparser
import argparse

# More data types
import typing
import numpy as np

# Pygame
import pygame; print(f"Pygame version (2.1.2 recommended): {pygame.__version__}")

# Gym environment
import gym; print(f"Gym version (0.21.0 recommended): {gym.__version__}")

# Tianshou for RL algorithms
import tianshou as ts; print(f"Tianshou version (0.4.8 recommended): {ts.__version__}")

# Torch is a popular DL framework
import torch; print(f"Torch version (1.12.0 recommended): {torch.__version__}")

# PPrint is a pretty print for variables
from pprint import pprint

# Our custom connect four gym environment
import sys
sys.path.append('../')
import gym_connect4_pygame.envs.ConnectFourPygameEnvV2 as cfgym
importlib.invalidate_caches()
importlib.reload(cfgym)

# Time for allowing "freezes" in execution
import time

# Allow for copying objects in a non reference manner
import copy

# Used for updating notebook display
from IPython.display import clear_output

Matplotlib version (3.5.1 recommended): 3.5.1
Pygame version (2.1.2 recommended): 2.1.2
Gym version (0.21.0 recommended): 0.21.0
Tianshou version (0.4.8 recommended): 0.4.8
Torch version (1.12.0 recommended): 1.12.0.dev20220520+cu116


<hr>

### Correct CUDA access

The installation instructions specify how to install PyTorch with CUDA 11.6.
The following code block tests if this was done successfully.

In [4]:
####################################################
# CUDA VALIDATION
####################################################

# Check cuda available
print(f"CUDA is available: {torch.cuda.is_available()}")

# Show cuda devices
print(f"\nAmount of connected devices supporting CUDA: {torch.cuda.device_count()}")

# Show current cuda device
print(f"\nCurrent CUDA device: {torch.cuda.current_device()}")

# Show cuda device name
print(f"Cuda device 0 name: {torch.cuda.get_device_name(0)}")

CUDA is available: True

Amount of connected devices supporting CUDA: 1

Current CUDA device: 0
Cuda device 0 name: NVIDIA GeForce GTX 970


<hr><hr>

## Training two DQN agents on connect four Gym

Our connect four gym setup requires two agents, one for each player.
To reduce complexity, agents will always play as the same player, e.g. always as player 1.
It is important to note that connect four is a *solved game*.
According to [The Washington Post](https://www.washingtonpost.com/news/wonk/wp/2015/05/08/how-to-win-any-popular-game-according-to-data-scientists/):

> Connect Four is what mathematicians call a "solved game," meaning you can play it perfectly every time, no matter what your opponent does. You will need to get the first move, but as long as you do so, you can always win within 41 moves.

<hr>

### Building the environment

This code is taken from previous notebooks.
We don't allow invalid moves to make the problem easier for now.

In [5]:
####################################################
# CONNECT FOUR V2 ENVIRONMENT
####################################################

def get_env():
    """
    Returns the connect four gym environment V2 altered for Tianshou and Petting Zoo compatibility.
    Already wrapped with a ts.env.PettingZooEnv wrapper.
    """
    return ts.env.PettingZooEnv(cfgym.env(reward_move= 0, # Set to 1 for reward to make moves (incentivise longer games)
                                          reward_invalid= -3,
                                          reward_draw= 100,
                                          reward_win= 25,
                                          reward_loss= -25,
                                          allow_invalid_move= False))
    
    
# Test the environment
env = get_env()
print(f"Observation space: {env.observation_space}")
print(f"\nAction space: {env.action_space}")

# Reset the environment to start from a clean state, returns the initial observation
observation = env.reset()

print("\n Initial player id:")
print(observation["agent_id"])

print("\n Initial observation:")
print(observation["obs"])

print("\n Initial mask:")
print(observation["mask"])

# Clean unused variables
del observation
del env

Observation space: Dict(action_mask:Box([0 0 0 0 0 0 0], [1 1 1 1 1 1 1], (7,), int8), observation:Box([[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]], [[2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]], (6, 7), int8))

Action space: Discrete(7)

 Initial player id:
player_1

 Initial observation:
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]]

 Initial mask:
[True, True, True, True, True, True, True]
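
The mask printed above marks which columns can still accept a piece. As a minimal sketch of how such a mask can restrict a greedy choice, the snippet below picks the best-scoring valid column. The function `masked_greedy_action` and the Q-values are hypothetical and only illustrate the idea; this is not Tianshou's internal implementation.

```python
import math
from typing import Sequence


def masked_greedy_action(q_values: Sequence[float], mask: Sequence[bool]) -> int:
    """Return the index of the highest Q-value among the columns the mask allows."""
    if len(q_values) != len(mask):
        raise ValueError("q_values and mask must have the same length")
    if not any(mask):
        raise ValueError("no valid action available")
    # Invalid columns get -inf so they can never win the argmax
    masked = [q if ok else -math.inf for q, ok in zip(q_values, mask)]
    return max(range(len(masked)), key=masked.__getitem__)


# Hypothetical Q-values, one per column; column 1 scores best but is full
q_values = [0.1, 0.9, 0.3, 0.7, 0.2, 0.0, 0.5]
mask = [True, False, True, True, True, True, True]

print(masked_greedy_action(q_values, mask))  # 3
```

With every entry of the mask set to `True`, as in the initial observation, the same call would simply return the column with the highest Q-value.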


<hr>

### Implementing the DQN policy

We use the strategy created in `5-improving-dqn-architecture.ipynb`.

In [6]:
####################################################
# DQN ARCHITECTURE
####################################################

class CustomDQN(torch.nn.Module):
    """
    Custom DQN using an MLP based model
    """
    def __init__(self,
                 state_shape: typing.Sequence[int],
                 action_shape: typing.Sequence[int],
                 device: typing.Union[str, int, torch.device] = 'cuda' if torch.cuda.is_available() else 'cpu',):
        # Parent call
        super().__init__()
        
        # Save device (e.g. cuda)
        self.device = device
        
        self.model = torch.nn.Sequential(
            torch.nn.Linear(np.prod(state_shape), 128), torch.nn.ReLU(inplace=True),
            torch.nn.Linear(128, 128), torch.nn.ReLU(inplace=True),
            torch.nn.Linear(128, 128), torch.nn.ReLU(inplace=True),
            torch.nn.Linear(128, np.prod(action_shape)),
        )

    def forward(self, obs, state=None, info={}):
        if not isinstance(obs, torch.Tensor):
            obs = torch.tensor(obs, dtype=torch.float, device=self.device)
        batch = obs.shape[0]
        logits = self.model(obs.view(batch, -1))
        return logits, state
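
To get a feeling for the size of this network, the following sketch counts its trainable parameters by hand. It assumes the layer widths of the cell above (a flattened 6 x 7 board, three hidden layers of 128 units and 7 outputs); the helper `mlp_parameter_count` is introduced here purely for illustration.

```python
from math import prod


def mlp_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


state_shape = (6, 7)  # board of 6 rows by 7 columns
action_count = 7      # one action per column
layer_sizes = [prod(state_shape), 128, 128, 128, action_count]

print(layer_sizes)                       # [42, 128, 128, 128, 7]
print(mlp_parameter_count(layer_sizes))  # 39431
```

The first layer contributes 42 * 128 + 128 = 5504 parameters, each hidden-to-hidden layer 128 * 128 + 128 = 16512, and the output layer 128 * 7 + 7 = 903.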


In [7]:
####################################################
# DQN POLICY
####################################################

def cf_custom_dqn_policy(state_shape: tuple,
                         action_shape: tuple,
                         optim: typing.Optional[torch.optim.Optimizer] = None,
                         learning_rate: float =  0.0001,
                         gamma: float = 0.9, # Smaller gamma favours "faster" win
                         n_step: int = 4, # Number of steps to look ahead
                         frozen: bool = False,
                         target_update_freq: int = 320):
    # Use cuda device if possible
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    
    # Network to be used for DQN
    net = CustomDQN(state_shape, action_shape, device= device).to(device)
    
    # Default optimizer is an Adam optimizer with the given learning rate
    if optim is None:
        optim = torch.optim.Adam(net.parameters(), lr= learning_rate)
        
    # If we are frozen, we use an optimizer that has learning rate 0
    if frozen:
        optim = torch.optim.SGD(net.parameters(), lr= 0)
        
        
    # Our agent DQN policy
    return ts.policy.DQNPolicy(model= net,
                               optim= optim,
                               discount_factor= gamma,
                               estimation_step= n_step,
                               target_update_freq= target_update_freq)
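
The `gamma` and `n_step` arguments interact: the target for a state is built from the next `n_step` rewards, each discounted by `gamma`, plus a discounted value estimate for the state reached afterwards. The sketch below illustrates this with plain Python. The function `n_step_target` and the reward sequence are hypothetical, and the actual computation inside Tianshou may differ in detail.

```python
def n_step_target(rewards, bootstrap_value, gamma):
    """Discounted sum of n rewards plus the discounted bootstrap value."""
    n = len(rewards)
    discounted = sum(gamma ** k * r for k, r in enumerate(rewards))
    return discounted + gamma ** n * bootstrap_value


# Hypothetical case: the win reward (+25) arrives on the 4th step and the
# game ends there, so nothing is bootstrapped afterwards.
rewards = [0, 0, 0, 25]

target_high_gamma = n_step_target(rewards, bootstrap_value=0.0, gamma=0.9)
target_low_gamma = n_step_target(rewards, bootstrap_value=0.0, gamma=0.8)

print(round(target_high_gamma, 3))  # about 18.2
print(round(target_low_gamma, 3))   # about 12.8
```

The same delayed win is worth less under the smaller `gamma`, which is why a smaller `gamma` favours winning faster.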

<hr>

### Building agents

This is identical to the previous notebook, with the added option of "freezing" an agent, which corresponds to giving it an optimizer with learning rate 0.
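
Why a learning rate of 0 freezes an agent can be seen from the plain SGD update rule, sketched below in pure Python. The function `sgd_step` and the numbers are hypothetical and stand in for what an SGD optimizer does to each parameter.

```python
def sgd_step(params, grads, lr):
    """One plain SGD update: p <- p - lr * g."""
    return [p - lr * g for p, g in zip(params, grads)]


params = [0.5, -1.2, 3.0]   # hypothetical network weights
grads = [10.0, -4.0, 0.25]  # hypothetical gradients from a loss

frozen = sgd_step(params, grads, lr=0.0)
learning = sgd_step(params, grads, lr=0.1)

print(frozen == params)    # True: the weights are unchanged
print(learning == params)  # False: the weights moved
```

Gradients are still computed for a frozen agent in this setup; they are simply multiplied by zero before being applied.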

In [8]:
####################################################
# AGENT CREATION
####################################################

def get_agents(agent_player1: typing.Optional[ts.policy.BasePolicy] = None,
               agent_player2: typing.Optional[ts.policy.BasePolicy] = None,
               optim: typing.Optional[torch.optim.Optimizer] = None,
               resume_path_player_1: str = '', # Path to file to resume agent training from
               resume_path_player_2: str = '', 
               agent_player1_frozen: bool = False, # Freeze a player -> don't let it learn further
               agent_player2_frozen: bool = False,
               ) -> typing.Tuple[ts.policy.BasePolicy, torch.optim.Optimizer, list]:
    """
    Gets a multi agent policy manager, optimizer and player ids for the connect four V2 gym environment.
    Per default this returns 
        - Multi agent manager for 2 agents using DQN
        - Adam optimizer
        - ['player_1', 'player_2'] from the connect four environment
    """
    
    # Get the environment to play in (Connect four gym V2)
    env = get_env()
    
    # Get the observation space from the environment, depending on type of space (ternary operator)
    observation_space = env.observation_space['observation'] if isinstance(env.observation_space, gym.spaces.Dict) else env.observation_space
    
    # Set the arguments
    state_shape = observation_space.shape or observation_space.n
    action_shape = env.action_space.shape or env.action_space.n
    
    # Configure agent player 1 to be a DQN if no policy is passed.
    if agent_player1 is None:
        # Our agent1 uses a DQN policy
        agent_player1 = cf_custom_dqn_policy(state_shape= state_shape,
                                             action_shape= action_shape,
                                             optim= optim,
                                             frozen= agent_player1_frozen)
                
        # If we resume our agent we need to load the previous config
        if resume_path_player_1:
            agent_player1.load_state_dict(torch.load(resume_path_player_1))
            
    
    # Configure agent player 2 to be a DQN if no policy is passed.
    if agent_player2 is None:
        # Our agent2 uses a DQN policy
        agent_player2 = cf_custom_dqn_policy(state_shape= state_shape,
                                             action_shape= action_shape,
                                             optim= optim,
                                             frozen= agent_player2_frozen)
        
                
        # If we resume our agent we need to load the previous config
        if resume_path_player_2:
            agent_player2.load_state_dict(torch.load(resume_path_player_2))

    # Both our agents are DQN agents by default
    agents = [agent_player1, agent_player2]
        
    # Our policy depends on the order of the agents
    policy = ts.policy.MultiAgentPolicyManager(agents, env)
    
    # Return our policy, optimizer and the available agents in the environment
    # Per default: 
    #   - Multi agent manager for 2 agents using DQN
    #   - Adam optimizer
    #   - ['player_1', 'player_2'] from the connect four environment
    
    return policy, optim, env.agents

<hr>

### Function for letting agents learn

This is identical to the previous notebook.

In [9]:
####################################################
# AGENT TRAINING
####################################################

def train_agent(filename: str = "dqn_vs_dqn_cnn_based",
                agent_player1: typing.Optional[ts.policy.BasePolicy] = None,
                agent_player2: typing.Optional[ts.policy.BasePolicy] = None,
                agent_player1_frozen: bool = False, # Freeze a player -> don't let it learn further
                agent_player2_frozen: bool = False,
                single_agent_score_as_reward: bool= False, # Uses non frozen agent's score as reward
                optim: typing.Optional[torch.optim.Optimizer] = None,
                training_env_num: int = 1,
                testing_env_num: int = 1,
                buffer_size: int = 2**14, # 2^14 would be XOR in Python and evaluate to 12
                batch_size: int = 1, 
                epochs: int = 50, #50
                step_per_epoch: int = 1024, #1024
                step_per_collect: int = 64, # transition before update
                update_per_step: float = 0.1,
                testing_eps: float = 0.05,
                training_eps: float = 0.1,
                ) -> typing.Tuple[dict, ts.policy.BasePolicy, ts.policy.BasePolicy]:
    """
    Trains two agents in the connect four V2 environment and saves their best model and logs.
    Returns:
        - result from offpolicy_trainer
        - final version of agent 1
        - final version of agent 2
    """

    # ======== notebook specific =========
    notebook_version = '8' # Used for foldering logs and models

    # ======== environment setup =========
    train_envs = ts.env.DummyVectorEnv([get_env for _ in range(training_env_num)])
    test_envs = ts.env.DummyVectorEnv([get_env for _ in range(testing_env_num)])
    
    # set the seed for reproducibility
    np.random.seed(1998)
    torch.manual_seed(1998)
    train_envs.seed(1998)
    test_envs.seed(1998)

    # ======== agent setup =========
    # Gets our agents from the previously made function
    # Per default: 
    #   - Multi agent manager for 2 agents using DQN
    #   - Adam optimizer
    #   - ['player_1', 'player_2'] from the connect four environment
    policy, optim, agents = get_agents(agent_player1=agent_player1,
                                       agent_player2=agent_player2,
                                       agent_player1_frozen= agent_player1_frozen,
                                       agent_player2_frozen= agent_player2_frozen,
                                       optim=optim)

    # ======== collector setup =========
    # Make a collector for the training environments
    train_collector = ts.data.Collector(policy= policy,
                                        env= train_envs,
                                        buffer= ts.data.VectorReplayBuffer(buffer_size, len(train_envs)),
                                        exploration_noise= True)
    
    # Make a collector for the testing environments
    test_collector = ts.data.Collector(policy= policy,
                                       env= test_envs,
                                       buffer= ts.data.VectorReplayBuffer(buffer_size, len(test_envs)),
                                       exploration_noise= True)
    
    # Uncomment below if you want to set epsilon in epsilon policy
    # policy.set_eps(1)
    
    # Collect initial data from the training environments
    train_collector.collect(n_step= batch_size * training_env_num)
    
    # ======== ensure folders exist =========
    if not os.path.exists(os.path.join('./logs', 'paper_notebooks', notebook_version, filename)):
        os.makedirs(os.path.join('./logs', 'paper_notebooks', notebook_version, filename))
    if not os.path.exists(os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename)):
        os.makedirs(os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename))

    # ======== tensorboard logging setup =========
    # Allows saving the training progress to TensorBoard compatible logs
    log_path = os.path.join('./logs', 'paper_notebooks', notebook_version, filename)
    writer = torch.utils.tensorboard.SummaryWriter(log_path)
    logger = ts.utils.TensorboardLogger(writer)

    # ======== callback functions used during training =========
    # We want to save our best policy
    def save_best_fn(policy):
        """
        Callback to save the best model
        """
        # Save best agent 1
        model_save_path = os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename, 'best_policy_agent1.pth')
        torch.save(policy.policies[agents[0]].state_dict(), model_save_path)
        
        # Save best agent 2
        model_save_path = os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename, 'best_policy_agent2.pth')
        torch.save(policy.policies[agents[1]].state_dict(), model_save_path)
        

    def stop_fn(mean_rewards):
        """
        Callback to stop training when we've reached the win rate
        """
        return mean_rewards >= 7 # (win = 10, 70% win without invalid moves = mean of 7)

    def train_fn(epoch, env_step):
        """
        Callback before training
        """        
        # Before training we want to configure the epsilon for the agents
        # In general more exploratory than the test case
        policy.policies[agents[0]].set_eps(training_eps)
        policy.policies[agents[1]].set_eps(training_eps)

    def test_fn(epoch, env_step):
        """
        Callback before testing
        """        
        # Before testing we want to configure the epsilon for the agents
        # In general more greedy than the train case, but not fully greedy,
        #   to avoid getting stuck on invalid moves
        policy.policies[agents[0]].set_eps(testing_eps)
        policy.policies[agents[1]].set_eps(testing_eps)

    def reward_metric(rews):
        """
        Callback for reward collection
        """        
        if agent_player2_frozen and single_agent_score_as_reward:
            # agent 2 frozen, optimizing for agent 1
            return rews[:, 0]
        
        if agent_player1_frozen and single_agent_score_as_reward:
            # agent 1 frozen, optimizing for agent 2
            return rews[:, 1]
        
        # Per default we are interested in optimizing both agents
        return rews[:, 0] + rews[:, 1]
    
            

    # trainer
    result = ts.trainer.offpolicy_trainer(policy= policy,
                                          train_collector= train_collector,
                                          test_collector= test_collector,
                                          max_epoch= epochs,
                                          step_per_epoch= step_per_epoch,
                                          step_per_collect= step_per_collect,
                                          episode_per_test= testing_env_num,
                                          batch_size= batch_size,
                                          train_fn= train_fn,
                                          test_fn= test_fn,
                                          # Stop function to stop before specified amount of epochs
                                          #stop_fn= stop_fn
                                          save_best_fn= save_best_fn,
                                          update_per_step= update_per_step,
                                          logger= logger,
                                          test_in_train= False,
                                          reward_metric= reward_metric)
    
    # Save final agent 1
    model_save_path = os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename, 'final_policy_agent1.pth')
    torch.save(policy.policies[agents[0]].state_dict(), model_save_path)

    # Save final agent 2
    model_save_path = os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename, 'final_policy_agent2.pth')
    torch.save(policy.policies[agents[1]].state_dict(), model_save_path)

    return result, policy.policies[agents[0]], policy.policies[agents[1]]
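
The `reward_metric` callback decides which score the trainer uses to pick the "best" policy. The sketch below mirrors its branching with plain lists instead of the array the trainer passes in. The function `select_reward` and the reward values are hypothetical; the values assume that the winner receives `reward_win` and the loser `reward_loss` as configured in the environment above.

```python
from typing import List, Sequence


def select_reward(rews: Sequence[Sequence[float]],
                  agent_player1_frozen: bool = False,
                  agent_player2_frozen: bool = False,
                  single_agent_score_as_reward: bool = False) -> List[float]:
    """Plain-list stand-in for the reward_metric callback: one score per episode."""
    if agent_player2_frozen and single_agent_score_as_reward:
        return [episode[0] for episode in rews]  # score of the learning agent 1
    if agent_player1_frozen and single_agent_score_as_reward:
        return [episode[1] for episode in rews]  # score of the learning agent 2
    return [episode[0] + episode[1] for episode in rews]  # combined score


# Hypothetical rewards for two decisive games: [agent 1, agent 2]
rews = [[25, -25], [-25, 25]]

combined = select_reward(rews)
agent1_only = select_reward(rews, agent_player2_frozen=True, single_agent_score_as_reward=True)
agent2_only = select_reward(rews, agent_player1_frozen=True, single_agent_score_as_reward=True)

print(combined)     # [0, 0]
print(agent1_only)  # [25, -25]
print(agent2_only)  # [-25, 25]
```

Under these assumptions the combined score of a decisive game cancels to zero, so it cannot distinguish a winning agent from a losing one. Scoring only the non frozen agent makes the metric reflect how well the learning agent does against its fixed opponent.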

<hr>

### Function for watching learned agent

Identical to the previous notebook.

In [10]:
####################################################
# WATCHING THE LEARNED POLICY IN ACTION
####################################################

def watch(numer_of_games: int = 3,
          agent_player1: typing.Optional[ts.policy.BasePolicy] = None,
          agent_player2: typing.Optional[ts.policy.BasePolicy] = None,
          test_epsilon: float = 0.05, # Nearly greedy; a small epsilon avoids getting stuck on an invalid move
          render_speed: float = 0.15, # Amount of seconds to update frame/ do a step
          ) -> None:
    
    # Get the connect four V2 environment (must be a list)
    env= ts.env.DummyVectorEnv([get_env])
    
    # Get the agents from the trained agents
    policy, optim, agents = get_agents(agent_player1= agent_player1,
                                       agent_player2= agent_player2)
    
    # Evaluate the policy
    policy.eval()
    
    # Set the testing policy epsilon for our agents
    policy.policies[agents[0]].set_eps(test_epsilon)
    policy.policies[agents[1]].set_eps(test_epsilon)
    
    # Collect the test data
    collector = ts.data.Collector(policy= policy,
                                  env= env,
                                  exploration_noise= True)
    
    # Render games in human mode to see how it plays
    result = collector.collect(n_episode= numer_of_games, render= render_speed)
    
    # Close the environment after collecting the results
    # This closes the pygame window after completion
    env.close()
    
    # Get the rewards and length from the test trials
    rewards, length = result["rews"], result["lens"]
    
    # Print the final reward for the first agent
    print(f"Average steps of game:  {length.mean()}")
    print(f"Final mean reward agent 1: {rewards[:, 0].mean()}, std: {rewards[:, 0].std()}")
    print(f"Final mean reward agent 2: {rewards[:, 1].mean()}, std: {rewards[:, 1].std()}")

<hr>

### Doing the experiment

We now run the experiment using our previously created functions.
In each run we freeze one agent and initialize both agents from previously saved versions.

The following iterations were made:

1. Freeze agent 1, train agent 2:
    - Model save name: `1-mlp_dqn_frozen_agent1` 
    - Agent 1 start: `./saved_variables/paper_notebooks/5/dqn_vs_dqn/best_policy_agent1.pth`
    - Agent 2 start: `./saved_variables/paper_notebooks/5/dqn_vs_dqn/best_policy_agent2.pth`
    - Learning rate: `0.0001`
    - Training epsilon: `0.2`
    - Look ahead steps: `4`
    - Reward for move/invalid: `+1` / `-3`
    - Allow invalid move: `False`
    - Epochs: `1000`
    - Gamma: `0.9`
    - Best epoch: `1` with test reward `1102`
    - Scoring: sum of `both` agents' scores
2. Freeze agent 2, train agent 1:
    - Model save name: `2-mlp_dqn_frozen_agent2` 
    - Agent 1 start: `./saved_variables/paper_notebooks/5/dqn_vs_dqn/best_policy_agent1.pth`
    - Agent 2 start: `./saved_variables/paper_notebooks/8/1-mlp_dqn_frozen_agent1/final_policy_agent2.pth`
    - Learning rate: `0.0001`
    - Training epsilon: `0.2`
    - Look ahead steps: `4`
    - Reward for move/invalid: `+1` / `-3`
    - Allow invalid move: `False`
    - Epochs: `1000`
    - Gamma: `0.9`
    - Best epoch: `482` with test reward `1102`
    - Scoring: sum of `both` agents' scores

After these two runs, the agents were so focused on prolonging the game that we decided to lower the learning rate and to optimize for winning again. We also lowered the number of epochs in each iteration of swapping the frozen agent.

3. Freeze agent 1, train agent 2:
    - Model save name: `3-mlp_dqn_frozen_agent1` 
    - Agent 1 start: `./saved_variables/paper_notebooks/8/2-mlp_dqn_frozen_agent2/best_policy_agent1.pth`
    - Agent 2 start: `./saved_variables/paper_notebooks/8/1-mlp_dqn_frozen_agent1/final_policy_agent2.pth`
    - Learning rate: `0.00005` (halved)
    - Training epsilon: `0.1` (halved)
    - Look ahead steps: `4`
    - Reward for move/invalid: `0` / `-3`
    - Allow invalid move: `False`
    - Epochs: `500`
    - Gamma: `0.8` 
    - Best epoch: `7` with test reward `100`
    - Scoring: reward of `agent 2`
4. Freeze agent 2, train agent 1:
    - Model save name: `4-mlp_dqn_frozen_agent2` 
    - Agent 1 start: `./saved_variables/paper_notebooks/8/2-mlp_dqn_frozen_agent2/best_policy_agent1.pth`
    - Agent 2 start: `./saved_variables/paper_notebooks/8/3-mlp_dqn_frozen_agent1/final_policy_agent2.pth`
    - Learning rate: `0.00005`
    - Training epsilon: `0.1`
    - Look ahead steps: `4`
    - Reward for move/invalid: `0` / `-3`
    - Allow invalid move: `False`
    - Epochs: `500`
    - Gamma: `0.8`
    - Best epoch: `XXX` with test reward `YYY`
    - Scoring: reward of `agent 1`
    
To train further, a loop was created that alternates which agent is frozen every 50 epochs. This loop was executed 20 times. The learning rate was also lowered once again.

5. Loop frozen agents:
    - Model save name: `5-50epoch_20loop/looping-iteration-i` 
    - Agent 1 start: `./saved_variables/paper_notebooks/8/4-mlp_dqn_frozen_agent2/best_policy_agent1.pth`
    - Agent 2 start: `./saved_variables/paper_notebooks/8/3-mlp_dqn_frozen_agent1/best_policy_agent2.pth`
    - Learning rate: `0.000001`
    - Training epsilon: `0.1`
    - Look ahead steps: `4`
    - Reward for move/invalid: `0` / `-3`
    - Allow invalid move: `False`
    - Epochs: `50` x `20` loops 
    - Gamma: `0.8` 
    - Best epoch: final epoch always taken to next round
    - Scoring: reward of `non frozen agent`
6. Loop frozen agents:
    - Model save name: `6-20epoch_100loop/looping-iteration-i` 
    - Agent 1 start: `./saved_variables/paper_notebooks/8/5-50epoch_20loop/looping-iteration-18/best_policy_agent1.pth`
    - Agent 2 start: `./saved_variables/paper_notebooks/8/5-50epoch_20loop/looping-iteration-19/best_policy_agent2.pth`
    - Learning rate: `0.000003`
    - Training epsilon: `0.1`
    - Look ahead steps: `8`
    - Reward for move/invalid: `0` / `-3`
    - Allow invalid move: `False`
    - Epochs: `20` x `100` loops 
    - Gamma: `0.9` 
    - Best epoch: final epoch always taken to next round
    - Scoring: reward of `non frozen agent`
7. Loop frozen agents:
    - Model save name: `7-20epoch_500loop/looping-iteration-i` 
    - Agent 1 start: `./saved_variables/paper_notebooks/8/6-20epoch_100loop/looping-iteration-98/best_policy_agent1.pth`
    - Agent 2 start: `./saved_variables/paper_notebooks/8/6-20epoch_100loop/looping-iteration-99/best_policy_agent2.pth`
    - Learning rate: `0.001`
    - Training epsilon: `0.05`
    - Look ahead steps: `8`
    - Reward for move/invalid: `0` / `-3`
    - Allow invalid move: `False`
    - Epochs: `20` x `500` loops 
    - Gamma: `0.9` 
    - Best epoch: final epoch always taken to next round
    - Scoring: reward of `non frozen agent`

For file size reasons, only a portion of the saved agents are kept and stored on GitHub.
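
The looping runs alternate the frozen agent and start each iteration from the final policies of the previous one. A minimal sketch of that schedule is given below. The names `LoopConfig` and `freeze_schedule` as well as the example paths are hypothetical; only the even/odd freezing rule and the `looping-iteration-i` folder naming follow the runs described above.

```python
from typing import Iterator, NamedTuple


class LoopConfig(NamedTuple):
    loop_idx: int
    freeze_agent1: bool
    freeze_agent2: bool
    agent1_start: str
    agent2_start: str


def freeze_schedule(loops: int, run_dir: str,
                    agent1_start: str, agent2_start: str) -> Iterator[LoopConfig]:
    """Yield the configuration of each loop: who is frozen and where each agent starts."""
    for loop_idx in range(loops):
        if loop_idx > 0:
            # Continue from the final policies of the previous iteration
            previous = f"{run_dir}/looping-iteration-{loop_idx - 1}"
            agent1_start = f"{previous}/final_policy_agent1.pth"
            agent2_start = f"{previous}/final_policy_agent2.pth"
        # Even loops freeze agent 2 (agent 1 learns), odd loops freeze agent 1
        yield LoopConfig(loop_idx, loop_idx % 2 == 1, loop_idx % 2 == 0,
                         agent1_start, agent2_start)


# Hypothetical run directory and starting files
schedule = list(freeze_schedule(4, "./example-run", "start_agent1.pth", "start_agent2.pth"))
for config in schedule:
    print(config)
```

Exactly one agent is frozen in every iteration, so each agent only ever trains against a fixed opponent.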


In [11]:
####################################################
# EXPERIMENT: TRAINING AGENTS
####################################################

# Configs for the agents
#freeze_agent1 = False
agent1_starting_params = "./saved_variables/paper_notebooks/8/5-50epoch_20loop/looping-iteration-18/best_policy_agent1.pth"

#freeze_agent2 = True
agent2_starting_params = "./saved_variables/paper_notebooks/8/5-50epoch_20loop/looping-iteration-19/best_policy_agent2.pth"

single_agent_score_as_reward = True # To use combined reward or non frozen agent reward as scoring
filename = "7-20epoch_500loop/looping-iteration-i"
epochs = 20
loops = 500

learning_rate = 0.001
training_eps = 0.05
gamma = 0.9
n_step = 8

for loop_idx in range(loops):
    # Filename
    filename = f"7-20epoch_500loop/looping-iteration-{loop_idx}"
    
    # Use provided starting params in first loop, the one from previous iteration in next
    if loop_idx > 0:
        agent1_starting_params = f"./saved_variables/paper_notebooks/8/7-20epoch_500loop/looping-iteration-{loop_idx-1}/final_policy_agent1.pth"
        agent2_starting_params = f"./saved_variables/paper_notebooks/8/7-20epoch_500loop/looping-iteration-{loop_idx-1}/final_policy_agent2.pth"
    
    # Determine what agent to freeze
    # Even loops freeze agent 2 (agent 1 learns), odd loops freeze agent 1 (agent 2 learns)
    freeze_agent1 = loop_idx % 2 == 1
    freeze_agent2 = loop_idx % 2 == 0
    
    # Get the environment settings
    env = get_env()
    observation_space = env.observation_space['observation'] if isinstance(env.observation_space, gym.spaces.Dict) else env.observation_space
    state_shape = observation_space.shape or observation_space.n
    action_shape = env.action_space.shape or env.action_space.n
    
    # Configure agent 1
    agent1 = cf_custom_dqn_policy(state_shape= state_shape,
                                  action_shape= action_shape,
                                  gamma= gamma,
                                  frozen= freeze_agent1,
                                  learning_rate = learning_rate,
                                  n_step= n_step)
    
    if agent1_starting_params:
        agent1.load_state_dict(torch.load(agent1_starting_params))
    
    # Configure agent 2 (independent of whether agent 1 loaded starting params)
    agent2 = cf_custom_dqn_policy(state_shape= state_shape,
                                  action_shape= action_shape,
                                  gamma= gamma,
                                  frozen= freeze_agent2,
                                  learning_rate = learning_rate,
                                  n_step= n_step)
    
    if agent2_starting_params:
        agent2.load_state_dict(torch.load(agent2_starting_params))
    
    # Train the agents; the frozen one keeps its weights since its optimizer has learning rate 0
    off_policy_traininer_results, final_agent_player1, final_agent_player2 = train_agent(
        epochs= epochs,
        agent_player1= agent1,
        agent_player1_frozen = freeze_agent1,
        agent_player2= agent2,
        agent_player2_frozen = freeze_agent2,
        filename= filename,
        single_agent_score_as_reward = single_agent_score_as_reward,
        training_eps= training_eps)
            
            

Epoch #1: 1025it [00:03, 335.35it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=138.437, player_2/loss=477.233, rew=13.89]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 498.10it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=128.252, player_2/loss=571.044, rew=19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 500.74it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=72.646, player_2/loss=645.798, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 466.98it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=95.525, player_2/loss=670.831, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 484.76it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=125.483, player_2/loss=635.530, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 493.07it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=102.839, player_2/loss=600.359, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 486.54it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=109.723, player_2/loss=560.617, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 491.06it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=103.085, player_2/loss=593.413, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 483.33it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=88.515, player_2/loss=530.798, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 455.07it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=93.056, player_2/loss=565.915, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 449.55it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=74.893, player_2/loss=556.723, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 418.81it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=107.956, player_2/loss=555.864, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 501.70it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=121.565, player_2/loss=579.678, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 466.27it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=112.533, player_2/loss=640.396, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 476.50it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=143.417, player_2/loss=654.016, rew=13.89]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 473.28it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=94.653, player_2/loss=630.542, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 486.16it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=89.067, player_2/loss=615.854, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 504.85it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=106.576, player_2/loss=655.599, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 493.12it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=45.602, player_2/loss=594.524, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 510.73it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=41.376, player_2/loss=446.048, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 507.34it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=81.753, player_2/loss=354.385, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 511.52it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=128.334, player_2/loss=302.368, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 509.35it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=161.688, player_2/loss=298.135, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 513.41it/s, env_step=5120, len=9, n/ep=6, n/st=64, player_1/loss=390.368, player_2/loss=267.397, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 511.17it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=571.000, player_2/loss=190.649, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:01, 513.01it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=411.289, player_2/loss=244.039, rew=-5.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 508.37it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=186.677, player_2/loss=308.086, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:01, 513.37it/s, env_step=9216, len=22, n/ep=2, n/st=64, player_1/loss=187.145, player_2/loss=217.573, rew=0.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 512.26it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=185.996, player_2/loss=94.798, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:01, 514.40it/s, env_step=11264, len=34, n/ep=2, n/st=64, player_1/loss=149.807, player_2/loss=81.578, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:01, 515.46it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=196.701, player_2/loss=125.095, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:01, 513.29it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=231.573, player_2/loss=129.597, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 506.74it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=246.407, player_2/loss=128.673, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:01, 512.64it/s, env_step=15360, len=13, n/ep=4, n/st=64, player_1/loss=216.639, player_2/loss=119.490, rew=-12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 512.40it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=203.273, player_2/loss=115.672, rew=-15.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:01, 515.32it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=264.553, rew=0.00]        


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:01, 512.72it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=255.979, player_2/loss=123.501, rew=12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 510.16it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=268.134, player_2/loss=132.671, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 507.96it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=157.588, player_2/loss=181.226, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 508.62it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=198.321, player_2/loss=206.025, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 508.42it/s, env_step=3072, len=9, n/ep=6, n/st=64, player_1/loss=197.769, player_2/loss=214.994, rew=8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:01, 513.26it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=151.703, player_2/loss=223.855, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 508.28it/s, env_step=5120, len=9, n/ep=6, n/st=64, player_1/loss=187.036, player_2/loss=209.598, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 510.09it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=161.158, player_2/loss=215.347, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 490.15it/s, env_step=7168, len=10, n/ep=7, n/st=64, player_1/loss=60.880, player_2/loss=213.046, rew=10.71]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 509.37it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=36.573, player_2/loss=216.255, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 505.43it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=118.427, player_2/loss=196.661, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 510.16it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=63.493, player_2/loss=174.141, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 510.83it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=79.989, player_2/loss=188.138, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 507.32it/s, env_step=12288, len=11, n/ep=7, n/st=64, player_1/loss=93.494, player_2/loss=181.377, rew=10.71]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 508.02it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=60.726, player_2/loss=188.336, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:01, 513.58it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=35.049, player_2/loss=216.708, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 509.40it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=82.433, player_2/loss=205.140, rew=-5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 511.41it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=155.585, player_2/loss=153.479, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 512.21it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=127.789, player_2/loss=202.984, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 510.18it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=66.734, player_2/loss=233.324, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 507.09it/s, env_step=19456, len=9, n/ep=6, n/st=64, player_1/loss=78.683, player_2/loss=206.471, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 509.76it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=186.480, player_2/loss=286.831, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 512.19it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=223.123, player_2/loss=275.302, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:01, 513.05it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=197.406, player_2/loss=228.971, rew=-19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:01, 512.90it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=222.068, player_2/loss=179.986, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:01, 515.88it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=263.710, player_2/loss=138.330, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:01, 515.74it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=279.493, player_2/loss=63.225, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 511.31it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=264.290, player_2/loss=46.457, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:01, 514.38it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=237.390, player_2/loss=34.766, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:01, 515.63it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=241.446, player_2/loss=28.032, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:01, 515.31it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=243.529, player_2/loss=30.176, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:01, 517.86it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=315.937, player_2/loss=39.107, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 512.09it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=281.494, player_2/loss=37.517, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:01, 514.44it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=247.508, player_2/loss=11.141, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:01, 516.14it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=312.517, player_2/loss=16.497, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:01, 516.64it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=263.140, player_2/loss=15.087, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:01, 516.18it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=239.011, player_2/loss=13.118, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 512.25it/s, env_step=17408, len=24, n/ep=3, n/st=64, player_1/loss=229.306, rew=-8.33]       


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 510.12it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_2/loss=137.312, rew=-15.00]      


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 512.04it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=190.924, player_2/loss=177.669, rew=-5.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 507.92it/s, env_step=1024, len=13, n/ep=4, n/st=64, player_1/loss=173.017, player_2/loss=155.394, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 509.01it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=144.688, player_2/loss=166.495, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 507.69it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=131.183, player_2/loss=264.689, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 511.22it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=158.138, player_2/loss=361.214, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 508.25it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=121.535, player_2/loss=436.551, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 506.41it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=89.101, player_2/loss=435.711, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 507.44it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=123.653, player_2/loss=416.759, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 509.13it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=81.137, player_2/loss=402.884, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 505.45it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_2/loss=422.249, rew=13.89]          


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 508.59it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=172.156, player_2/loss=459.883, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 506.52it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=161.681, player_2/loss=443.445, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 509.59it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=192.238, player_2/loss=427.758, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 507.36it/s, env_step=13312, len=7, n/ep=10, n/st=64, player_1/loss=107.842, player_2/loss=380.393, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 506.67it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=125.738, player_2/loss=376.862, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 506.19it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=96.271, player_2/loss=404.348, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 506.24it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=94.432, player_2/loss=402.292, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 507.33it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=94.631, player_2/loss=434.295, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 506.54it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=100.138, player_2/loss=423.802, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 506.24it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=108.872, player_2/loss=399.736, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 507.39it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=66.994, player_2/loss=349.243, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 509.26it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=95.105, player_2/loss=296.445, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 511.73it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=215.537, player_2/loss=253.682, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 510.65it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=408.069, player_2/loss=153.173, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:01, 512.87it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=508.636, player_2/loss=86.292, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:01, 513.26it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=445.957, player_2/loss=64.674, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 509.20it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=325.574, player_2/loss=64.525, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:01, 514.69it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=333.026, player_2/loss=48.322, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 511.25it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=369.093, player_2/loss=95.227, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 505.44it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=286.215, rew=25.00]       


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 507.54it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=236.381, player_2/loss=194.039, rew=-12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 509.93it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=166.529, player_2/loss=287.751, rew=0.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 511.76it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=77.972, player_2/loss=230.574, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:01, 515.27it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=98.091, player_2/loss=178.058, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:01, 513.88it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=185.964, player_2/loss=142.989, rew=5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 511.29it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=194.451, player_2/loss=125.560, rew=-15.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 512.34it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=195.645, player_2/loss=90.819, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:01, 514.54it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=196.474, player_2/loss=93.714, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:01, 513.49it/s, env_step=19456, len=24, n/ep=2, n/st=64, player_1/loss=230.319, player_2/loss=217.146, rew=37.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:01, 513.20it/s, env_step=1024, len=26, n/ep=3, n/st=64, player_1/loss=78.821, player_2/loss=92.622, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 505.36it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=122.604, player_2/loss=101.148, rew=-12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 511.55it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=178.233, player_2/loss=133.754, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:01, 514.24it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=171.780, player_2/loss=145.472, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:01, 513.42it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=156.683, player_2/loss=155.897, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:01, 513.63it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=126.537, player_2/loss=130.017, rew=-15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:01, 515.22it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=122.356, player_2/loss=139.405, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:01, 513.87it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=152.119, player_2/loss=143.369, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:01, 512.59it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=180.630, player_2/loss=185.771, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 507.87it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=126.685, player_2/loss=264.888, rew=13.89]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 507.07it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=188.034, player_2/loss=312.946, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 509.16it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=232.429, player_2/loss=318.015, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:01, 513.72it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=130.965, player_2/loss=349.183, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 511.61it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=110.840, player_2/loss=367.982, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 508.16it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=170.704, player_2/loss=375.137, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 509.56it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=164.795, player_2/loss=366.527, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 512.30it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=113.683, player_2/loss=308.863, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 511.18it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=60.833, player_2/loss=265.446, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 511.76it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=66.024, player_2/loss=307.686, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 509.89it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=79.295, player_2/loss=331.354, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 513.28it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=79.665, player_2/loss=266.859, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 506.29it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=110.709, player_2/loss=183.138, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 511.16it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=171.256, player_2/loss=189.571, rew=-17.86]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 515.32it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=274.169, player_2/loss=197.659, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:01, 514.57it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=280.095, player_2/loss=127.922, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:01, 514.20it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=236.455, player_2/loss=125.888, rew=-19.44]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 508.51it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=291.085, player_2/loss=149.714, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:01, 512.74it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=330.016, player_2/loss=75.260, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 488.14it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=445.671, player_2/loss=14.716, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 478.34it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=418.766, player_2/loss=17.295, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 511.27it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=338.178, player_2/loss=144.556, rew=-5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:01, 512.68it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=243.096, player_2/loss=216.850, rew=-15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:01, 513.03it/s, env_step=14336, len=13, n/ep=4, n/st=64, player_1/loss=205.240, player_2/loss=176.435, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:01, 512.94it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=221.364, player_2/loss=146.358, rew=-5.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:01, 513.33it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=247.412, player_2/loss=124.443, rew=0.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:01, 513.06it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=206.728, player_2/loss=134.071, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:01, 517.10it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=156.915, player_2/loss=104.374, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:01, 514.58it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=178.315, player_2/loss=57.018, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 509.96it/s, env_step=1024, len=18, n/ep=3, n/st=64, player_1/loss=139.986, player_2/loss=136.454, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 511.94it/s, env_step=2048, len=25, n/ep=3, n/st=64, player_1/loss=130.525, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 511.37it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=123.651, player_2/loss=85.856, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:01, 516.98it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=116.073, player_2/loss=61.990, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:01, 514.96it/s, env_step=5120, len=25, n/ep=2, n/st=64, player_1/loss=110.359, player_2/loss=65.607, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 510.32it/s, env_step=6144, len=28, n/ep=3, n/st=64, player_1/loss=100.898, player_2/loss=91.313, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:01, 515.42it/s, env_step=7168, len=25, n/ep=3, n/st=64, player_1/loss=107.021, player_2/loss=88.278, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 511.49it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=85.641, player_2/loss=65.135, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 511.94it/s, env_step=9216, len=25, n/ep=2, n/st=64, player_1/loss=108.269, player_2/loss=86.433, rew=0.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 508.88it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=150.203, player_2/loss=128.497, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 510.01it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=205.776, player_2/loss=201.030, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:01, 515.53it/s, env_step=12288, len=16, n/ep=5, n/st=64, player_1/loss=200.061, player_2/loss=195.170, rew=-15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:01, 512.96it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=152.138, player_2/loss=173.323, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 509.25it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=160.795, player_2/loss=217.594, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 508.88it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=149.189, player_2/loss=298.765, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 508.08it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=77.876, player_2/loss=280.321, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 505.42it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=52.466, player_2/loss=283.552, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 504.16it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=81.839, player_2/loss=285.822, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 507.91it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=97.689, player_2/loss=302.861, rew=12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:01, 512.91it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=44.491, player_2/loss=273.993, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 516.43it/s, env_step=2048, len=29, n/ep=2, n/st=64, player_1/loss=102.756, player_2/loss=195.944, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:01, 514.32it/s, env_step=3072, len=25, n/ep=3, n/st=64, player_2/loss=340.365, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 512.23it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=250.652, player_2/loss=357.457, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:01, 514.66it/s, env_step=5120, len=22, n/ep=3, n/st=64, player_1/loss=132.812, player_2/loss=146.279, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:01, 516.11it/s, env_step=6144, len=19, n/ep=4, n/st=64, player_1/loss=84.531, player_2/loss=124.477, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:01, 517.94it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=124.816, player_2/loss=111.539, rew=-18.75]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:01, 513.98it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=157.436, player_2/loss=168.482, rew=-5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:01, 515.89it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=191.241, player_2/loss=150.452, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:01, 513.30it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=196.853, player_2/loss=101.813, rew=3.57]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:01, 513.86it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=216.987, player_2/loss=84.274, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:01, 517.87it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=252.459, player_2/loss=90.652, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:01, 514.57it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=222.390, player_2/loss=59.350, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:01, 518.87it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=213.011, player_2/loss=66.332, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:01, 515.72it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=277.596, player_2/loss=58.302, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:01, 516.85it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=285.899, player_2/loss=32.907, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:01, 513.42it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=287.806, player_2/loss=58.675, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:01, 515.13it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=299.004, player_2/loss=86.333, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:01, 514.89it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=263.720, player_2/loss=87.981, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:01, 514.91it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=180.188, player_2/loss=3.735, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 517.43it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=170.630, player_2/loss=6.430, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 498.55it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=142.254, player_2/loss=11.903, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 509.82it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=147.038, player_2/loss=24.610, rew=-15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 512.08it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=172.602, player_2/loss=159.847, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:01, 513.17it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=160.985, player_2/loss=278.174, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 510.44it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=86.320, player_2/loss=376.688, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 509.87it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=43.615, player_2/loss=389.303, rew=5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:01, 516.95it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=105.619, player_2/loss=348.585, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 510.64it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=132.425, player_2/loss=347.464, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 511.80it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=128.850, player_2/loss=406.532, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 509.66it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=119.975, player_2/loss=313.505, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:01, 516.13it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=83.074, player_2/loss=396.884, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:01, 517.27it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=30.343, player_2/loss=405.524, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 512.48it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=8.114, player_2/loss=383.861, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 511.64it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=34.458, player_2/loss=331.299, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 509.70it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=45.152, player_2/loss=368.902, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:01, 515.35it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=17.591, player_2/loss=388.217, rew=5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 509.79it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=65.451, player_2/loss=439.697, rew=5.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:01, 512.55it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=33.428, player_2/loss=331.332, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 516.83it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=66.574, player_2/loss=257.776, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 501.39it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=54.124, player_2/loss=213.513, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 510.37it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=16.866, player_2/loss=214.771, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 516.65it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=67.234, player_2/loss=182.070, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:01, 515.81it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=115.402, player_2/loss=122.044, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:01, 516.91it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=123.350, player_2/loss=55.370, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:01, 515.72it/s, env_step=8192, len=18, n/ep=3, n/st=64, player_1/loss=168.476, player_2/loss=59.029, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:01, 517.00it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=162.966, player_2/loss=62.314, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:01, 514.60it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=124.288, player_2/loss=108.263, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:01, 515.67it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=127.463, player_2/loss=114.343, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:01, 512.59it/s, env_step=12288, len=18, n/ep=4, n/st=64, player_1/loss=160.164, player_2/loss=64.830, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:01, 517.29it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=189.189, player_2/loss=75.359, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:01, 514.56it/s, env_step=14336, len=16, n/ep=3, n/st=64, player_1/loss=168.922, player_2/loss=56.980, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:01, 516.15it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=207.853, player_2/loss=54.506, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:01, 515.62it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=239.009, player_2/loss=43.962, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:01, 520.67it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=231.335, player_2/loss=100.595, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:01, 515.52it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=237.102, player_2/loss=79.325, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:01, 513.03it/s, env_step=19456, len=15, n/ep=3, n/st=64, player_1/loss=191.128, player_2/loss=44.263, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 504.09it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=122.152, player_2/loss=9.786, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 513.07it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=92.525, player_2/loss=57.124, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:01, 513.32it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=108.083, player_2/loss=64.492, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:01, 516.14it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=126.063, player_2/loss=43.592, rew=-5.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 515.07it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=93.331, player_2/loss=28.886, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 507.46it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=123.266, player_2/loss=101.210, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 511.90it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=119.554, player_2/loss=104.468, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:01, 513.06it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=121.214, player_2/loss=159.749, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 507.88it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=131.512, player_2/loss=282.835, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 506.02it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=67.124, player_2/loss=378.566, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 505.97it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=94.894, player_2/loss=325.458, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 512.28it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=91.085, player_2/loss=314.121, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 504.77it/s, env_step=13312, len=9, n/ep=8, n/st=64, player_1/loss=65.111, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 509.23it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=42.057, player_2/loss=356.024, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 510.46it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=26.177, player_2/loss=366.907, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 508.48it/s, env_step=16384, len=7, n/ep=10, n/st=64, player_1/loss=28.838, player_2/loss=341.069, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 510.99it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=34.537, player_2/loss=337.148, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 508.81it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=31.095, player_2/loss=384.996, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 510.49it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=19.745, player_2/loss=393.010, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:01, 513.93it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=53.031, player_2/loss=317.686, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 514.43it/s, env_step=2048, len=9, n/ep=6, n/st=64, player_1/loss=83.377, player_2/loss=255.906, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:01, 515.51it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=118.209, player_2/loss=201.714, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:01, 516.96it/s, env_step=4096, len=17, n/ep=3, n/st=64, player_1/loss=136.793, player_2/loss=179.891, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:01, 513.44it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=157.842, player_2/loss=189.040, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 511.82it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=126.340, player_2/loss=183.484, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:01, 517.03it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=108.954, player_2/loss=150.594, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:01, 513.22it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=129.381, player_2/loss=123.071, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:01, 512.86it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=140.399, player_2/loss=126.882, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:01, 517.14it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=132.465, player_2/loss=125.068, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:01, 514.74it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=109.756, player_2/loss=105.339, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:01, 515.53it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=139.488, player_2/loss=118.043, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 511.06it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=165.111, player_2/loss=150.696, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:01, 514.82it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=130.957, player_2/loss=145.724, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:01, 516.98it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=251.249, player_2/loss=160.925, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 508.69it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=310.444, player_2/loss=158.896, rew=15.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:01, 515.39it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=232.082, player_2/loss=146.994, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:01, 516.31it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_2/loss=95.025, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 511.53it/s, env_step=19456, len=20, n/ep=3, n/st=64, player_1/loss=221.687, player_2/loss=59.229, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:01, 512.54it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=211.283, player_2/loss=38.335, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 514.67it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=216.501, player_2/loss=86.600, rew=16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 509.88it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=251.200, player_2/loss=170.607, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 511.65it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=222.241, player_2/loss=197.936, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 501.44it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=195.172, player_2/loss=173.187, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 510.07it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=206.545, player_2/loss=200.156, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:01, 513.39it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=165.967, player_2/loss=275.859, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 510.32it/s, env_step=8192, len=7, n/ep=10, n/st=64, player_1/loss=110.096, player_2/loss=374.237, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 508.01it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=105.539, player_2/loss=439.178, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:01, 512.69it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=171.766, player_2/loss=381.672, rew=17.86]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 508.98it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=122.675, player_2/loss=384.430, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 511.05it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=48.573, player_2/loss=463.983, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 505.89it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=37.668, player_2/loss=458.684, rew=13.89]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 509.15it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=138.604, player_2/loss=346.379, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 511.82it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=178.215, player_2/loss=305.774, rew=8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 508.84it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=151.428, player_2/loss=341.844, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 511.06it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=181.398, player_2/loss=309.298, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 511.82it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=175.320, player_2/loss=319.946, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 510.59it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=130.289, player_2/loss=403.135, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:01, 513.21it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=112.126, player_2/loss=371.231, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 510.09it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=83.138, player_2/loss=326.472, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 506.18it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=110.623, player_2/loss=213.501, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:01, 514.55it/s, env_step=4096, len=12, n/ep=6, n/st=64, player_1/loss=183.735, player_2/loss=177.791, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:01, 516.63it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=124.595, player_2/loss=181.036, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:01, 518.40it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=156.793, player_2/loss=174.387, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 512.33it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=174.643, player_2/loss=120.365, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 508.34it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=222.790, player_2/loss=148.269, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 510.09it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=234.782, player_2/loss=202.282, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:01, 515.04it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=158.782, player_2/loss=155.748, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:01, 514.16it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=140.307, player_2/loss=204.403, rew=8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:01, 513.29it/s, env_step=12288, len=18, n/ep=4, n/st=64, player_1/loss=139.045, player_2/loss=214.308, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 512.34it/s, env_step=13312, len=20, n/ep=4, n/st=64, player_1/loss=109.554, player_2/loss=68.606, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:01, 513.59it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=116.079, player_2/loss=62.431, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:01, 514.89it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=159.283, player_2/loss=69.091, rew=-12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:01, 514.16it/s, env_step=16384, len=17, n/ep=3, n/st=64, player_1/loss=193.586, rew=8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:01, 513.87it/s, env_step=17408, len=20, n/ep=3, n/st=64, player_1/loss=192.731, player_2/loss=147.527, rew=-8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 511.53it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=148.475, player_2/loss=80.853, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 511.10it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=164.018, player_2/loss=138.335, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:01, 512.82it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=124.044, player_2/loss=264.193, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 514.73it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=97.931, player_2/loss=216.983, rew=16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 509.54it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=70.942, player_2/loss=159.171, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:01, 515.61it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=144.268, player_2/loss=176.337, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 514.71it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=149.705, player_2/loss=212.346, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:01, 514.33it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=124.926, player_2/loss=214.962, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:01, 514.74it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=54.623, player_2/loss=241.498, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 511.86it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=65.766, player_2/loss=250.215, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:01, 515.38it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=103.314, player_2/loss=213.655, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:01, 516.05it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=97.282, player_2/loss=193.173, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:01, 514.55it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=56.267, player_2/loss=211.988, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 506.26it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=43.081, player_2/loss=215.584, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:01, 513.19it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=46.156, player_2/loss=177.558, rew=15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 510.83it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=63.669, player_2/loss=154.103, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:01, 514.11it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=84.888, player_2/loss=185.974, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:01, 513.66it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=74.104, player_2/loss=155.564, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 512.36it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=49.988, player_2/loss=155.207, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:01, 516.52it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=48.590, player_2/loss=221.951, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 511.86it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=53.373, player_2/loss=220.682, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 510.43it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=122.966, player_2/loss=181.582, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 511.18it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=120.466, player_2/loss=117.312, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:01, 512.95it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=101.496, player_2/loss=57.300, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:01, 515.30it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=98.545, player_2/loss=79.601, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:01, 515.58it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=96.494, player_2/loss=86.381, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:01, 514.85it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=101.214, player_2/loss=113.814, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:01, 514.80it/s, env_step=7168, len=17, n/ep=3, n/st=64, player_1/loss=134.940, player_2/loss=102.211, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 510.50it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=135.150, player_2/loss=65.915, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:01, 515.23it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_2/loss=74.183, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:01, 514.41it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=98.264, player_2/loss=66.567, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:01, 514.36it/s, env_step=11264, len=17, n/ep=3, n/st=64, player_1/loss=120.071, player_2/loss=51.720, rew=8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:01, 513.19it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=140.084, player_2/loss=40.440, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:01, 513.96it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=148.521, player_2/loss=24.957, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:01, 513.95it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=174.531, player_2/loss=39.335, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:01, 513.08it/s, env_step=15360, len=19, n/ep=4, n/st=64, player_1/loss=193.917, player_2/loss=74.061, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:01, 517.03it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=148.483, player_2/loss=121.054, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:01, 515.78it/s, env_step=17408, len=19, n/ep=4, n/st=64, player_1/loss=116.486, player_2/loss=97.033, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 507.67it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=113.702, player_2/loss=62.285, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:01, 515.86it/s, env_step=19456, len=20, n/ep=3, n/st=64, player_1/loss=138.061, player_2/loss=57.540, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 511.74it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=90.634, player_2/loss=59.046, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 516.57it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_2/loss=67.389, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:01, 513.55it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=135.460, player_2/loss=106.770, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:01, 514.26it/s, env_step=4096, len=15, n/ep=3, n/st=64, player_1/loss=134.435, player_2/loss=200.138, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 518.09it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=152.773, player_2/loss=237.684, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:01, 513.55it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=164.232, player_2/loss=220.796, rew=8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 512.19it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=113.463, player_2/loss=247.374, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:01, 513.43it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=132.662, player_2/loss=340.333, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 511.24it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=165.757, player_2/loss=334.369, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 503.27it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=152.704, player_2/loss=356.229, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 506.42it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=135.280, player_2/loss=307.751, rew=19.44]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 510.86it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=209.479, player_2/loss=303.245, rew=19.44]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 508.77it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=150.145, player_2/loss=355.271, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 511.64it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=127.357, player_2/loss=377.554, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 511.13it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=102.385, player_2/loss=320.577, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 510.83it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=75.742, player_2/loss=291.091, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 508.06it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=40.122, player_2/loss=326.297, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 510.84it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=37.780, player_2/loss=344.113, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 511.41it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=45.306, player_2/loss=375.820, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 511.23it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=22.823, player_2/loss=358.021, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 517.00it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=59.493, player_2/loss=340.666, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:01, 518.10it/s, env_step=3072, len=10, n/ep=7, n/st=64, player_1/loss=104.967, player_2/loss=253.318, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:01, 515.03it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=97.354, player_2/loss=253.596, rew=-19.44]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 514.60it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=201.433, player_2/loss=223.406, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:01, 512.85it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=361.578, player_2/loss=133.505, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 512.22it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=446.907, player_2/loss=33.005, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:01, 516.33it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=437.437, player_2/loss=28.663, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:01, 516.33it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=330.790, player_2/loss=102.745, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:01, 513.49it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=184.375, player_2/loss=159.957, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:01, 514.94it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=194.610, player_2/loss=128.995, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:01, 514.31it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=237.873, player_2/loss=95.966, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:01, 512.95it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=225.931, player_2/loss=106.638, rew=-15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:01, 517.05it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=240.965, player_2/loss=115.616, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:01, 517.28it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=461.900, player_2/loss=62.737, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 511.78it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=381.909, player_2/loss=25.472, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:01, 512.63it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=304.699, player_2/loss=43.023, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:01, 513.15it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=294.112, player_2/loss=118.628, rew=-12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 511.06it/s, env_step=19456, len=19, n/ep=4, n/st=64, player_1/loss=222.986, player_2/loss=136.023, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:01, 514.24it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=127.882, player_2/loss=86.381, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 512.81it/s, env_step=2048, len=23, n/ep=3, n/st=64, player_1/loss=113.432, player_2/loss=88.225, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:01, 513.77it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=112.828, player_2/loss=80.432, rew=-15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 511.57it/s, env_step=4096, len=32, n/ep=1, n/st=64, player_1/loss=121.245, player_2/loss=74.513, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 511.02it/s, env_step=5120, len=27, n/ep=2, n/st=64, player_1/loss=112.034, player_2/loss=113.676, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:01, 519.81it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=97.776, player_2/loss=106.014, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 511.40it/s, env_step=7168, len=32, n/ep=2, n/st=64, player_1/loss=102.063, player_2/loss=62.401, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:01, 513.57it/s, env_step=8192, len=23, n/ep=3, n/st=64, player_1/loss=69.212, player_2/loss=49.544, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:01, 515.08it/s, env_step=9216, len=23, n/ep=3, n/st=64, player_1/loss=38.642, player_2/loss=142.637, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:01, 519.19it/s, env_step=10240, len=23, n/ep=3, n/st=64, player_1/loss=19.577, player_2/loss=153.804, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:01, 515.11it/s, env_step=11264, len=27, n/ep=3, n/st=64, player_1/loss=293.190, player_2/loss=90.907, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 509.38it/s, env_step=12288, len=19, n/ep=4, n/st=64, player_1/loss=357.088, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 512.33it/s, env_step=13312, len=28, n/ep=2, n/st=64, player_1/loss=107.021, player_2/loss=115.731, rew=37.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 509.11it/s, env_step=14336, len=28, n/ep=3, n/st=64, player_1/loss=87.459, player_2/loss=126.328, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:01, 513.56it/s, env_step=15360, len=25, n/ep=3, n/st=64, player_1/loss=81.039, player_2/loss=100.177, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:01, 516.44it/s, env_step=16384, len=22, n/ep=3, n/st=64, player_1/loss=90.455, player_2/loss=78.138, rew=-25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:01, 513.33it/s, env_step=17408, len=19, n/ep=4, n/st=64, player_1/loss=84.640, player_2/loss=80.202, rew=12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:01, 516.27it/s, env_step=18432, len=22, n/ep=3, n/st=64, player_1/loss=76.894, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:01, 515.81it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=83.122, player_2/loss=82.240, rew=-12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 508.05it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=142.498, player_2/loss=128.486, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 511.90it/s, env_step=2048, len=26, n/ep=3, n/st=64, player_1/loss=148.695, player_2/loss=150.028, rew=-8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 506.60it/s, env_step=3072, len=30, n/ep=2, n/st=64, player_1/loss=156.367, player_2/loss=104.136, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 508.41it/s, env_step=4096, len=28, n/ep=2, n/st=64, player_1/loss=123.915, player_2/loss=76.176, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 515.68it/s, env_step=5120, len=29, n/ep=2, n/st=64, player_1/loss=102.405, player_2/loss=107.613, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 511.22it/s, env_step=6144, len=20, n/ep=2, n/st=64, player_1/loss=72.631, player_2/loss=90.896, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:01, 516.31it/s, env_step=7168, len=31, n/ep=2, n/st=64, player_1/loss=88.238, player_2/loss=126.665, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 511.91it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=108.542, player_2/loss=124.005, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 510.28it/s, env_step=9216, len=20, n/ep=4, n/st=64, player_1/loss=141.926, player_2/loss=108.179, rew=0.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 507.39it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=128.982, player_2/loss=111.783, rew=-12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:01, 514.13it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=139.059, player_2/loss=119.614, rew=-25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 508.73it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=127.779, player_2/loss=137.131, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 510.61it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=120.905, player_2/loss=104.966, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 510.36it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=166.675, player_2/loss=145.873, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:01, 513.79it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=146.394, player_2/loss=144.987, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 511.80it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=126.430, player_2/loss=79.620, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 507.99it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=110.695, player_2/loss=24.428, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 512.43it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=124.874, player_2/loss=20.072, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:01, 513.41it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=149.352, player_2/loss=35.048, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:01, 512.67it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=189.818, player_2/loss=46.585, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 510.87it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=149.326, player_2/loss=40.330, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 511.41it/s, env_step=3072, len=12, n/ep=4, n/st=64, player_1/loss=117.928, player_2/loss=95.042, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 511.54it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=104.906, player_2/loss=162.207, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 506.00it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=162.731, player_2/loss=261.907, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 510.45it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=174.156, player_2/loss=276.681, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 510.96it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=109.113, player_2/loss=308.920, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 506.65it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=68.325, player_2/loss=314.411, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 508.03it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=9.517, player_2/loss=326.046, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 509.03it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=69.203, player_2/loss=337.042, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 507.88it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=78.350, player_2/loss=342.058, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 510.04it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_2/loss=292.141, rew=6.25]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 509.50it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=45.972, player_2/loss=291.472, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 508.30it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=43.507, player_2/loss=247.752, rew=18.75]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 509.39it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=53.440, player_2/loss=267.521, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 511.18it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=110.767, player_2/loss=305.789, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 510.53it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=84.836, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 511.84it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=18.415, player_2/loss=268.807, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 510.62it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=77.264, player_2/loss=261.510, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:01, 513.61it/s, env_step=1024, len=27, n/ep=2, n/st=64, player_1/loss=32.265, player_2/loss=212.755, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 519.39it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=58.357, player_2/loss=141.799, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:01, 513.53it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=111.311, player_2/loss=86.212, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:01, 514.19it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=140.570, player_2/loss=105.666, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 518.02it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=142.345, player_2/loss=96.374, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:01, 518.43it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=151.632, player_2/loss=93.784, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:01, 512.84it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=175.495, player_2/loss=138.493, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:01, 515.33it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=184.068, player_2/loss=112.990, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:01, 516.74it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=141.878, player_2/loss=69.700, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:01, 516.37it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=138.720, player_2/loss=88.305, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:01, 514.24it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=159.430, player_2/loss=85.938, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 510.95it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=132.566, player_2/loss=96.261, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:01, 517.10it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=139.044, player_2/loss=60.024, rew=-12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:01, 516.01it/s, env_step=14336, len=24, n/ep=3, n/st=64, player_1/loss=150.813, player_2/loss=65.107, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:01, 513.53it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=187.514, player_2/loss=98.124, rew=-3.57]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:01, 512.90it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=185.522, player_2/loss=178.063, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:01, 513.49it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=180.818, player_2/loss=244.760, rew=-13.89]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:01, 512.91it/s, env_step=18432, len=26, n/ep=3, n/st=64, player_1/loss=148.588, player_2/loss=206.439, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #18


Epoch #19: 1025it [00:01, 517.43it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=126.947, player_2/loss=85.499, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #18


Epoch #1: 1025it [00:02, 510.46it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=164.443, player_2/loss=76.448, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 510.88it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=137.184, player_2/loss=114.707, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:01, 515.16it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=80.578, player_2/loss=140.780, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 512.04it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=118.281, player_2/loss=173.491, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 509.21it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=164.163, player_2/loss=159.784, rew=17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 510.36it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=95.311, player_2/loss=204.480, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 510.06it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=67.433, player_2/loss=208.202, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 506.09it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=71.417, player_2/loss=238.385, rew=19.44]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 510.64it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=117.818, player_2/loss=216.646, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 509.06it/s, env_step=10240, len=7, n/ep=10, n/st=64, player_1/loss=69.005, player_2/loss=227.078, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 511.24it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=56.210, player_2/loss=249.286, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 507.24it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=35.003, player_2/loss=272.886, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 509.07it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=63.914, player_2/loss=300.136, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 505.11it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=95.795, player_2/loss=298.570, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 507.62it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=122.245, player_2/loss=256.750, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 504.74it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=39.970, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 507.93it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=9.327, player_2/loss=193.912, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 511.87it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=27.188, player_2/loss=213.471, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 508.73it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=87.057, player_2/loss=224.739, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 509.20it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=78.167, player_2/loss=236.831, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 515.38it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=38.163, player_2/loss=211.134, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 508.61it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=76.090, player_2/loss=179.346, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:01, 513.41it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=206.985, player_2/loss=137.362, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:01, 513.93it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=318.943, player_2/loss=89.259, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:01, 513.41it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=338.993, player_2/loss=86.321, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:01, 512.91it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=300.506, player_2/loss=58.378, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 511.69it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=273.912, player_2/loss=33.321, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:01, 515.56it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=273.973, player_2/loss=21.505, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 510.06it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=263.270, player_2/loss=10.019, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 512.36it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=244.130, player_2/loss=20.901, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:01, 515.06it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=244.795, player_2/loss=30.188, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 497.39it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=292.491, player_2/loss=16.480, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 509.85it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=293.120, player_2/loss=9.029, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:01, 514.15it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=254.562, player_2/loss=8.582, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 512.44it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=255.856, player_2/loss=59.219, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 511.96it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=212.536, player_2/loss=57.317, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 509.91it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=185.038, player_2/loss=3.485, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 509.53it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=215.236, player_2/loss=19.296, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:01, 512.61it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=133.621, player_2/loss=28.752, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 513.07it/s, env_step=2048, len=14, n/ep=3, n/st=64, player_1/loss=128.244, player_2/loss=29.544, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:01, 516.52it/s, env_step=3072, len=29, n/ep=2, n/st=64, player_1/loss=133.551, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:01, 517.59it/s, env_step=4096, len=28, n/ep=2, n/st=64, player_1/loss=96.580, player_2/loss=107.965, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:01, 514.80it/s, env_step=5120, len=29, n/ep=2, n/st=64, player_1/loss=69.465, player_2/loss=150.987, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:01, 514.31it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=105.852, player_2/loss=264.855, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 512.40it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=123.684, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:01, 514.28it/s, env_step=8192, len=13, n/ep=4, n/st=64, player_1/loss=81.245, player_2/loss=174.401, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:01, 514.00it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_2/loss=155.041, rew=-5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:01, 516.47it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=190.148, player_2/loss=176.065, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:01, 512.77it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=186.034, player_2/loss=224.724, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 509.14it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=121.098, player_2/loss=339.888, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 511.00it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=119.699, player_2/loss=356.581, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 506.53it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=162.032, player_2/loss=359.608, rew=19.44]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 509.32it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=157.883, player_2/loss=347.169, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 507.74it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=110.151, player_2/loss=335.279, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 511.71it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=74.727, player_2/loss=338.303, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 508.62it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=59.508, player_2/loss=323.865, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 510.14it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=51.241, player_2/loss=337.056, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 508.99it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=169.474, player_2/loss=139.686, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 511.73it/s, env_step=2048, len=10, n/ep=4, n/st=64, player_1/loss=155.909, player_2/loss=136.497, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 512.34it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=243.080, player_2/loss=99.904, rew=17.86]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:01, 515.31it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=279.371, player_2/loss=56.432, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 510.25it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=244.858, player_2/loss=71.598, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 510.30it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=287.548, player_2/loss=52.262, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:01, 514.61it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=292.529, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:01, 513.52it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=249.225, player_2/loss=95.458, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:01, 512.67it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=217.689, player_2/loss=138.765, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:01, 514.22it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=235.512, player_2/loss=147.448, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 507.03it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=264.825, player_2/loss=97.664, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 511.95it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=240.117, player_2/loss=56.198, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:01, 513.31it/s, env_step=13312, len=11, n/ep=7, n/st=64, player_1/loss=271.305, player_2/loss=49.234, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:01, 513.33it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=303.173, player_2/loss=53.678, rew=5.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:01, 512.93it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=282.001, player_2/loss=69.968, rew=-5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 507.52it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=263.299, player_2/loss=55.593, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 506.88it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=264.473, player_2/loss=26.616, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 512.20it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=269.945, player_2/loss=39.243, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:01, 515.63it/s, env_step=19456, len=9, n/ep=6, n/st=64, player_1/loss=272.678, player_2/loss=44.338, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 507.67it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=235.075, player_2/loss=7.232, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 510.57it/s, env_step=2048, len=9, n/ep=6, n/st=64, player_1/loss=185.784, player_2/loss=94.801, rew=-16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 511.45it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=110.116, player_2/loss=145.998, rew=15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 506.55it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=77.016, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 508.62it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=73.733, player_2/loss=174.956, rew=-5.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:01, 512.59it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=53.974, player_2/loss=256.678, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 511.88it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=46.500, player_2/loss=288.246, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 511.61it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=39.949, player_2/loss=239.015, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 509.97it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=66.957, player_2/loss=290.096, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 505.97it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=139.276, player_2/loss=275.922, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 510.18it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=154.233, player_2/loss=245.688, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 512.38it/s, env_step=12288, len=19, n/ep=4, n/st=64, player_1/loss=80.321, player_2/loss=198.311, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 511.62it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=29.532, player_2/loss=192.755, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 509.88it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=19.405, player_2/loss=257.810, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 512.24it/s, env_step=15360, len=18, n/ep=4, n/st=64, player_1/loss=46.005, player_2/loss=248.329, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 504.66it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=49.938, player_2/loss=192.906, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 506.87it/s, env_step=17408, len=19, n/ep=3, n/st=64, player_1/loss=89.507, player_2/loss=158.932, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 510.55it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=34.170, player_2/loss=196.839, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 510.54it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=38.679, player_2/loss=206.255, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 509.04it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=23.295, player_2/loss=140.717, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 514.39it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=24.215, player_2/loss=132.821, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 509.85it/s, env_step=3072, len=13, n/ep=4, n/st=64, player_1/loss=22.292, player_2/loss=95.567, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:01, 512.61it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=22.473, player_2/loss=93.495, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:01, 515.48it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=51.861, player_2/loss=117.481, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:01, 517.20it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=76.214, player_2/loss=104.958, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 512.11it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=66.665, player_2/loss=82.692, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:01, 515.60it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=45.621, player_2/loss=69.538, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:01, 515.35it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=63.825, player_2/loss=77.296, rew=-18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:01, 513.94it/s, env_step=10240, len=10, n/ep=7, n/st=64, player_1/loss=152.932, player_2/loss=194.066, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 511.19it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=282.811, player_2/loss=259.137, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 511.05it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=376.218, player_2/loss=170.049, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:01, 513.94it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=328.309, player_2/loss=160.516, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:01, 513.01it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=319.938, player_2/loss=129.114, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:01, 513.30it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=394.397, player_2/loss=107.394, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:01, 513.09it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=412.512, player_2/loss=84.507, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:01, 513.11it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=470.488, player_2/loss=53.418, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:01, 512.84it/s, env_step=18432, len=9, n/ep=6, n/st=64, player_1/loss=506.885, player_2/loss=50.021, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:01, 514.78it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=387.702, player_2/loss=43.418, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 507.62it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=343.061, player_2/loss=507.437, rew=18.75]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 506.99it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=155.476, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 511.78it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=82.200, player_2/loss=925.906, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 507.43it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=154.378, player_2/loss=757.569, rew=6.25]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 510.15it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=143.623, player_2/loss=565.579, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 506.60it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=85.256, player_2/loss=602.529, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 504.94it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=89.999, player_2/loss=689.255, rew=13.89]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 506.35it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=63.612, player_2/loss=749.205, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 508.67it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=110.392, player_2/loss=698.455, rew=19.44]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 509.67it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=99.775, player_2/loss=614.445, rew=6.25]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 505.86it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=85.898, player_2/loss=646.966, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 507.44it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=87.475, player_2/loss=716.901, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 508.32it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=62.122, player_2/loss=786.823, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 507.19it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=53.098, player_2/loss=752.275, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 508.88it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=51.331, player_2/loss=865.444, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 509.49it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=100.921, player_2/loss=850.394, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 505.06it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=114.766, player_2/loss=597.428, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 482.84it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=104.400, player_2/loss=667.083, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 503.81it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=54.598, player_2/loss=731.940, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 507.27it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=94.657, player_2/loss=496.408, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 508.11it/s, env_step=2048, len=9, n/ep=6, n/st=64, player_1/loss=57.921, player_2/loss=458.079, rew=16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 509.01it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=150.835, player_2/loss=383.367, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 506.94it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=332.409, player_2/loss=256.016, rew=17.86]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 507.33it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=379.583, player_2/loss=124.675, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 509.02it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_2/loss=73.314, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 509.82it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=572.033, player_2/loss=58.716, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 508.02it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=542.218, player_2/loss=89.471, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 503.75it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=466.190, player_2/loss=76.197, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:01, 513.02it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=539.321, player_2/loss=55.556, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 512.23it/s, env_step=11264, len=9, n/ep=6, n/st=64, player_1/loss=557.396, player_2/loss=59.440, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 505.06it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=491.891, player_2/loss=70.005, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 510.24it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=455.318, player_2/loss=91.615, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 507.78it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=464.159, player_2/loss=59.569, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 507.93it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=497.375, player_2/loss=77.896, rew=16.67]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 508.88it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=576.743, player_2/loss=51.776, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 512.02it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=563.894, player_2/loss=25.747, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 510.56it/s, env_step=18432, len=10, n/ep=7, n/st=64, player_1/loss=535.665, player_2/loss=74.869, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 507.90it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=532.671, player_2/loss=95.175, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 509.04it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=242.510, player_2/loss=170.596, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 491.49it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=217.820, player_2/loss=86.880, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 464.02it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=230.175, player_2/loss=202.285, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 506.78it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=200.792, player_2/loss=292.250, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 509.32it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=182.838, player_2/loss=459.886, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 507.05it/s, env_step=6144, len=9, n/ep=8, n/st=64, player_1/loss=76.518, player_2/loss=568.645, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 510.61it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=25.666, player_2/loss=562.674, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 509.94it/s, env_step=8192, len=10, n/ep=7, n/st=64, player_1/loss=36.839, player_2/loss=667.578, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 511.98it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=35.488, player_2/loss=569.876, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 511.00it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=27.609, player_2/loss=431.323, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 507.65it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=45.998, player_2/loss=416.489, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 507.40it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=54.471, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 510.29it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=48.111, player_2/loss=485.471, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 511.69it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=21.241, player_2/loss=523.046, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 509.06it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=12.030, player_2/loss=642.138, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 509.67it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=18.196, player_2/loss=678.322, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 509.22it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=13.786, player_2/loss=717.823, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 508.16it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=16.860, player_2/loss=763.960, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 510.64it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=19.930, player_2/loss=700.347, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:01, 513.03it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=11.614, player_2/loss=525.909, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 513.35it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=35.486, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 510.67it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=101.459, player_2/loss=262.360, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 511.42it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=154.207, player_2/loss=227.934, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 513.48it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=151.468, player_2/loss=217.127, rew=-17.86]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:01, 515.24it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=181.608, player_2/loss=185.943, rew=-13.89]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 511.51it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=197.717, player_2/loss=204.333, rew=-5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 508.44it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=192.917, player_2/loss=171.945, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 499.14it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=246.022, player_2/loss=88.431, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 511.30it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=298.705, player_2/loss=62.322, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:01, 513.44it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=354.888, player_2/loss=32.771, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:01, 515.28it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=315.380, player_2/loss=31.505, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 511.15it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=281.629, player_2/loss=28.051, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 510.44it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=309.526, player_2/loss=23.880, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 510.77it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=304.207, player_2/loss=31.849, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 512.05it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=211.121, player_2/loss=48.062, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:01, 513.60it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=248.946, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 510.68it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=252.356, player_2/loss=38.703, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 507.58it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=255.306, player_2/loss=36.981, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 504.01it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=174.312, player_2/loss=228.095, rew=6.25]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 504.64it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=131.849, player_2/loss=267.182, rew=19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 508.34it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=87.731, rew=6.25]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 511.71it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=46.503, player_2/loss=312.021, rew=19.44]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 505.67it/s, env_step=5120, len=7, n/ep=10, n/st=64, player_1/loss=149.972, player_2/loss=308.783, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 511.28it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=111.262, player_2/loss=307.618, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 506.57it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=58.424, player_2/loss=350.716, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 509.17it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=82.643, player_2/loss=340.598, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 511.63it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=68.014, player_2/loss=278.307, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 505.91it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=58.844, player_2/loss=312.726, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 504.79it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=62.411, player_2/loss=331.020, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 507.95it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=72.671, player_2/loss=321.643, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 508.94it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=143.564, player_2/loss=289.609, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 507.75it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=117.610, player_2/loss=305.591, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 505.76it/s, env_step=15360, len=7, n/ep=10, n/st=64, player_1/loss=89.376, player_2/loss=289.544, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 505.98it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=76.552, player_2/loss=332.827, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 504.56it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=33.643, player_2/loss=345.316, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 507.19it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=34.493, player_2/loss=355.932, rew=12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 506.60it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=68.576, player_2/loss=338.669, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 505.55it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=113.605, player_2/loss=258.933, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 504.83it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=148.638, player_2/loss=154.262, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 507.58it/s, env_step=3072, len=10, n/ep=7, n/st=64, player_1/loss=253.076, player_2/loss=35.491, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 508.08it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=253.888, player_2/loss=47.446, rew=-5.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 512.06it/s, env_step=5120, len=10, n/ep=5, n/st=64, player_1/loss=170.059, player_2/loss=69.858, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 509.57it/s, env_step=6144, len=10, n/ep=7, n/st=64, player_1/loss=199.152, player_2/loss=55.329, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 504.85it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=205.574, player_2/loss=34.487, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 509.65it/s, env_step=8192, len=10, n/ep=7, n/st=64, player_1/loss=211.034, player_2/loss=12.207, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 508.26it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=202.313, player_2/loss=67.840, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 508.65it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=195.558, player_2/loss=87.946, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 510.43it/s, env_step=11264, len=10, n/ep=7, n/st=64, player_1/loss=181.728, player_2/loss=32.817, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 507.59it/s, env_step=12288, len=10, n/ep=7, n/st=64, player_1/loss=199.152, player_2/loss=56.249, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 511.93it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=227.734, player_2/loss=74.926, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 510.36it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=240.134, player_2/loss=21.825, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 509.69it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=239.301, player_2/loss=26.137, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 510.37it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=220.967, player_2/loss=42.012, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 507.88it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=202.481, player_2/loss=39.027, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 508.89it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=211.698, player_2/loss=33.827, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 509.94it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=203.335, player_2/loss=22.859, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 510.00it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=141.383, player_2/loss=231.705, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 508.62it/s, env_step=2048, len=10, n/ep=7, n/st=64, player_1/loss=166.756, player_2/loss=149.404, rew=-10.71]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 506.50it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=210.980, player_2/loss=92.841, rew=8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 508.01it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=203.544, player_2/loss=178.335, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 506.36it/s, env_step=5120, len=12, n/ep=6, n/st=64, player_1/loss=100.355, player_2/loss=289.129, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 506.93it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=90.761, player_2/loss=260.578, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 504.04it/s, env_step=7168, len=10, n/ep=8, n/st=64, player_1/loss=106.181, player_2/loss=277.757, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 503.04it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=42.393, player_2/loss=312.778, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 504.18it/s, env_step=9216, len=8, n/ep=7, n/st=64, player_1/loss=34.824, player_2/loss=304.285, rew=17.86]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 506.01it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=19.439, player_2/loss=322.013, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 506.42it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=76.200, player_2/loss=308.847, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 504.95it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=47.012, player_2/loss=343.320, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 505.01it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=35.225, player_2/loss=312.695, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 504.97it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=19.079, player_2/loss=318.371, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 509.18it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=11.722, player_2/loss=284.222, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 500.53it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=24.883, player_2/loss=374.269, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 506.74it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=44.069, player_2/loss=399.877, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 503.66it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=62.307, player_2/loss=403.505, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 505.10it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=42.572, player_2/loss=321.696, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 507.37it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=74.934, player_2/loss=259.503, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 507.62it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=137.034, player_2/loss=259.690, rew=-17.86]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 502.43it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=206.806, player_2/loss=222.956, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 504.93it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=214.998, player_2/loss=269.943, rew=-19.44]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 508.95it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=247.371, player_2/loss=288.177, rew=-19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 509.37it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=210.903, player_2/loss=258.891, rew=-13.89]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 510.98it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=276.532, player_2/loss=251.365, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 508.69it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=197.559, player_2/loss=274.564, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 506.28it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=252.157, player_2/loss=240.063, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 510.39it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=328.060, player_2/loss=221.225, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 505.89it/s, env_step=11264, len=10, n/ep=7, n/st=64, player_1/loss=310.263, player_2/loss=224.160, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 511.72it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=312.859, player_2/loss=187.923, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 510.15it/s, env_step=13312, len=10, n/ep=7, n/st=64, player_1/loss=459.062, player_2/loss=94.530, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 509.10it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=615.250, player_2/loss=99.804, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 512.37it/s, env_step=15360, len=10, n/ep=7, n/st=64, player_1/loss=548.515, player_2/loss=122.163, rew=10.71]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 508.51it/s, env_step=16384, len=10, n/ep=7, n/st=64, player_1/loss=506.943, player_2/loss=126.576, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 512.34it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=616.687, player_2/loss=133.711, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 510.51it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=630.834, player_2/loss=49.678, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 503.37it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=630.423, player_2/loss=70.172, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 508.19it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=284.085, player_2/loss=81.589, rew=17.86]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 504.91it/s, env_step=2048, len=7, n/ep=10, n/st=64, player_1/loss=227.737, player_2/loss=361.496, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 505.08it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=85.672, player_2/loss=462.024, rew=18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 504.29it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=39.250, player_2/loss=633.828, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 502.80it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=36.623, player_2/loss=575.181, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 507.11it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=29.423, player_2/loss=638.067, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 499.08it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=11.115, player_2/loss=619.228, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 504.72it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=56.968, player_2/loss=616.388, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 502.07it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=53.754, player_2/loss=667.768, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 505.18it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=11.077, player_2/loss=643.337, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 506.69it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=9.567, player_2/loss=593.172, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 506.90it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=7.034, player_2/loss=519.267, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 507.54it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=9.411, player_2/loss=583.402, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 500.81it/s, env_step=14336, len=7, n/ep=10, n/st=64, player_1/loss=58.803, player_2/loss=611.539, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 507.37it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=93.515, player_2/loss=491.458, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 504.96it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=49.732, player_2/loss=513.313, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 506.30it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=9.581, player_2/loss=451.515, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 507.22it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=4.524, player_2/loss=546.133, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 502.54it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=4.878, player_2/loss=615.022, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 508.17it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=14.617, player_2/loss=500.562, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 506.92it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=26.332, player_2/loss=390.342, rew=-18.75]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 509.23it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=31.077, player_2/loss=301.071, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 511.69it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=21.792, player_2/loss=279.662, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 507.89it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=44.228, player_2/loss=210.552, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 510.64it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=44.730, player_2/loss=158.419, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 509.78it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=16.293, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:01, 513.61it/s, env_step=8192, len=13, n/ep=4, n/st=64, player_1/loss=58.079, player_2/loss=141.406, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 511.01it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=95.116, player_2/loss=131.052, rew=-15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #10: 1025it [00:02, 507.32it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=148.944, player_2/loss=137.109, rew=-15.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #11: 1025it [00:02, 511.52it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=141.451, player_2/loss=125.063, rew=-15.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #12: 1025it [00:02, 512.26it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=123.770, player_2/loss=91.736, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #13: 1025it [00:01, 514.62it/s, env_step=13312, len=26, n/ep=2, n/st=64, player_1/loss=110.344, player_2/loss=79.608, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #14: 1025it [00:02, 511.58it/s, env_step=14336, len=30, n/ep=2, n/st=64, player_1/loss=150.818, player_2/loss=90.363, rew=0.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #15: 1025it [00:02, 511.49it/s, env_step=15360, len=23, n/ep=3, n/st=64, player_1/loss=168.312, player_2/loss=105.794, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #16: 1025it [00:02, 511.60it/s, env_step=16384, len=32, n/ep=2, n/st=64, player_1/loss=140.514, player_2/loss=62.578, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #17: 1025it [00:02, 511.70it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=149.294, player_2/loss=66.751, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #18: 1025it [00:01, 513.26it/s, env_step=18432, len=23, n/ep=3, n/st=64, player_1/loss=135.589, player_2/loss=52.388, rew=-8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #19: 1025it [00:02, 505.10it/s, env_step=19456, len=27, n/ep=2, n/st=64, player_1/loss=110.759, player_2/loss=59.579, rew=0.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #1: 1025it [00:02, 509.43it/s, env_step=1024, len=19, n/ep=4, n/st=64, player_1/loss=73.455, player_2/loss=76.502, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 507.00it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=91.728, player_2/loss=99.366, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 509.66it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=169.964, player_2/loss=132.622, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 509.59it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=152.303, player_2/loss=110.895, rew=-12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 503.56it/s, env_step=5120, len=13, n/ep=4, n/st=64, player_1/loss=191.744, player_2/loss=127.987, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 511.86it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=219.319, player_2/loss=152.505, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 509.16it/s, env_step=7168, len=22, n/ep=3, n/st=64, player_1/loss=164.356, player_2/loss=165.901, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 511.00it/s, env_step=8192, len=23, n/ep=3, n/st=64, player_1/loss=120.199, player_2/loss=118.586, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 509.32it/s, env_step=9216, len=21, n/ep=4, n/st=64, player_1/loss=41.531, player_2/loss=88.172, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 511.59it/s, env_step=10240, len=28, n/ep=2, n/st=64, player_1/loss=77.778, player_2/loss=130.125, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:01, 515.17it/s, env_step=11264, len=23, n/ep=2, n/st=64, player_1/loss=69.101, player_2/loss=98.801, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 508.94it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=96.621, player_2/loss=109.123, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 509.77it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=104.582, player_2/loss=182.549, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 504.71it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=105.463, player_2/loss=261.069, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 508.81it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=140.433, player_2/loss=245.125, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 506.66it/s, env_step=16384, len=7, n/ep=10, n/st=64, player_1/loss=106.326, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 510.29it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=139.400, player_2/loss=224.477, rew=6.25]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 510.58it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=103.092, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 505.06it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=111.736, player_2/loss=237.423, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 508.26it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=22.454, player_2/loss=158.270, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 510.50it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=22.780, player_2/loss=138.493, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 501.59it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=34.923, player_2/loss=102.066, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 498.86it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=94.183, player_2/loss=118.361, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 502.88it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=219.696, player_2/loss=184.950, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 508.82it/s, env_step=6144, len=7, n/ep=10, n/st=64, player_1/loss=190.826, player_2/loss=246.975, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:01, 513.07it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=155.726, player_2/loss=143.567, rew=-18.75]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 508.11it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=97.331, player_2/loss=100.675, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 508.65it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=55.339, player_2/loss=71.720, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:01, 512.56it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=187.147, player_2/loss=155.576, rew=-13.89]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 506.26it/s, env_step=11264, len=10, n/ep=7, n/st=64, player_1/loss=169.353, player_2/loss=201.521, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 508.79it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=389.295, player_2/loss=156.826, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 508.66it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=496.853, player_2/loss=99.359, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 504.18it/s, env_step=14336, len=13, n/ep=4, n/st=64, player_1/loss=415.916, player_2/loss=132.072, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 510.05it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=312.849, player_2/loss=149.827, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 507.79it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=190.693, player_2/loss=103.285, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 505.46it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=292.706, player_2/loss=50.280, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 510.44it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=414.070, player_2/loss=33.372, rew=15.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 505.16it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=311.095, player_2/loss=139.095, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 505.50it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=136.101, player_2/loss=138.188, rew=-5.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 507.06it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=82.396, player_2/loss=165.967, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 510.37it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=82.627, player_2/loss=143.831, rew=-18.75]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 504.08it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=151.275, player_2/loss=158.301, rew=-18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 509.36it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=199.256, player_2/loss=183.173, rew=-18.75]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 507.15it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=204.994, player_2/loss=205.097, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 512.19it/s, env_step=7168, len=13, n/ep=4, n/st=64, player_1/loss=108.730, player_2/loss=192.536, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:01, 513.95it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=57.091, player_2/loss=169.958, rew=-15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 492.96it/s, env_step=9216, len=13, n/ep=4, n/st=64, player_1/loss=86.351, player_2/loss=197.051, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 510.66it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=77.692, player_2/loss=213.429, rew=5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 509.83it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=67.653, player_2/loss=225.683, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 509.93it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=60.039, player_2/loss=266.466, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 510.80it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_1/loss=64.933, player_2/loss=229.208, rew=-12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 505.40it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=98.830, player_2/loss=228.402, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 511.79it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=124.393, player_2/loss=207.720, rew=15.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 507.48it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=126.752, player_2/loss=195.240, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:01, 512.82it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=67.412, player_2/loss=198.288, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 508.79it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=48.518, player_2/loss=143.417, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 507.78it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=105.439, player_2/loss=121.717, rew=5.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 502.26it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=205.417, player_2/loss=25.597, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 511.70it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=243.687, player_2/loss=24.957, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 509.57it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=234.074, player_2/loss=29.939, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 510.56it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=176.070, player_2/loss=25.659, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 510.24it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=130.035, player_2/loss=11.440, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 505.63it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=99.991, player_2/loss=22.400, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 504.69it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=142.736, player_2/loss=27.490, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:01, 516.91it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=192.155, player_2/loss=20.457, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 512.39it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=235.752, player_2/loss=8.769, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 511.49it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=221.495, player_2/loss=16.104, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:01, 514.44it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=166.285, player_2/loss=31.878, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 510.17it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=198.500, player_2/loss=58.630, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 508.16it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=235.830, player_2/loss=52.596, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 507.56it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=193.775, player_2/loss=41.438, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 510.67it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=159.393, player_2/loss=35.249, rew=5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 508.84it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=175.217, player_2/loss=30.831, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 508.93it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=202.558, player_2/loss=73.016, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 510.78it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=221.199, player_2/loss=89.579, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 511.72it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=249.455, player_2/loss=57.893, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 506.12it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=207.166, player_2/loss=17.388, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 509.29it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=191.256, player_2/loss=12.839, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 504.87it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=147.239, player_2/loss=36.374, rew=-15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 506.58it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=108.214, player_2/loss=132.862, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 509.58it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=71.657, player_2/loss=187.250, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 505.48it/s, env_step=6144, len=7, n/ep=7, n/st=64, player_1/loss=102.614, player_2/loss=288.497, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 506.38it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=174.762, player_2/loss=424.180, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 500.97it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=138.526, rew=6.25]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 507.03it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=106.246, player_2/loss=529.752, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 505.99it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=76.139, player_2/loss=502.726, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 504.89it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=72.215, player_2/loss=516.153, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 498.86it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=39.411, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 503.40it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=9.976, player_2/loss=489.887, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 501.26it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=59.089, player_2/loss=491.566, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 505.13it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=82.275, player_2/loss=432.404, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 504.83it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=94.627, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 505.25it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=126.994, player_2/loss=396.995, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 501.66it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=118.568, player_2/loss=432.808, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 504.74it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=129.692, player_2/loss=466.432, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 507.36it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=57.332, player_2/loss=406.521, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 506.36it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=65.499, player_2/loss=325.164, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 506.94it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=62.406, player_2/loss=258.243, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 511.08it/s, env_step=4096, len=8, n/ep=9, n/st=64, player_1/loss=50.765, player_2/loss=216.263, rew=-13.89]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 504.30it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=207.100, player_2/loss=217.248, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 511.80it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=291.543, player_2/loss=257.026, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 510.40it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=294.953, player_2/loss=267.325, rew=-13.89]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 510.16it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=314.001, player_2/loss=214.070, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 511.64it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=429.263, player_2/loss=97.620, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:01, 514.12it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=472.644, player_2/loss=34.196, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:01, 513.73it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=500.241, player_2/loss=33.826, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 510.75it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=509.476, player_2/loss=24.514, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 511.24it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=565.496, player_2/loss=12.226, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 512.47it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=623.649, player_2/loss=13.833, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 510.75it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_2/loss=61.448, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 507.78it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=522.725, player_2/loss=86.969, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 510.91it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=436.154, player_2/loss=84.503, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 512.39it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=485.736, player_2/loss=42.993, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 509.18it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=595.143, player_2/loss=15.942, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 507.23it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=363.649, player_2/loss=3.602, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 515.88it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=312.946, player_2/loss=7.924, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 508.41it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=203.593, player_2/loss=14.514, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 511.51it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=166.419, player_2/loss=12.707, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 515.33it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=172.299, player_2/loss=3.718, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 508.86it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=141.171, player_2/loss=33.937, rew=-15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 510.89it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=130.404, player_2/loss=61.294, rew=-15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 504.29it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=158.424, player_2/loss=71.518, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 508.62it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=241.354, player_2/loss=327.729, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 509.50it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=282.969, player_2/loss=514.232, rew=13.89]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 508.00it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=156.627, player_2/loss=512.150, rew=13.89]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 511.15it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=66.826, player_2/loss=468.366, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 506.30it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=89.879, player_2/loss=384.525, rew=19.44]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 503.88it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=84.464, player_2/loss=372.480, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 507.42it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=53.771, player_2/loss=378.765, rew=13.89]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 505.62it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=69.564, player_2/loss=450.365, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 506.55it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=51.122, player_2/loss=449.583, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 507.44it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=114.354, player_2/loss=435.388, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 500.99it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=115.373, player_2/loss=447.515, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 508.78it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=45.020, player_2/loss=396.634, rew=5.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 508.51it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=152.592, player_2/loss=207.209, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:01, 512.90it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=234.054, player_2/loss=71.479, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 508.36it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=262.383, player_2/loss=66.653, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 511.98it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=249.339, player_2/loss=66.983, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 512.41it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=323.270, player_2/loss=59.536, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 511.05it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=318.784, player_2/loss=56.939, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 508.75it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=309.848, player_2/loss=46.085, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 511.46it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=351.175, player_2/loss=38.150, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 510.91it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=363.478, player_2/loss=37.184, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 507.83it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=333.207, player_2/loss=36.613, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 506.81it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=274.476, player_2/loss=75.837, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 508.31it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=263.569, player_2/loss=76.546, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 509.95it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=276.067, player_2/loss=56.574, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 506.82it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=272.198, player_2/loss=96.199, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 508.86it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=295.405, player_2/loss=78.127, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 508.48it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_2/loss=11.285, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 511.12it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=263.311, player_2/loss=76.276, rew=12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 510.91it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=273.397, player_2/loss=103.791, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 501.53it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=197.639, player_2/loss=51.792, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 511.14it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=138.905, player_2/loss=113.761, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 511.05it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=131.566, player_2/loss=218.993, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 510.67it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=137.557, player_2/loss=182.652, rew=-12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 511.92it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=101.529, player_2/loss=107.908, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 503.06it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=127.343, player_2/loss=158.668, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 502.61it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=144.742, player_2/loss=214.605, rew=6.25]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 503.30it/s, env_step=8192, len=7, n/ep=10, n/st=64, player_1/loss=260.179, player_2/loss=261.860, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 502.64it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=228.393, player_2/loss=395.516, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 502.74it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=98.749, player_2/loss=449.208, rew=19.44]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 506.07it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=113.322, player_2/loss=478.380, rew=18.75]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 507.03it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=118.807, player_2/loss=490.340, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 508.47it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=143.212, player_2/loss=479.347, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 502.23it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=224.727, player_2/loss=433.570, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 505.75it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=192.897, player_2/loss=388.348, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 504.79it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=155.553, player_2/loss=417.011, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 499.22it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=111.723, player_2/loss=447.520, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 499.46it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=64.869, player_2/loss=493.502, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 504.88it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=95.371, player_2/loss=473.270, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 505.22it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=53.562, player_2/loss=396.633, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 509.51it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=88.595, player_2/loss=351.888, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 505.72it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=97.511, player_2/loss=295.227, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 506.65it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=89.643, player_2/loss=237.844, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 496.25it/s, env_step=5120, len=12, n/ep=6, n/st=64, player_1/loss=126.783, player_2/loss=154.635, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 492.11it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=174.786, player_2/loss=116.199, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 506.02it/s, env_step=7168, len=13, n/ep=4, n/st=64, player_1/loss=213.253, player_2/loss=76.351, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 511.52it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=213.596, player_2/loss=29.905, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 510.85it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=237.634, player_2/loss=27.686, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 503.94it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=241.082, player_2/loss=15.299, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 501.07it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=239.108, player_2/loss=14.852, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 512.02it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=273.589, player_2/loss=14.307, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 509.12it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=288.029, player_2/loss=9.009, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:01, 512.99it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=246.371, player_2/loss=34.731, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 505.13it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=246.151, player_2/loss=29.080, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 511.97it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=221.192, player_2/loss=82.417, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 509.09it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=214.558, player_2/loss=85.605, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 508.57it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=196.898, player_2/loss=11.625, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 510.39it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=211.119, player_2/loss=32.786, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 505.26it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=199.278, player_2/loss=21.717, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 510.07it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=173.273, player_2/loss=74.659, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 509.74it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=119.660, player_2/loss=132.668, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 511.04it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=124.015, player_2/loss=136.919, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 506.16it/s, env_step=5120, len=12, n/ep=6, n/st=64, player_1/loss=165.971, player_2/loss=193.292, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 506.98it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=183.362, player_2/loss=331.309, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 503.71it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=163.511, player_2/loss=420.317, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 505.07it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=135.315, player_2/loss=464.163, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 505.66it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=93.866, player_2/loss=492.018, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 505.03it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=145.565, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 508.70it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=174.466, player_2/loss=419.640, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 503.88it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=66.761, player_2/loss=384.534, rew=10.71]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 503.81it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=38.507, player_2/loss=416.837, rew=19.44]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 506.10it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=70.848, player_2/loss=476.569, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 503.88it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=83.649, player_2/loss=500.138, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 502.60it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=40.056, player_2/loss=520.334, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 503.17it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=64.175, player_2/loss=535.410, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 508.25it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=104.753, player_2/loss=430.347, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 504.94it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=62.721, player_2/loss=341.385, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 504.46it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=27.955, player_2/loss=401.772, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 507.76it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=18.865, player_2/loss=390.949, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 508.42it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=20.623, player_2/loss=336.787, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 507.55it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=48.763, player_2/loss=272.903, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 510.97it/s, env_step=5120, len=8, n/ep=9, n/st=64, player_1/loss=113.879, player_2/loss=205.950, rew=-19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 505.83it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=161.557, player_2/loss=130.023, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 510.15it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=219.802, player_2/loss=86.391, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 509.83it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=260.503, player_2/loss=73.035, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 509.31it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=242.667, player_2/loss=51.996, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 511.74it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=255.040, player_2/loss=15.510, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 506.17it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=273.188, player_2/loss=6.508, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 503.97it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=260.519, player_2/loss=4.622, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 511.46it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=258.781, player_2/loss=20.544, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 507.59it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=238.612, player_2/loss=26.053, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 505.02it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=220.388, player_2/loss=46.433, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 510.32it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=201.531, player_2/loss=67.252, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:01, 513.45it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=195.655, player_2/loss=47.296, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 509.00it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=201.350, player_2/loss=31.186, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 503.58it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=175.110, player_2/loss=18.985, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 498.92it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=185.213, player_2/loss=9.624, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 505.88it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=196.943, player_2/loss=206.963, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 506.99it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=192.060, player_2/loss=398.314, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 497.40it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=172.848, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 500.76it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=117.276, player_2/loss=463.460, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 505.58it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=111.671, player_2/loss=432.412, rew=2.78]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 505.14it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=119.278, player_2/loss=430.674, rew=13.89]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 498.50it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=110.357, player_2/loss=452.013, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 505.09it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=155.472, player_2/loss=428.487, rew=6.25]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 505.02it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=162.845, player_2/loss=387.386, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 502.39it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_2/loss=456.348, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 500.05it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=72.815, player_2/loss=427.984, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 504.31it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=102.596, player_2/loss=402.831, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 502.24it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=91.281, player_2/loss=391.120, rew=6.25]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 500.69it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=83.282, player_2/loss=383.494, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 502.56it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=92.854, player_2/loss=409.241, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 502.92it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=56.046, player_2/loss=462.884, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 505.59it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=61.306, player_2/loss=499.709, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 503.21it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=52.432, player_2/loss=525.346, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 500.47it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=37.179, player_2/loss=271.441, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 509.23it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=78.976, player_2/loss=276.340, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 507.01it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=125.458, player_2/loss=268.886, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 507.77it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=155.778, player_2/loss=259.598, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 503.20it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=247.145, player_2/loss=104.910, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 511.92it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=288.528, player_2/loss=17.335, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 504.91it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=277.762, player_2/loss=39.972, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 510.21it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=269.801, player_2/loss=52.483, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 502.88it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=249.498, player_2/loss=21.642, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 509.95it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=220.106, player_2/loss=4.356, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 510.95it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=251.163, player_2/loss=13.489, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 508.95it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=273.195, player_2/loss=14.720, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 506.26it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=277.675, player_2/loss=25.321, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 502.65it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=285.875, player_2/loss=28.247, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 504.86it/s, env_step=15360, len=13, n/ep=4, n/st=64, player_2/loss=16.553, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 507.78it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=267.992, player_2/loss=24.551, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 508.61it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=290.383, player_2/loss=38.607, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 504.63it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=273.063, player_2/loss=80.371, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 508.49it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=272.298, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 508.29it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=221.516, player_2/loss=62.228, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 510.87it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=169.095, player_2/loss=68.675, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 485.49it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=96.003, player_2/loss=143.569, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 512.13it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=75.147, player_2/loss=239.062, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 508.14it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=74.709, player_2/loss=213.192, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 506.96it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=73.948, player_2/loss=168.071, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 506.65it/s, env_step=7168, len=16, n/ep=5, n/st=64, player_1/loss=74.700, player_2/loss=157.178, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:01, 513.14it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=60.813, player_2/loss=210.659, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 508.33it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=80.977, player_2/loss=226.504, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 506.86it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=72.245, player_2/loss=216.038, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 507.34it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=43.871, player_2/loss=261.270, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 506.07it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=23.741, player_2/loss=218.437, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 509.26it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=17.437, player_2/loss=218.785, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 505.94it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=35.556, player_2/loss=256.147, rew=12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 507.10it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=111.334, player_2/loss=284.106, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 505.83it/s, env_step=16384, len=10, n/ep=7, n/st=64, player_1/loss=152.249, player_2/loss=343.294, rew=3.57]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 509.30it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=124.317, player_2/loss=380.808, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 500.80it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=139.931, player_2/loss=384.863, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 511.38it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=120.459, player_2/loss=475.282, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 504.87it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=150.591, player_2/loss=294.328, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 508.72it/s, env_step=2048, len=10, n/ep=7, n/st=64, player_1/loss=313.085, player_2/loss=183.470, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 506.66it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=478.755, player_2/loss=80.994, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 508.97it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=427.690, player_2/loss=86.817, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 499.49it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=441.404, player_2/loss=56.480, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 502.19it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=461.845, player_2/loss=65.340, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 506.24it/s, env_step=7168, len=9, n/ep=6, n/st=64, player_1/loss=449.982, player_2/loss=86.261, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 508.17it/s, env_step=8192, len=10, n/ep=7, n/st=64, player_1/loss=404.735, player_2/loss=74.788, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 505.13it/s, env_step=9216, len=16, n/ep=3, n/st=64, player_1/loss=352.306, player_2/loss=48.219, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 506.70it/s, env_step=10240, len=13, n/ep=6, n/st=64, player_1/loss=306.152, player_2/loss=38.067, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 505.30it/s, env_step=11264, len=13, n/ep=6, n/st=64, player_1/loss=306.131, player_2/loss=37.695, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:01, 513.21it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=468.690, player_2/loss=29.545, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 505.32it/s, env_step=13312, len=9, n/ep=6, n/st=64, player_1/loss=438.957, player_2/loss=49.913, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 502.84it/s, env_step=14336, len=20, n/ep=4, n/st=64, player_1/loss=342.777, player_2/loss=59.478, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 508.92it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=366.040, player_2/loss=43.792, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 510.64it/s, env_step=16384, len=10, n/ep=7, n/st=64, player_1/loss=363.207, player_2/loss=39.895, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 508.93it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=380.059, player_2/loss=33.595, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 507.43it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=412.231, player_2/loss=36.463, rew=17.86]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 511.83it/s, env_step=19456, len=16, n/ep=3, n/st=64, player_1/loss=434.500, player_2/loss=37.821, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 505.83it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=314.230, player_2/loss=6.090, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 508.21it/s, env_step=2048, len=29, n/ep=2, n/st=64, player_1/loss=224.084, player_2/loss=82.550, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 502.27it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=147.704, player_2/loss=165.953, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 508.49it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=165.413, player_2/loss=202.410, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 507.95it/s, env_step=5120, len=23, n/ep=3, n/st=64, player_1/loss=135.830, player_2/loss=204.968, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 507.08it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=114.159, player_2/loss=217.242, rew=-8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 502.35it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=111.566, player_2/loss=250.328, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 505.88it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=117.655, player_2/loss=346.190, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 506.12it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=90.369, player_2/loss=356.490, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 507.15it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=59.276, player_2/loss=374.763, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 492.79it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=77.270, player_2/loss=493.521, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 507.98it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_2/loss=462.570, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 506.44it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=25.949, player_2/loss=364.531, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 507.53it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=12.351, player_2/loss=378.377, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 507.12it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=25.317, player_2/loss=284.720, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 510.50it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=28.749, player_2/loss=295.537, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 509.28it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=33.414, player_2/loss=290.810, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 504.81it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=35.161, player_2/loss=310.733, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 505.66it/s, env_step=19456, len=17, n/ep=3, n/st=64, player_1/loss=52.085, player_2/loss=307.783, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 505.73it/s, env_step=1024, len=24, n/ep=3, n/st=64, player_1/loss=95.886, player_2/loss=159.197, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 510.87it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=60.287, player_2/loss=146.462, rew=8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 508.65it/s, env_step=3072, len=22, n/ep=3, n/st=64, player_1/loss=73.560, player_2/loss=160.005, rew=8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 509.93it/s, env_step=4096, len=26, n/ep=3, n/st=64, player_1/loss=97.258, player_2/loss=164.473, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 511.78it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=104.258, player_2/loss=163.929, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 511.28it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=149.700, player_2/loss=118.641, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 504.38it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=234.504, player_2/loss=115.469, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 507.38it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=193.242, player_2/loss=105.465, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 507.45it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=181.426, player_2/loss=93.368, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 509.39it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=277.385, player_2/loss=99.968, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 504.84it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_2/loss=132.278, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:01, 513.48it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=255.903, player_2/loss=116.100, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 507.62it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=291.880, player_2/loss=101.208, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 504.92it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=366.120, player_2/loss=129.651, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 502.58it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=382.998, player_2/loss=76.231, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 510.60it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=382.704, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 505.92it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=386.312, player_2/loss=66.834, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 503.37it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=331.672, player_2/loss=70.583, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 508.75it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=387.581, player_2/loss=44.281, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 504.05it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=250.866, player_2/loss=705.687, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 503.39it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=122.070, player_2/loss=725.018, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 504.56it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=167.608, player_2/loss=782.914, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 506.83it/s, env_step=4096, len=8, n/ep=9, n/st=64, player_1/loss=172.616, player_2/loss=730.432, rew=-2.78]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 507.27it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=105.491, player_2/loss=622.047, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 502.53it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=99.336, player_2/loss=603.808, rew=13.89]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 504.54it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=157.432, player_2/loss=677.660, rew=13.89]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 503.66it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=68.471, player_2/loss=714.169, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 505.22it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=83.972, player_2/loss=826.369, rew=13.89]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 500.92it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=61.350, player_2/loss=705.831, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 501.05it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=37.630, player_2/loss=605.615, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 507.45it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=80.672, player_2/loss=671.752, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 505.53it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=82.824, player_2/loss=560.455, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 501.28it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=55.963, player_2/loss=710.305, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 501.75it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=85.338, player_2/loss=840.604, rew=6.25]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 504.55it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=103.935, player_2/loss=829.360, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 505.55it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=67.802, player_2/loss=803.222, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 502.93it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=66.917, player_2/loss=783.315, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 508.21it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=65.479, player_2/loss=767.697, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 503.75it/s, env_step=1024, len=25, n/ep=3, n/st=64, player_1/loss=107.378, player_2/loss=283.178, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 505.48it/s, env_step=2048, len=25, n/ep=3, n/st=64, player_1/loss=133.841, player_2/loss=203.075, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 505.47it/s, env_step=3072, len=24, n/ep=3, n/st=64, player_1/loss=155.045, player_2/loss=118.212, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 508.04it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=126.037, player_2/loss=141.930, rew=8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 506.67it/s, env_step=5120, len=24, n/ep=3, n/st=64, player_1/loss=104.468, player_2/loss=123.660, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 500.09it/s, env_step=6144, len=19, n/ep=4, n/st=64, player_1/loss=95.562, player_2/loss=122.151, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 508.64it/s, env_step=7168, len=20, n/ep=4, n/st=64, player_1/loss=103.478, player_2/loss=144.019, rew=12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 509.71it/s, env_step=8192, len=26, n/ep=3, n/st=64, player_1/loss=117.168, player_2/loss=172.790, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 504.36it/s, env_step=9216, len=25, n/ep=3, n/st=64, player_1/loss=130.851, player_2/loss=118.142, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 506.70it/s, env_step=10240, len=21, n/ep=3, n/st=64, player_1/loss=119.110, player_2/loss=124.519, rew=-8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 508.93it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=115.207, player_2/loss=136.672, rew=8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 508.03it/s, env_step=12288, len=17, n/ep=3, n/st=64, player_1/loss=131.984, player_2/loss=156.303, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 500.49it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=240.109, player_2/loss=133.531, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 509.10it/s, env_step=14336, len=23, n/ep=3, n/st=64, player_1/loss=210.115, player_2/loss=121.443, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 507.91it/s, env_step=15360, len=23, n/ep=2, n/st=64, player_1/loss=82.273, player_2/loss=120.721, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 507.46it/s, env_step=16384, len=24, n/ep=2, n/st=64, player_1/loss=103.076, player_2/loss=101.122, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 508.19it/s, env_step=17408, len=25, n/ep=2, n/st=64, player_1/loss=113.643, player_2/loss=104.996, rew=0.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 504.25it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=189.692, player_2/loss=131.467, rew=12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 506.41it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=181.563, player_2/loss=135.054, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 502.84it/s, env_step=1024, len=27, n/ep=2, n/st=64, player_1/loss=142.469, player_2/loss=54.331, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 506.87it/s, env_step=2048, len=32, n/ep=2, n/st=64, player_1/loss=126.683, player_2/loss=84.613, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 503.51it/s, env_step=3072, len=29, n/ep=3, n/st=64, player_1/loss=85.294, player_2/loss=81.337, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 506.82it/s, env_step=4096, len=32, n/ep=2, n/st=64, player_1/loss=72.780, player_2/loss=68.852, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 510.03it/s, env_step=5120, len=24, n/ep=3, n/st=64, player_1/loss=71.604, player_2/loss=76.660, rew=-8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 506.36it/s, env_step=6144, len=26, n/ep=2, n/st=64, player_1/loss=40.629, player_2/loss=43.340, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 504.53it/s, env_step=7168, len=25, n/ep=3, n/st=64, player_1/loss=38.554, player_2/loss=38.449, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 504.31it/s, env_step=8192, len=30, n/ep=3, n/st=64, player_1/loss=37.880, player_2/loss=35.651, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 509.72it/s, env_step=9216, len=27, n/ep=2, n/st=64, player_1/loss=76.571, player_2/loss=45.582, rew=-25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 512.21it/s, env_step=10240, len=26, n/ep=3, n/st=64, player_1/loss=95.476, player_2/loss=54.345, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 507.23it/s, env_step=11264, len=32, n/ep=2, n/st=64, player_2/loss=46.598, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 502.73it/s, env_step=12288, len=32, n/ep=2, n/st=64, player_1/loss=29.906, player_2/loss=50.703, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 502.26it/s, env_step=13312, len=31, n/ep=2, n/st=64, player_1/loss=35.298, player_2/loss=65.546, rew=0.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 509.96it/s, env_step=14336, len=27, n/ep=2, n/st=64, player_1/loss=80.204, player_2/loss=70.100, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 505.05it/s, env_step=15360, len=37, n/ep=2, n/st=64, player_1/loss=130.005, player_2/loss=71.103, rew=37.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 510.39it/s, env_step=16384, len=19, n/ep=2, n/st=64, player_1/loss=119.811, player_2/loss=66.297, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 503.51it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=78.029, player_2/loss=61.176, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 505.08it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=117.862, player_2/loss=93.808, rew=-12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 507.84it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=134.298, player_2/loss=117.479, rew=5.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 506.71it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=152.826, player_2/loss=95.150, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 510.31it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=136.273, player_2/loss=56.128, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 503.35it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=119.798, player_2/loss=19.928, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 502.71it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=134.839, player_2/loss=27.159, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 505.52it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=140.213, player_2/loss=31.672, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 508.78it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=129.335, player_2/loss=33.292, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 503.85it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=129.660, player_2/loss=35.290, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 507.68it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=119.307, player_2/loss=34.171, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 508.65it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=116.950, player_2/loss=84.101, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 504.20it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=76.512, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 510.20it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=79.158, player_2/loss=55.396, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 510.14it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=90.853, player_2/loss=57.682, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 506.37it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=78.176, player_2/loss=49.702, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 504.01it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=62.369, player_2/loss=63.989, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 500.95it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=48.722, player_2/loss=31.029, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 504.04it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=80.679, player_2/loss=13.953, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 506.60it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=95.180, player_2/loss=14.677, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 505.85it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_2/loss=52.733, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 508.63it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=103.466, player_2/loss=46.471, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 503.88it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=90.104, player_2/loss=67.894, rew=-12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 505.77it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=171.458, player_2/loss=121.252, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 504.67it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=156.741, player_2/loss=187.366, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 507.05it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=159.109, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 501.57it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=140.444, player_2/loss=355.753, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 501.23it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=93.729, player_2/loss=418.037, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 504.51it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=54.351, player_2/loss=441.531, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 503.48it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=83.001, player_2/loss=416.509, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 499.82it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=144.116, player_2/loss=423.608, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 502.75it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=60.809, player_2/loss=413.670, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 501.43it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=103.670, player_2/loss=388.145, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 501.26it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=116.609, player_2/loss=390.000, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 501.72it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=133.833, player_2/loss=352.229, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 503.86it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=150.706, player_2/loss=377.063, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 505.59it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=89.559, player_2/loss=348.978, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 497.00it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=68.187, player_2/loss=406.304, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 505.33it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=63.632, player_2/loss=427.856, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 505.05it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=56.643, player_2/loss=430.627, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 497.93it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=20.162, player_2/loss=405.053, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 501.38it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=32.138, player_2/loss=277.977, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 499.10it/s, env_step=2048, len=12, n/ep=6, n/st=64, player_1/loss=100.615, player_2/loss=177.879, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 502.02it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=170.242, player_2/loss=94.379, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 501.72it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=203.371, player_2/loss=57.603, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 505.36it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=211.857, player_2/loss=46.018, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 504.93it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=202.462, player_2/loss=70.782, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 497.67it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=214.690, player_2/loss=64.880, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 506.24it/s, env_step=8192, len=12, n/ep=4, n/st=64, player_1/loss=209.712, player_2/loss=38.651, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 500.71it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_2/loss=39.993, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 502.14it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=245.636, player_2/loss=52.158, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 502.03it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=244.039, player_2/loss=93.308, rew=-5.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 503.13it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=222.891, player_2/loss=92.549, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 503.31it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=245.848, player_2/loss=44.752, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 501.94it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=267.421, player_2/loss=22.642, rew=5.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 502.58it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=242.719, player_2/loss=8.154, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 501.04it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=277.709, player_2/loss=25.808, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 502.16it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=305.067, player_2/loss=23.875, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 501.75it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=234.806, player_2/loss=6.778, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 503.68it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=239.889, player_2/loss=6.247, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 503.48it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=221.310, player_2/loss=106.271, rew=5.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 502.91it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=183.818, player_2/loss=255.272, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 502.07it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=129.392, player_2/loss=409.048, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 505.55it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=107.210, player_2/loss=523.382, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 494.70it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=105.617, player_2/loss=570.841, rew=-15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 498.23it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=87.563, player_2/loss=476.959, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 502.37it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=55.498, player_2/loss=541.346, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 502.60it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=42.422, player_2/loss=621.158, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 503.59it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=33.219, player_2/loss=480.823, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 502.30it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=93.255, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 502.79it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=104.081, player_2/loss=388.250, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 497.73it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=50.364, player_2/loss=456.664, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 501.51it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=46.519, player_2/loss=501.264, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 505.14it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=81.599, player_2/loss=596.585, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 504.78it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=93.706, player_2/loss=648.646, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 495.91it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=64.097, player_2/loss=504.759, rew=8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 502.05it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=70.183, player_2/loss=443.949, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 502.41it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=127.595, player_2/loss=439.669, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 503.42it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=136.515, player_2/loss=513.807, rew=12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 503.99it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=138.866, player_2/loss=421.160, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 505.73it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=118.210, player_2/loss=302.152, rew=-5.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 505.07it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=62.652, player_2/loss=246.399, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 503.93it/s, env_step=4096, len=12, n/ep=6, n/st=64, player_1/loss=72.099, player_2/loss=228.802, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 504.38it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=124.729, player_2/loss=226.833, rew=-5.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 505.29it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=88.909, player_2/loss=220.722, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 503.79it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=65.841, player_2/loss=228.914, rew=-16.67]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 500.82it/s, env_step=8192, len=25, n/ep=3, n/st=64, player_1/loss=96.062, player_2/loss=203.567, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 505.85it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=157.197, player_2/loss=135.537, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 504.24it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=198.319, player_2/loss=91.793, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 504.47it/s, env_step=11264, len=31, n/ep=2, n/st=64, player_1/loss=208.693, player_2/loss=90.668, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 502.46it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=209.548, player_2/loss=139.186, rew=-5.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 504.99it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=166.683, player_2/loss=143.467, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 500.22it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=208.964, player_2/loss=73.121, rew=12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 507.43it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=215.622, player_2/loss=83.005, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 506.31it/s, env_step=16384, len=15, n/ep=3, n/st=64, player_1/loss=122.032, player_2/loss=88.942, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 504.84it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=100.944, player_2/loss=145.709, rew=-18.75]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 497.19it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_2/loss=192.028, rew=12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 504.68it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=113.113, player_2/loss=202.051, rew=12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 504.72it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=103.315, player_2/loss=184.253, rew=15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 499.79it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=133.067, player_2/loss=147.305, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 504.86it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=94.904, player_2/loss=146.131, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 503.57it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=63.485, player_2/loss=193.425, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 505.27it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=136.685, player_2/loss=212.222, rew=5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 498.08it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=109.606, player_2/loss=178.023, rew=16.67]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 505.22it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=64.023, player_2/loss=174.943, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 501.15it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=43.515, player_2/loss=149.986, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 503.09it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=69.361, player_2/loss=123.689, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 503.37it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=134.567, player_2/loss=171.688, rew=8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 502.63it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=72.151, player_2/loss=172.338, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 503.64it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=66.758, player_2/loss=196.215, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 504.54it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=84.799, player_2/loss=205.256, rew=15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 501.81it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=112.847, player_2/loss=215.215, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 501.82it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=65.260, player_2/loss=201.785, rew=5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 499.57it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=23.453, player_2/loss=160.396, rew=-5.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 501.48it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=10.944, player_2/loss=164.440, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 506.02it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=37.212, player_2/loss=145.604, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 503.31it/s, env_step=19456, len=16, n/ep=3, n/st=64, player_1/loss=74.485, player_2/loss=155.500, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 500.02it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=182.660, player_2/loss=163.545, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 510.00it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=98.172, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 505.97it/s, env_step=3072, len=23, n/ep=3, n/st=64, player_1/loss=120.101, player_2/loss=163.965, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 503.72it/s, env_step=4096, len=39, n/ep=1, n/st=64, player_1/loss=205.790, player_2/loss=154.485, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 506.51it/s, env_step=5120, len=26, n/ep=3, n/st=64, player_1/loss=198.349, player_2/loss=104.836, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 503.05it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=136.595, player_2/loss=77.783, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 499.61it/s, env_step=7168, len=23, n/ep=3, n/st=64, player_1/loss=143.120, player_2/loss=58.521, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 505.30it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=139.601, player_2/loss=111.266, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 510.22it/s, env_step=9216, len=23, n/ep=3, n/st=64, player_1/loss=99.071, player_2/loss=103.533, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:02, 499.36it/s, env_step=10240, len=26, n/ep=2, n/st=64, player_1/loss=154.958, player_2/loss=71.521, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 505.20it/s, env_step=11264, len=33, n/ep=2, n/st=64, player_1/loss=196.777, player_2/loss=73.933, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 506.79it/s, env_step=12288, len=22, n/ep=2, n/st=64, player_1/loss=125.578, player_2/loss=58.464, rew=0.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 505.51it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=122.219, player_2/loss=84.640, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 500.54it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=141.517, player_2/loss=85.754, rew=-8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 503.11it/s, env_step=15360, len=20, n/ep=3, n/st=64, player_2/loss=78.557, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:02, 507.51it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=134.954, player_2/loss=94.603, rew=8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 498.56it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=132.825, player_2/loss=91.940, rew=-15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:02, 503.68it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=122.660, player_2/loss=116.544, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:02, 497.91it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=106.353, player_2/loss=119.728, rew=-12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 502.79it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=118.354, player_2/loss=73.393, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 506.37it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=151.368, player_2/loss=81.737, rew=-5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 504.91it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=171.867, player_2/loss=95.581, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 501.81it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=125.847, player_2/loss=112.810, rew=15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 504.32it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=147.679, player_2/loss=106.709, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 506.48it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=170.683, player_2/loss=71.744, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 499.45it/s, env_step=7168, len=23, n/ep=3, n/st=64, player_1/loss=114.163, player_2/loss=72.056, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 506.54it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=65.698, player_2/loss=102.830, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 504.01it/s, env_step=9216, len=21, n/ep=3, n/st=64, player_1/loss=80.150, player_2/loss=141.450, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 508.14it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=133.379, player_2/loss=132.171, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 501.06it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=161.974, player_2/loss=137.438, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 506.88it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_2/loss=121.279, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 506.33it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=109.058, player_2/loss=103.609, rew=0.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 500.94it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=145.808, player_2/loss=143.184, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 502.31it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=189.959, player_2/loss=239.782, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 506.68it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=132.596, player_2/loss=289.147, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 501.86it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=98.194, rew=6.25]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 496.45it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=101.440, rew=6.25]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 501.97it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=83.851, player_2/loss=274.046, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 504.84it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=115.397, player_2/loss=178.696, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 504.41it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=110.694, player_2/loss=208.971, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 507.49it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=128.351, player_2/loss=180.223, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 506.14it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=148.328, player_2/loss=115.392, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 501.25it/s, env_step=5120, len=10, n/ep=7, n/st=64, player_1/loss=161.724, player_2/loss=121.245, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 504.94it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=193.546, player_2/loss=107.190, rew=-13.89]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 507.97it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=152.708, player_2/loss=83.867, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 506.92it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=99.700, player_2/loss=129.624, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 501.41it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=162.563, player_2/loss=208.281, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 506.56it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=144.172, player_2/loss=186.771, rew=12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 508.83it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=159.268, player_2/loss=146.161, rew=-25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 501.23it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=219.507, player_2/loss=167.638, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 508.31it/s, env_step=13312, len=18, n/ep=3, n/st=64, player_1/loss=181.559, player_2/loss=127.377, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 505.09it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=214.583, player_2/loss=100.940, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 504.52it/s, env_step=15360, len=20, n/ep=4, n/st=64, player_1/loss=238.614, player_2/loss=103.195, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 506.21it/s, env_step=16384, len=19, n/ep=4, n/st=64, player_1/loss=219.230, player_2/loss=104.750, rew=-12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 506.28it/s, env_step=17408, len=19, n/ep=4, n/st=64, player_1/loss=283.697, player_2/loss=114.873, rew=-12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 504.33it/s, env_step=18432, len=16, n/ep=3, n/st=64, player_1/loss=264.135, player_2/loss=95.617, rew=-25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 507.64it/s, env_step=19456, len=26, n/ep=2, n/st=64, player_1/loss=260.926, player_2/loss=101.369, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 507.09it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=147.762, player_2/loss=137.626, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 504.21it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=142.276, player_2/loss=118.515, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 505.87it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=188.414, player_2/loss=157.788, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 501.19it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=215.158, player_2/loss=308.669, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 503.04it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=159.168, player_2/loss=427.553, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 501.07it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=170.666, player_2/loss=437.943, rew=18.75]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 501.32it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=171.315, player_2/loss=395.630, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 503.06it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=107.989, player_2/loss=345.630, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 496.68it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=26.365, player_2/loss=386.106, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 500.01it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=92.474, player_2/loss=362.471, rew=13.89]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 502.97it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=99.759, player_2/loss=348.810, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 501.17it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=51.021, player_2/loss=346.184, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 493.91it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=39.994, player_2/loss=400.840, rew=19.44]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 502.38it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=74.872, player_2/loss=373.823, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 502.17it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=65.755, player_2/loss=381.330, rew=13.89]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 495.59it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=55.597, player_2/loss=308.143, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 503.48it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=89.920, player_2/loss=362.918, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 497.90it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=148.609, player_2/loss=351.869, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 496.36it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=125.133, player_2/loss=384.960, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 499.39it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=34.773, player_2/loss=320.503, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 507.38it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=92.068, player_2/loss=296.920, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 510.01it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=220.784, player_2/loss=240.536, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 501.93it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=355.632, player_2/loss=132.875, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 505.96it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=354.749, player_2/loss=84.292, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 503.49it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=323.097, player_2/loss=65.629, rew=15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 501.20it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=533.674, player_2/loss=35.661, rew=15.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 508.89it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=578.987, player_2/loss=44.358, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 505.50it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=436.728, player_2/loss=40.201, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 503.44it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=383.140, player_2/loss=70.836, rew=-15.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 507.92it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=411.297, player_2/loss=91.260, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 506.53it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=376.812, player_2/loss=70.449, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 507.14it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=333.722, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 503.16it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=353.830, player_2/loss=81.413, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 504.85it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=394.210, player_2/loss=13.418, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 509.96it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=355.109, player_2/loss=58.021, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 484.24it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=301.658, player_2/loss=63.903, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 498.28it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=373.925, player_2/loss=15.770, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 504.86it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=401.615, player_2/loss=6.871, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 503.00it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=250.900, player_2/loss=14.879, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 503.40it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=249.531, player_2/loss=143.924, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 505.46it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=231.131, player_2/loss=259.763, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 502.38it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=174.891, player_2/loss=320.888, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 497.65it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=150.918, player_2/loss=282.274, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 503.40it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=108.051, player_2/loss=453.693, rew=6.25]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 499.00it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=80.302, player_2/loss=503.826, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 499.87it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=89.318, player_2/loss=464.255, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 507.56it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=137.966, player_2/loss=447.262, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 499.88it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=94.922, player_2/loss=486.364, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 497.90it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=39.003, player_2/loss=514.961, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 507.10it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=61.644, player_2/loss=495.366, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 503.43it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=91.236, player_2/loss=486.491, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 500.36it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=56.597, player_2/loss=485.720, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 499.25it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=32.592, player_2/loss=575.004, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 504.57it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=82.170, player_2/loss=491.447, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 500.19it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=125.109, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 498.33it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=118.942, player_2/loss=412.343, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 504.76it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=87.308, player_2/loss=463.982, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 504.13it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=29.862, player_2/loss=396.321, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 502.97it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=25.114, player_2/loss=357.402, rew=-13.89]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 508.53it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=42.281, player_2/loss=293.504, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 505.22it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=49.389, player_2/loss=271.293, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 499.69it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=51.194, player_2/loss=230.349, rew=15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 506.07it/s, env_step=6144, len=10, n/ep=5, n/st=64, player_1/loss=123.868, player_2/loss=163.531, rew=5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 506.65it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=151.392, player_2/loss=80.344, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 507.49it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=152.469, player_2/loss=65.466, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 504.68it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=169.875, player_2/loss=37.866, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 509.96it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=186.612, player_2/loss=48.523, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 504.47it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=185.253, player_2/loss=48.316, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 501.72it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=197.868, player_2/loss=8.372, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 505.54it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=176.619, player_2/loss=8.017, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 504.25it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=139.672, player_2/loss=8.523, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 498.66it/s, env_step=15360, len=13, n/ep=4, n/st=64, player_1/loss=139.101, player_2/loss=93.556, rew=-12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 506.00it/s, env_step=16384, len=16, n/ep=3, n/st=64, player_1/loss=137.892, player_2/loss=206.001, rew=8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 502.20it/s, env_step=17408, len=17, n/ep=3, n/st=64, player_1/loss=114.977, player_2/loss=196.540, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 503.44it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=78.520, player_2/loss=115.898, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 504.45it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=72.475, player_2/loss=59.230, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 496.70it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=178.790, player_2/loss=227.476, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 499.60it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=163.822, player_2/loss=321.393, rew=12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 504.08it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=117.020, player_2/loss=398.573, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 504.77it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=82.607, player_2/loss=374.513, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 498.95it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=98.149, player_2/loss=360.965, rew=-8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 501.96it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=152.098, player_2/loss=398.582, rew=6.25]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 502.33it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=192.805, player_2/loss=501.364, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 497.24it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=144.355, player_2/loss=473.502, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 502.39it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=98.004, player_2/loss=496.002, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 503.68it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=54.787, player_2/loss=488.808, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 502.24it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=66.751, player_2/loss=390.164, rew=10.71]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 502.61it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=57.548, player_2/loss=460.261, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 503.78it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=44.986, player_2/loss=515.123, rew=2.78]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 496.31it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=62.050, player_2/loss=490.519, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 503.43it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=62.071, player_2/loss=438.672, rew=18.75]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 501.34it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=57.353, player_2/loss=437.621, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 498.72it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=16.150, player_2/loss=475.615, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 497.19it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=60.476, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 505.15it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=68.092, player_2/loss=400.470, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 501.24it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=44.338, player_2/loss=422.640, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 503.06it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=32.866, player_2/loss=384.751, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 502.23it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=16.565, player_2/loss=349.168, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 503.42it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=236.072, player_2/loss=263.543, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 502.05it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=305.135, player_2/loss=140.871, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 506.85it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=194.694, player_2/loss=86.803, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 502.60it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=231.636, player_2/loss=67.859, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 504.50it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=326.200, player_2/loss=40.658, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 505.60it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=270.612, player_2/loss=46.088, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 500.74it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=221.519, player_2/loss=38.622, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 503.97it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=281.379, player_2/loss=29.668, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 504.11it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=277.190, player_2/loss=104.885, rew=-15.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 502.40it/s, env_step=13312, len=19, n/ep=3, n/st=64, player_1/loss=187.553, player_2/loss=131.578, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 500.61it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=120.792, player_2/loss=73.437, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 508.20it/s, env_step=15360, len=13, n/ep=4, n/st=64, player_1/loss=67.215, player_2/loss=92.105, rew=12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 497.86it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=180.324, player_2/loss=64.000, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 505.76it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=219.978, player_2/loss=28.655, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 503.71it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=272.305, player_2/loss=32.316, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 500.46it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=310.451, player_2/loss=40.665, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 502.07it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=123.610, player_2/loss=109.805, rew=12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 502.18it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=100.730, player_2/loss=231.539, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 500.95it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=91.164, player_2/loss=331.336, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 506.56it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=140.285, player_2/loss=351.656, rew=16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 499.59it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=69.631, player_2/loss=357.534, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 495.29it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=42.432, player_2/loss=326.065, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 506.14it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=69.969, player_2/loss=266.951, rew=13.89]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 500.51it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=84.062, player_2/loss=294.045, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 496.11it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=78.750, player_2/loss=303.588, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 496.27it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=95.018, player_2/loss=305.674, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 503.32it/s, env_step=11264, len=8, n/ep=9, n/st=64, player_1/loss=112.824, player_2/loss=288.711, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 498.81it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=117.068, player_2/loss=292.226, rew=19.44]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 503.25it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=77.719, player_2/loss=305.450, rew=6.25]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 499.09it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=41.007, player_2/loss=356.688, rew=18.75]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 497.94it/s, env_step=15360, len=7, n/ep=10, n/st=64, player_1/loss=58.477, player_2/loss=307.676, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 502.79it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=66.872, player_2/loss=293.587, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 503.88it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=68.806, player_2/loss=258.779, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 498.41it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=60.613, player_2/loss=271.502, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 505.01it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=99.301, player_2/loss=261.397, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 495.00it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=17.204, player_2/loss=308.116, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 504.56it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=52.399, player_2/loss=251.261, rew=-15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 481.72it/s, env_step=3072, len=9, n/ep=6, n/st=64, player_1/loss=69.282, player_2/loss=219.570, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 504.53it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=73.708, player_2/loss=187.077, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 501.73it/s, env_step=5120, len=15, n/ep=5, n/st=64, player_1/loss=84.048, player_2/loss=180.334, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 502.37it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=184.876, player_2/loss=175.214, rew=12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 503.44it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=273.041, player_2/loss=155.089, rew=-5.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 500.77it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=220.575, player_2/loss=183.976, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 504.62it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=177.843, player_2/loss=165.649, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 508.33it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=258.742, player_2/loss=164.435, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 501.77it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=230.291, player_2/loss=158.293, rew=5.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 508.88it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=260.775, player_2/loss=155.114, rew=-16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 503.52it/s, env_step=13312, len=15, n/ep=5, n/st=64, player_1/loss=270.090, player_2/loss=169.040, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 498.82it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=207.679, player_2/loss=199.376, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 504.00it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=188.316, player_2/loss=151.124, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 501.57it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=260.041, player_2/loss=78.869, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 500.62it/s, env_step=17408, len=13, n/ep=4, n/st=64, player_1/loss=263.268, player_2/loss=61.978, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 506.88it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=245.352, player_2/loss=56.437, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 503.65it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=277.371, player_2/loss=53.921, rew=-15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 499.51it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=210.080, player_2/loss=237.338, rew=18.75]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 504.82it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=191.597, player_2/loss=253.091, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 503.07it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=84.889, player_2/loss=248.014, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 495.52it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=72.611, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 503.39it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=72.351, player_2/loss=234.495, rew=12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 497.77it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=15.298, player_2/loss=246.746, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 493.56it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=27.363, player_2/loss=289.702, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 498.65it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=31.238, player_2/loss=310.237, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 504.17it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=19.917, player_2/loss=296.244, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 495.02it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=52.163, player_2/loss=291.179, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 500.41it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=63.456, player_2/loss=317.920, rew=18.75]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 499.74it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=61.843, player_2/loss=305.305, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 501.71it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=95.653, player_2/loss=346.730, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 506.89it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=118.136, player_2/loss=323.394, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 504.82it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=71.381, player_2/loss=334.289, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 498.88it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=70.570, player_2/loss=355.705, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 501.72it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=111.029, player_2/loss=295.177, rew=6.25]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 501.60it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=69.870, player_2/loss=233.133, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 500.03it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=69.727, player_2/loss=274.170, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 506.61it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=52.890, player_2/loss=280.475, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 508.12it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=85.407, player_2/loss=251.506, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 497.83it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=128.239, player_2/loss=193.658, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 503.88it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=184.046, player_2/loss=143.464, rew=-5.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 505.90it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=195.701, player_2/loss=107.659, rew=15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 505.10it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=196.507, player_2/loss=111.645, rew=5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 505.03it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=164.543, player_2/loss=131.841, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 505.61it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=158.007, player_2/loss=90.358, rew=12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 501.36it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=153.473, player_2/loss=34.261, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 497.84it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=135.580, player_2/loss=27.644, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 502.59it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=137.505, player_2/loss=23.236, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 503.49it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=189.943, player_2/loss=56.657, rew=12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 507.15it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=157.800, player_2/loss=84.410, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 500.44it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=128.405, player_2/loss=82.184, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 502.02it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_2/loss=41.042, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 506.58it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=146.162, player_2/loss=50.650, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 499.53it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=143.069, player_2/loss=55.183, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 506.52it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=174.215, player_2/loss=85.123, rew=0.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 508.85it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=156.592, player_2/loss=99.561, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 501.70it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=141.953, player_2/loss=133.482, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 505.73it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=147.245, player_2/loss=179.397, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 502.96it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=80.375, player_2/loss=200.090, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 496.53it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=86.725, player_2/loss=181.029, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 500.33it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=98.900, player_2/loss=216.223, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 500.80it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=32.242, player_2/loss=262.020, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 500.30it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=20.075, player_2/loss=318.634, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 501.42it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=12.947, player_2/loss=323.034, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 500.95it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=20.362, player_2/loss=275.293, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 498.67it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=39.347, player_2/loss=280.889, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 502.20it/s, env_step=11264, len=7, n/ep=7, n/st=64, player_1/loss=57.636, player_2/loss=287.857, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 498.92it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=51.276, player_2/loss=277.930, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 497.51it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=43.693, player_2/loss=250.513, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 497.34it/s, env_step=14336, len=7, n/ep=10, n/st=64, player_1/loss=6.935, player_2/loss=279.108, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 502.61it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=4.467, player_2/loss=293.228, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 500.30it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=11.919, player_2/loss=313.910, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 501.08it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=19.402, player_2/loss=334.786, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 501.12it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=9.347, player_2/loss=325.507, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 500.14it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=11.919, player_2/loss=303.629, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 501.28it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=5.542, player_2/loss=292.406, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 504.56it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=125.853, player_2/loss=265.140, rew=18.75]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 501.51it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=448.553, player_2/loss=178.848, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 504.33it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=612.084, player_2/loss=120.066, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 500.75it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=584.001, player_2/loss=91.938, rew=6.25]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 503.62it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=564.866, player_2/loss=70.526, rew=18.75]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 502.01it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=573.396, player_2/loss=71.266, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 503.89it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=649.819, player_2/loss=69.355, rew=18.75]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 493.16it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=643.799, player_2/loss=25.405, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 503.62it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=681.319, player_2/loss=12.329, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 503.59it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_2/loss=64.605, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 494.80it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=681.047, player_2/loss=109.768, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 501.25it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=641.222, player_2/loss=85.473, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 503.54it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=622.605, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 499.73it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=634.432, player_2/loss=55.428, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 502.33it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=631.609, player_2/loss=55.865, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 501.81it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=634.782, player_2/loss=54.475, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 498.43it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=583.707, player_2/loss=60.623, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 502.17it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=622.853, player_2/loss=57.449, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 501.66it/s, env_step=1024, len=8, n/ep=7, n/st=64, player_1/loss=399.534, player_2/loss=10.923, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.76it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=318.765, player_2/loss=141.051, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 502.88it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=212.985, player_2/loss=200.209, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 503.13it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=186.795, player_2/loss=336.688, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 496.14it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=105.218, player_2/loss=395.230, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 505.35it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=31.420, player_2/loss=388.470, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 502.46it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=23.704, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 498.55it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=81.594, player_2/loss=294.788, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 501.48it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=24.440, player_2/loss=324.676, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 504.15it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=11.331, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 500.41it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=9.828, player_2/loss=456.449, rew=15.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 505.30it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=29.046, player_2/loss=423.596, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 499.06it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=45.085, player_2/loss=444.044, rew=5.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 504.36it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=24.534, player_2/loss=414.112, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 506.24it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=6.959, player_2/loss=417.761, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 500.09it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=4.694, player_2/loss=435.919, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 501.85it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=13.870, player_2/loss=395.354, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 503.06it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=17.977, player_2/loss=400.165, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 498.64it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=63.142, player_2/loss=381.066, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 504.05it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=18.965, player_2/loss=246.665, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 502.10it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=14.055, player_2/loss=206.677, rew=5.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.04it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=16.478, player_2/loss=159.008, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 503.18it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=46.685, player_2/loss=131.775, rew=-15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 500.45it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=99.162, player_2/loss=134.057, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 502.99it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=85.612, player_2/loss=135.998, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 506.68it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=63.617, player_2/loss=123.656, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 500.62it/s, env_step=8192, len=15, n/ep=5, n/st=64, player_1/loss=134.200, player_2/loss=111.035, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 500.01it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=221.260, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 502.03it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=226.352, player_2/loss=132.260, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 499.49it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=220.496, player_2/loss=112.398, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 508.12it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=299.866, player_2/loss=95.186, rew=15.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 502.64it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=289.856, player_2/loss=74.296, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 501.33it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=222.003, player_2/loss=67.665, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 503.77it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=240.285, player_2/loss=75.161, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 505.79it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=224.201, player_2/loss=65.403, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 500.69it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=208.729, player_2/loss=49.868, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 506.48it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=204.112, player_2/loss=51.105, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 504.88it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=212.571, player_2/loss=14.963, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 498.16it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=57.321, player_2/loss=366.477, rew=18.75]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 501.13it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=87.361, player_2/loss=340.029, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 499.77it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=118.419, player_2/loss=411.727, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 496.81it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=145.669, player_2/loss=410.835, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 502.28it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=140.341, player_2/loss=415.659, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 505.14it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=124.794, player_2/loss=439.229, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 497.11it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=102.101, player_2/loss=397.420, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 502.41it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=83.062, player_2/loss=396.440, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 499.38it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=147.014, player_2/loss=438.768, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 496.41it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=141.288, player_2/loss=385.209, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 500.28it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=59.137, player_2/loss=352.175, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 499.77it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=101.125, player_2/loss=406.687, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 499.56it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=132.549, player_2/loss=427.789, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 500.94it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=90.582, player_2/loss=472.958, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 503.38it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=42.293, player_2/loss=428.289, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 495.23it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=39.244, player_2/loss=427.110, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 500.41it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=52.038, player_2/loss=419.297, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 501.81it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=111.314, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 496.54it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=87.955, player_2/loss=440.520, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 503.92it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=69.534, player_2/loss=306.009, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 502.38it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=184.366, player_2/loss=174.110, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 502.31it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=351.261, player_2/loss=80.347, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 501.14it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=178.323, player_2/loss=142.600, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 494.79it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=25.843, player_2/loss=183.622, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 502.02it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=29.345, player_2/loss=221.938, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 501.25it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=46.057, player_2/loss=174.872, rew=-13.89]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 497.71it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=53.500, player_2/loss=147.070, rew=-17.86]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 503.04it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=70.996, player_2/loss=176.336, rew=-19.44]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 501.91it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=85.625, player_2/loss=226.219, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 501.18it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=114.020, player_2/loss=223.663, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 501.81it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=178.636, player_2/loss=112.160, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 502.58it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=227.032, player_2/loss=105.887, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 501.93it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=207.835, player_2/loss=101.292, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 504.67it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=223.784, player_2/loss=94.408, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 505.39it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=256.164, player_2/loss=28.300, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 498.89it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=229.894, player_2/loss=12.720, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 498.96it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=183.000, player_2/loss=44.449, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 500.83it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=186.363, player_2/loss=44.660, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 499.05it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=149.521, player_2/loss=185.179, rew=5.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 504.88it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=133.582, player_2/loss=188.319, rew=5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 504.75it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=127.990, player_2/loss=288.165, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 495.72it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=91.513, player_2/loss=332.236, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 499.03it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=42.316, player_2/loss=422.318, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 496.32it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=29.076, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 500.97it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=25.990, player_2/loss=447.661, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 501.84it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=77.286, player_2/loss=397.639, rew=19.44]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 496.19it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=116.334, player_2/loss=338.632, rew=13.89]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 504.13it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=98.189, player_2/loss=354.406, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 504.69it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=105.998, player_2/loss=351.075, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 494.13it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=99.093, player_2/loss=354.766, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 498.73it/s, env_step=13312, len=7, n/ep=10, n/st=64, player_1/loss=52.655, player_2/loss=397.465, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 496.64it/s, env_step=14336, len=7, n/ep=10, n/st=64, player_1/loss=78.710, player_2/loss=436.206, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 501.26it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=98.260, player_2/loss=536.097, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 500.67it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=78.249, player_2/loss=510.966, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 501.18it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=52.233, player_2/loss=454.211, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 501.65it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=122.535, player_2/loss=405.199, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 497.87it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=103.384, player_2/loss=455.221, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 495.96it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=120.757, player_2/loss=325.769, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 500.12it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=109.472, player_2/loss=299.352, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.35it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=95.079, player_2/loss=221.345, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 497.68it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=140.757, player_2/loss=100.184, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 500.90it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=255.554, player_2/loss=55.265, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 503.48it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=349.695, player_2/loss=40.822, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 497.20it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=330.752, player_2/loss=52.001, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 498.55it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=299.719, player_2/loss=57.963, rew=5.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 501.83it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=268.829, player_2/loss=100.779, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 497.61it/s, env_step=10240, len=14, n/ep=5, n/st=64, player_1/loss=308.709, player_2/loss=99.409, rew=-5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 502.68it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=288.558, player_2/loss=117.122, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 500.11it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=277.635, player_2/loss=109.423, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 501.74it/s, env_step=13312, len=15, n/ep=5, n/st=64, player_1/loss=269.212, player_2/loss=92.021, rew=5.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 502.81it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=232.823, player_2/loss=155.708, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 495.19it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=287.448, player_2/loss=54.591, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 503.86it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=375.982, player_2/loss=22.098, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 502.50it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=460.047, player_2/loss=12.553, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 499.06it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=529.944, player_2/loss=11.763, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 503.90it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=485.896, player_2/loss=24.697, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 495.66it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=235.260, player_2/loss=31.237, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 503.77it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=140.369, player_2/loss=41.043, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 502.59it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=137.499, player_2/loss=68.253, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 497.57it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=149.984, player_2/loss=81.116, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 505.02it/s, env_step=5120, len=19, n/ep=4, n/st=64, player_1/loss=95.648, player_2/loss=128.265, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 502.50it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=100.439, player_2/loss=186.567, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 498.37it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=135.092, player_2/loss=222.835, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 503.32it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=122.369, player_2/loss=223.398, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 505.30it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=80.410, player_2/loss=267.218, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:02, 501.07it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=98.651, player_2/loss=254.575, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 503.14it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=124.671, player_2/loss=217.864, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 497.15it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=87.063, player_2/loss=235.368, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 501.90it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=48.161, player_2/loss=212.827, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 501.36it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=47.336, player_2/loss=285.505, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 495.94it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=22.386, player_2/loss=280.970, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:02, 499.56it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=30.996, player_2/loss=282.320, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 500.29it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=19.832, player_2/loss=282.835, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:02, 492.84it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=61.848, player_2/loss=255.661, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:02, 490.26it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=52.819, player_2/loss=263.972, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 496.25it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=59.118, player_2/loss=255.337, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.67it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=143.500, player_2/loss=180.040, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 503.16it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=304.329, player_2/loss=115.326, rew=8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 502.93it/s, env_step=4096, len=24, n/ep=3, n/st=64, player_1/loss=386.672, player_2/loss=82.266, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 500.73it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=351.547, player_2/loss=60.687, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 498.42it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=263.515, player_2/loss=46.824, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 506.53it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=207.688, player_2/loss=31.605, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 502.50it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=201.261, player_2/loss=42.763, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 506.61it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=289.764, player_2/loss=52.542, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 500.02it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=327.825, player_2/loss=34.739, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 500.39it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=255.139, player_2/loss=54.886, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 503.57it/s, env_step=12288, len=15, n/ep=5, n/st=64, player_1/loss=227.142, player_2/loss=43.935, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 505.40it/s, env_step=13312, len=17, n/ep=3, n/st=64, player_1/loss=285.201, player_2/loss=21.388, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 502.28it/s, env_step=14336, len=21, n/ep=2, n/st=64, player_1/loss=262.087, player_2/loss=65.510, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 500.04it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=226.620, player_2/loss=71.761, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 503.21it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=292.607, player_2/loss=26.343, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 502.87it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=289.125, player_2/loss=47.268, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 501.82it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=287.362, player_2/loss=78.475, rew=25.00]


Epoch #18: test_reward: 100.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #18


Epoch #19: 1025it [00:02, 506.14it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=300.466, player_2/loss=158.367, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #18


Epoch #1: 1025it [00:02, 505.02it/s, env_step=1024, len=25, n/ep=2, n/st=64, player_1/loss=180.984, player_2/loss=55.901, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 501.11it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=148.043, player_2/loss=104.598, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 505.97it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=122.595, player_2/loss=266.043, rew=-16.67]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 494.19it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=81.144, player_2/loss=343.275, rew=17.86]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 500.13it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=45.147, player_2/loss=330.408, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 499.86it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=29.792, player_2/loss=283.953, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 501.26it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=38.318, player_2/loss=241.823, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 501.67it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=60.609, player_2/loss=211.868, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 504.02it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=110.166, player_2/loss=270.864, rew=-5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 496.24it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=90.716, player_2/loss=298.614, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 498.84it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=45.602, player_2/loss=307.332, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 500.93it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=36.797, player_2/loss=320.488, rew=19.44]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 497.42it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=60.833, player_2/loss=269.480, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 500.26it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=35.681, player_2/loss=283.415, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 497.63it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=44.722, player_2/loss=279.868, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 496.95it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=47.047, player_2/loss=272.333, rew=18.75]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 497.77it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=59.833, player_2/loss=278.642, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 494.25it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=38.378, player_2/loss=265.368, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 499.98it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=65.545, player_2/loss=251.636, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 502.24it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=20.105, player_2/loss=275.600, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 496.52it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=319.151, player_2/loss=184.261, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 505.72it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=597.536, player_2/loss=106.987, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 504.72it/s, env_step=4096, len=25, n/ep=2, n/st=64, player_1/loss=414.934, player_2/loss=129.185, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 498.25it/s, env_step=5120, len=27, n/ep=2, n/st=64, player_1/loss=225.807, player_2/loss=107.637, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 503.67it/s, env_step=6144, len=30, n/ep=2, n/st=64, player_1/loss=266.442, player_2/loss=118.335, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 494.92it/s, env_step=7168, len=32, n/ep=2, n/st=64, player_1/loss=272.523, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 503.77it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=129.880, player_2/loss=76.152, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 505.75it/s, env_step=9216, len=28, n/ep=2, n/st=64, player_1/loss=105.131, player_2/loss=52.922, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 499.98it/s, env_step=10240, len=31, n/ep=3, n/st=64, player_1/loss=84.172, player_2/loss=54.182, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 504.01it/s, env_step=11264, len=23, n/ep=3, n/st=64, player_1/loss=99.419, player_2/loss=67.991, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 504.48it/s, env_step=12288, len=28, n/ep=2, n/st=64, player_1/loss=89.207, player_2/loss=83.130, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 506.42it/s, env_step=13312, len=25, n/ep=3, n/st=64, player_1/loss=99.956, player_2/loss=122.397, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 502.39it/s, env_step=14336, len=27, n/ep=2, n/st=64, player_1/loss=150.775, player_2/loss=100.383, rew=0.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 502.61it/s, env_step=15360, len=31, n/ep=2, n/st=64, player_1/loss=145.297, player_2/loss=110.757, rew=0.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 503.49it/s, env_step=16384, len=25, n/ep=3, n/st=64, player_1/loss=110.340, player_2/loss=96.495, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 503.62it/s, env_step=17408, len=31, n/ep=2, n/st=64, player_1/loss=129.566, player_2/loss=81.388, rew=0.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 502.43it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=169.574, player_2/loss=140.581, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 502.80it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=449.233, player_2/loss=181.381, rew=-18.75]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 494.10it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=20.885, player_2/loss=389.128, rew=19.44]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 498.12it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=18.803, player_2/loss=401.436, rew=18.75]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 500.09it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=40.628, player_2/loss=354.676, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 495.48it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=44.947, player_2/loss=336.738, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 501.70it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=55.172, player_2/loss=381.554, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 494.03it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=111.275, player_2/loss=359.783, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 499.66it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=107.212, player_2/loss=296.772, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 494.49it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=56.930, player_2/loss=339.771, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 498.73it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=65.218, player_2/loss=381.493, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 500.63it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=51.273, player_2/loss=368.175, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 495.86it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=70.151, player_2/loss=390.674, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 496.61it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=39.162, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 499.84it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=18.846, player_2/loss=307.012, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 495.94it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=48.557, player_2/loss=313.439, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 501.03it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=119.029, player_2/loss=377.900, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 499.82it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=100.804, player_2/loss=361.915, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 498.24it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=18.490, player_2/loss=375.628, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 504.04it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=10.577, player_2/loss=361.085, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 496.46it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=10.470, player_2/loss=394.750, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 504.47it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=16.090, player_2/loss=217.631, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 503.85it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=30.483, player_2/loss=194.527, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 499.59it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=30.956, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 506.48it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=144.884, player_2/loss=153.026, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 505.18it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=68.597, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 502.07it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=83.680, player_2/loss=117.595, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 501.76it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_2/loss=93.998, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 503.94it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=191.277, player_2/loss=52.800, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 502.88it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=199.267, player_2/loss=35.288, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 506.15it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=188.583, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 498.31it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=192.459, player_2/loss=24.308, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 500.74it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=178.526, player_2/loss=22.810, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 497.99it/s, env_step=13312, len=12, n/ep=6, n/st=64, player_1/loss=165.491, player_2/loss=18.809, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 503.14it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_2/loss=46.585, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 505.45it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=144.611, player_2/loss=57.971, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 502.39it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=164.718, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 504.19it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=173.685, player_2/loss=39.301, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 503.18it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=188.954, player_2/loss=19.817, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 498.63it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=191.607, player_2/loss=41.390, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 504.88it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=121.615, player_2/loss=26.188, rew=-5.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 499.79it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=130.431, player_2/loss=267.004, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 502.40it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=118.704, player_2/loss=487.118, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 500.61it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=119.049, player_2/loss=556.810, rew=17.86]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 495.84it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=87.578, player_2/loss=524.788, rew=13.89]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 502.88it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=88.766, player_2/loss=530.511, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 497.74it/s, env_step=7168, len=7, n/ep=7, n/st=64, player_1/loss=87.427, player_2/loss=454.978, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 494.57it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=118.873, player_2/loss=440.654, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 503.77it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=108.494, player_2/loss=482.333, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 492.55it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=65.412, player_2/loss=494.822, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 503.57it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=98.750, player_2/loss=446.573, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 502.00it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=55.400, player_2/loss=394.435, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 495.70it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=33.030, player_2/loss=472.054, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 494.52it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=30.097, player_2/loss=565.017, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 493.19it/s, env_step=15360, len=8, n/ep=6, n/st=64, player_1/loss=12.940, player_2/loss=484.487, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 496.16it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=81.599, player_2/loss=418.059, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 499.92it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=61.855, player_2/loss=394.589, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 495.77it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=55.353, player_2/loss=418.599, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 498.66it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=43.444, player_2/loss=427.417, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 503.55it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=28.390, player_2/loss=394.768, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 500.32it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=72.997, player_2/loss=300.157, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 498.86it/s, env_step=3072, len=9, n/ep=6, n/st=64, player_1/loss=96.542, player_2/loss=252.263, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 498.93it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=148.331, player_2/loss=188.501, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 499.25it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=143.913, player_2/loss=146.637, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 506.32it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=126.012, player_2/loss=128.875, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 498.61it/s, env_step=7168, len=17, n/ep=3, n/st=64, player_1/loss=208.348, player_2/loss=142.283, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 500.44it/s, env_step=8192, len=19, n/ep=4, n/st=64, player_1/loss=241.732, player_2/loss=118.118, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 501.20it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=224.648, player_2/loss=77.813, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 503.64it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=254.513, player_2/loss=96.302, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 504.61it/s, env_step=11264, len=16, n/ep=3, n/st=64, player_1/loss=271.735, player_2/loss=87.355, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 503.44it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=302.268, player_2/loss=43.578, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 499.94it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=431.117, player_2/loss=46.450, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 500.86it/s, env_step=14336, len=18, n/ep=4, n/st=64, player_1/loss=378.283, player_2/loss=70.187, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 496.29it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=260.166, player_2/loss=80.078, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 503.26it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=238.361, player_2/loss=66.751, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 504.61it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=188.792, player_2/loss=83.156, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 501.01it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=264.171, player_2/loss=67.760, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 505.57it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=398.609, player_2/loss=31.117, rew=-12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 498.21it/s, env_step=1024, len=16, n/ep=3, n/st=64, player_1/loss=134.403, player_2/loss=52.983, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 503.56it/s, env_step=2048, len=16, n/ep=3, n/st=64, player_1/loss=164.671, player_2/loss=59.423, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 502.68it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=142.394, player_2/loss=42.474, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 500.00it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=134.608, player_2/loss=57.497, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 503.97it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=115.690, player_2/loss=26.424, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 503.64it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=111.999, player_2/loss=44.314, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 494.47it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=138.469, player_2/loss=71.478, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 501.74it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=106.164, player_2/loss=86.662, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 500.90it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=64.428, player_2/loss=90.466, rew=-5.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 501.92it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=85.080, player_2/loss=125.457, rew=-12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 505.81it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=101.641, player_2/loss=88.225, rew=-12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #12: 1025it [00:02, 499.48it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=64.019, player_2/loss=20.136, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #13: 1025it [00:02, 502.24it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=59.560, player_2/loss=36.169, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #14: 1025it [00:02, 496.11it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=81.508, player_2/loss=34.388, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #15: 1025it [00:02, 505.05it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=48.943, player_2/loss=22.097, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #16: 1025it [00:02, 503.57it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=45.096, player_2/loss=64.482, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #17: 1025it [00:02, 496.18it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=69.568, player_2/loss=93.150, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #18: 1025it [00:02, 500.23it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=74.370, player_2/loss=89.223, rew=-12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #19: 1025it [00:02, 500.78it/s, env_step=19456, len=13, n/ep=4, n/st=64, player_1/loss=93.472, player_2/loss=73.699, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #1: 1025it [00:02, 504.05it/s, env_step=1024, len=16, n/ep=3, n/st=64, player_1/loss=82.661, player_2/loss=88.332, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 505.07it/s, env_step=2048, len=15, n/ep=5, n/st=64, player_1/loss=52.553, player_2/loss=49.426, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 498.95it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=37.987, player_2/loss=12.251, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 504.89it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=78.086, player_2/loss=23.164, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 496.19it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=109.598, player_2/loss=36.854, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 502.29it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=144.260, player_2/loss=60.526, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 503.19it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=159.125, player_2/loss=123.751, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 497.24it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=159.405, player_2/loss=98.324, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 500.92it/s, env_step=9216, len=8, n/ep=7, n/st=64, player_1/loss=160.745, player_2/loss=16.475, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 500.58it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=180.284, player_2/loss=44.387, rew=6.25]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 475.91it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=162.687, player_2/loss=48.587, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 499.70it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=150.522, player_2/loss=28.912, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 499.61it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=174.501, player_2/loss=25.731, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 502.61it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=165.718, player_2/loss=23.371, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 503.94it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=138.193, player_2/loss=27.149, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 497.66it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=178.244, player_2/loss=110.949, rew=10.71]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 497.89it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=190.977, player_2/loss=107.585, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 497.15it/s, env_step=18432, len=11, n/ep=7, n/st=64, player_1/loss=198.076, player_2/loss=126.348, rew=17.86]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 500.15it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=234.351, player_2/loss=180.984, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 500.42it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=267.291, player_2/loss=318.412, rew=19.44]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.22it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=232.027, player_2/loss=418.887, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 500.81it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_2/loss=523.830, rew=13.89]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 501.43it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=87.479, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 495.09it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=79.165, player_2/loss=605.795, rew=13.89]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 497.36it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=83.982, player_2/loss=621.701, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 495.52it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=105.902, player_2/loss=530.350, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 493.87it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=85.053, player_2/loss=532.823, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 497.30it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=100.008, player_2/loss=503.124, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 498.94it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=138.310, player_2/loss=542.613, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 497.58it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=100.842, player_2/loss=549.637, rew=13.89]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 494.70it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=57.112, player_2/loss=599.381, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 497.77it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=51.867, player_2/loss=564.202, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 499.34it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=67.284, player_2/loss=561.129, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 491.41it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=83.552, player_2/loss=507.995, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 497.95it/s, env_step=16384, len=7, n/ep=10, n/st=64, player_1/loss=90.852, player_2/loss=582.806, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 491.29it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=16.326, player_2/loss=574.167, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 498.06it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=70.081, player_2/loss=464.573, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 501.61it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=83.766, player_2/loss=472.211, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 497.45it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=211.428, player_2/loss=371.550, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 507.53it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=211.167, player_2/loss=215.320, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 500.31it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=161.936, player_2/loss=46.603, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 503.96it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=138.316, player_2/loss=12.031, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 505.32it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=152.005, player_2/loss=55.459, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 499.98it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=150.630, player_2/loss=67.150, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 505.17it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=166.256, player_2/loss=33.105, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 502.89it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=156.087, player_2/loss=23.559, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 501.88it/s, env_step=9216, len=15, n/ep=5, n/st=64, player_1/loss=124.800, player_2/loss=21.450, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 503.26it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=155.216, player_2/loss=21.583, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 501.27it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=168.905, player_2/loss=23.739, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 501.47it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=154.471, player_2/loss=20.426, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 505.28it/s, env_step=13312, len=17, n/ep=3, n/st=64, player_1/loss=210.296, player_2/loss=13.685, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 500.09it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=273.254, player_2/loss=42.352, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 503.13it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=306.763, player_2/loss=83.822, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 496.10it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=265.141, player_2/loss=72.217, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 503.71it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=298.420, player_2/loss=28.166, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 503.18it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=273.604, player_2/loss=63.593, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 489.84it/s, env_step=19456, len=13, n/ep=4, n/st=64, player_1/loss=261.728, player_2/loss=126.387, rew=-12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 490.14it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=122.484, player_2/loss=216.400, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.86it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=93.324, player_2/loss=196.111, rew=5.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 504.14it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=71.897, player_2/loss=238.722, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 505.76it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=98.944, player_2/loss=259.330, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 499.55it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=107.626, player_2/loss=196.331, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 501.86it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=60.364, player_2/loss=162.821, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 495.47it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=57.999, player_2/loss=229.221, rew=16.67]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 503.69it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=178.577, player_2/loss=220.965, rew=5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 505.53it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=230.193, player_2/loss=169.565, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 503.76it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=121.403, player_2/loss=206.008, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 503.95it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=74.616, player_2/loss=224.714, rew=15.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 501.82it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=120.967, player_2/loss=222.710, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 492.84it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=130.739, player_2/loss=227.466, rew=16.67]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 500.94it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=68.133, player_2/loss=238.400, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 497.80it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=39.415, player_2/loss=270.515, rew=-5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 503.94it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=37.974, player_2/loss=241.601, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 499.06it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=57.563, player_2/loss=244.168, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 498.31it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=82.830, player_2/loss=220.171, rew=8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 500.52it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=123.392, player_2/loss=266.982, rew=15.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 495.94it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=71.220, player_2/loss=151.318, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 498.57it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=89.679, player_2/loss=91.532, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 492.74it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=116.441, player_2/loss=39.667, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 500.20it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=111.243, player_2/loss=32.901, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 498.81it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=88.516, player_2/loss=14.563, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 499.18it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=136.427, player_2/loss=63.190, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 502.77it/s, env_step=7168, len=16, n/ep=3, n/st=64, player_1/loss=147.250, player_2/loss=64.111, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 499.44it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=113.026, player_2/loss=20.503, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 504.39it/s, env_step=9216, len=16, n/ep=5, n/st=64, player_1/loss=109.431, player_2/loss=28.541, rew=5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 500.69it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=91.336, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 501.01it/s, env_step=11264, len=15, n/ep=5, n/st=64, player_1/loss=90.947, player_2/loss=28.447, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 504.06it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=95.632, player_2/loss=5.766, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 496.49it/s, env_step=13312, len=19, n/ep=4, n/st=64, player_1/loss=96.744, player_2/loss=7.105, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 502.98it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=85.659, player_2/loss=5.860, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 503.59it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=96.311, player_2/loss=4.375, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 498.16it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=110.350, player_2/loss=25.881, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 504.22it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=104.666, player_2/loss=27.482, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 501.29it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=102.078, player_2/loss=5.813, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 498.97it/s, env_step=19456, len=17, n/ep=3, n/st=64, player_1/loss=106.452, player_2/loss=48.930, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 500.55it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=39.282, player_2/loss=9.338, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 498.74it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=44.301, player_2/loss=13.655, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 500.68it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=41.729, player_2/loss=17.251, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 495.74it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=67.796, player_2/loss=49.881, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 497.72it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=91.857, player_2/loss=124.444, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 502.23it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=55.788, player_2/loss=92.636, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 478.19it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=85.312, player_2/loss=25.661, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 500.47it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=66.752, player_2/loss=37.365, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 496.78it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=25.931, player_2/loss=60.408, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 499.31it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=51.880, player_2/loss=86.729, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 501.36it/s, env_step=11264, len=21, n/ep=3, n/st=64, player_1/loss=54.849, player_2/loss=89.658, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 495.67it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=126.518, player_2/loss=153.857, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 497.66it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=217.127, player_2/loss=236.245, rew=10.71]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 495.79it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=173.491, player_2/loss=275.778, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 490.02it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=108.191, player_2/loss=336.557, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 499.37it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=101.837, player_2/loss=355.972, rew=13.89]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 493.90it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=60.230, player_2/loss=328.344, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 497.47it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=84.838, player_2/loss=311.320, rew=6.25]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 494.29it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=101.403, player_2/loss=348.652, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 496.58it/s, env_step=1024, len=29, n/ep=2, n/st=64, player_1/loss=111.771, player_2/loss=401.271, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 501.33it/s, env_step=2048, len=22, n/ep=4, n/st=64, player_1/loss=140.229, player_2/loss=223.795, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 496.69it/s, env_step=3072, len=24, n/ep=2, n/st=64, player_1/loss=131.841, player_2/loss=68.000, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 502.79it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=108.645, player_2/loss=39.386, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 503.32it/s, env_step=5120, len=15, n/ep=5, n/st=64, player_1/loss=206.999, player_2/loss=25.038, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 494.85it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=235.923, player_2/loss=63.349, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 501.92it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=235.710, player_2/loss=68.252, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 499.19it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=243.178, player_2/loss=34.340, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 506.30it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=209.724, player_2/loss=52.506, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 497.39it/s, env_step=10240, len=14, n/ep=5, n/st=64, player_1/loss=290.521, player_2/loss=76.336, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 502.32it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=299.128, player_2/loss=111.488, rew=15.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 502.07it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=218.992, player_2/loss=88.492, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 497.08it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=177.648, player_2/loss=16.572, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 503.59it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=213.905, player_2/loss=13.594, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 497.38it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=215.460, player_2/loss=28.038, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 504.99it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=245.906, player_2/loss=28.920, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 502.65it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=219.173, player_2/loss=38.053, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 496.41it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=199.255, player_2/loss=38.248, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 502.39it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=249.131, player_2/loss=39.418, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 497.70it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=149.210, player_2/loss=44.105, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 505.72it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=136.661, player_2/loss=32.586, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 501.61it/s, env_step=3072, len=27, n/ep=2, n/st=64, player_1/loss=113.430, player_2/loss=65.175, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 498.22it/s, env_step=4096, len=32, n/ep=2, n/st=64, player_1/loss=85.107, player_2/loss=95.642, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 500.54it/s, env_step=5120, len=27, n/ep=2, n/st=64, player_1/loss=71.514, player_2/loss=332.837, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 496.37it/s, env_step=6144, len=29, n/ep=3, n/st=64, player_1/loss=75.394, player_2/loss=330.723, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 500.35it/s, env_step=7168, len=28, n/ep=2, n/st=64, player_1/loss=254.436, player_2/loss=472.941, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 504.39it/s, env_step=8192, len=31, n/ep=2, n/st=64, player_1/loss=243.233, player_2/loss=270.030, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 498.13it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=47.857, player_2/loss=176.753, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 500.65it/s, env_step=10240, len=23, n/ep=3, n/st=64, player_1/loss=30.080, player_2/loss=160.228, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 499.04it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=76.301, player_2/loss=167.892, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 500.86it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=39.639, player_2/loss=176.833, rew=-8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 497.73it/s, env_step=13312, len=27, n/ep=2, n/st=64, player_1/loss=42.097, player_2/loss=185.507, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 504.40it/s, env_step=14336, len=27, n/ep=2, n/st=64, player_1/loss=37.792, player_2/loss=159.545, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 502.69it/s, env_step=15360, len=22, n/ep=3, n/st=64, player_1/loss=122.685, player_2/loss=141.584, rew=-25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 501.31it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=172.443, player_2/loss=134.000, rew=-12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 501.44it/s, env_step=17408, len=19, n/ep=3, n/st=64, player_1/loss=144.053, player_2/loss=149.124, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 503.05it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=135.244, player_2/loss=126.651, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 503.34it/s, env_step=19456, len=19, n/ep=4, n/st=64, player_1/loss=158.334, player_2/loss=113.884, rew=-12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 498.45it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=131.502, player_2/loss=147.864, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 493.90it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=161.246, player_2/loss=155.791, rew=16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 500.26it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=181.909, player_2/loss=119.086, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 499.71it/s, env_step=4096, len=10, n/ep=7, n/st=64, player_1/loss=182.045, player_2/loss=112.327, rew=10.71]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 496.54it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=183.834, player_2/loss=119.197, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 501.31it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=152.732, player_2/loss=87.216, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 492.95it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=189.280, player_2/loss=70.731, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 500.93it/s, env_step=8192, len=13, n/ep=6, n/st=64, player_1/loss=210.157, player_2/loss=55.644, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 496.02it/s, env_step=9216, len=10, n/ep=4, n/st=64, player_1/loss=196.290, player_2/loss=34.862, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 500.65it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=224.777, player_2/loss=61.773, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 497.74it/s, env_step=11264, len=16, n/ep=3, n/st=64, player_1/loss=208.448, player_2/loss=139.273, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 496.80it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=175.110, player_2/loss=151.096, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 502.23it/s, env_step=13312, len=10, n/ep=7, n/st=64, player_1/loss=199.220, player_2/loss=89.817, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 497.34it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=185.504, player_2/loss=39.809, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 502.60it/s, env_step=15360, len=12, n/ep=4, n/st=64, player_1/loss=133.194, player_2/loss=42.002, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 497.28it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=161.500, player_2/loss=60.707, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 494.64it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=167.158, player_2/loss=59.849, rew=10.71]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 501.47it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=167.371, player_2/loss=30.539, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 494.47it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=142.796, player_2/loss=23.969, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 501.66it/s, env_step=1024, len=10, n/ep=7, n/st=64, player_1/loss=175.452, player_2/loss=116.480, rew=3.57]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.43it/s, env_step=2048, len=7, n/ep=10, n/st=64, player_1/loss=177.896, player_2/loss=193.812, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 489.39it/s, env_step=3072, len=8, n/ep=7, n/st=64, player_1/loss=132.071, rew=17.86]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 499.49it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=134.387, player_2/loss=422.368, rew=10.71]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 492.79it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=120.104, player_2/loss=447.733, rew=19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 496.81it/s, env_step=6144, len=9, n/ep=6, n/st=64, player_1/loss=60.919, player_2/loss=375.451, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 496.45it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=107.633, player_2/loss=423.792, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 495.81it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=140.404, player_2/loss=424.886, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 498.24it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=133.369, player_2/loss=414.522, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 492.45it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=168.877, player_2/loss=398.179, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 498.24it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=126.735, player_2/loss=383.976, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 493.86it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=88.823, player_2/loss=319.192, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 496.88it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=61.571, player_2/loss=330.085, rew=3.57]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 502.73it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=73.333, player_2/loss=429.338, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 494.81it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=89.770, player_2/loss=421.629, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 496.97it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=67.647, player_2/loss=420.830, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 495.30it/s, env_step=17408, len=7, n/ep=7, n/st=64, player_1/loss=54.948, player_2/loss=421.216, rew=3.57]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 497.33it/s, env_step=18432, len=9, n/ep=9, n/st=64, player_1/loss=88.305, player_2/loss=467.101, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 499.08it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=69.381, player_2/loss=490.676, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 495.14it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=329.084, player_2/loss=202.470, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 500.95it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=408.250, player_2/loss=151.398, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 497.78it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=457.803, player_2/loss=78.682, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 499.60it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=448.824, player_2/loss=49.083, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 504.43it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=424.503, player_2/loss=69.738, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 493.75it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=438.475, player_2/loss=95.482, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 503.02it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=543.090, player_2/loss=49.188, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 500.39it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=496.584, player_2/loss=43.959, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 501.48it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=352.186, player_2/loss=153.425, rew=0.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 496.71it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=137.036, player_2/loss=171.808, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 505.02it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=123.280, player_2/loss=123.034, rew=-12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 501.28it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=157.096, player_2/loss=122.612, rew=8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 500.20it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=154.877, player_2/loss=104.885, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 497.14it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=128.888, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 504.87it/s, env_step=15360, len=16, n/ep=5, n/st=64, player_1/loss=144.955, rew=5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 503.71it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=235.306, player_2/loss=82.836, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 489.65it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=318.537, player_2/loss=81.103, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 497.08it/s, env_step=18432, len=10, n/ep=7, n/st=64, player_1/loss=305.306, player_2/loss=82.391, rew=-17.86]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 501.06it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=281.574, player_2/loss=69.103, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 496.93it/s, env_step=1024, len=29, n/ep=2, n/st=64, player_1/loss=179.402, player_2/loss=78.153, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 502.22it/s, env_step=2048, len=30, n/ep=2, n/st=64, player_1/loss=132.494, player_2/loss=80.080, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 504.08it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=102.046, player_2/loss=266.674, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 497.73it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_1/loss=92.235, player_2/loss=372.094, rew=12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 501.36it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=114.551, player_2/loss=429.744, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 503.10it/s, env_step=6144, len=15, n/ep=5, n/st=64, player_1/loss=129.982, player_2/loss=545.832, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 498.24it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=80.738, player_2/loss=588.914, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 497.35it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=92.187, player_2/loss=486.670, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 499.86it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=80.860, player_2/loss=538.107, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 491.30it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=44.148, player_2/loss=440.706, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 497.80it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=41.007, player_2/loss=451.325, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 498.77it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=69.513, player_2/loss=440.688, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 492.33it/s, env_step=13312, len=8, n/ep=9, n/st=64, player_1/loss=43.494, player_2/loss=440.630, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 497.46it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=25.999, player_2/loss=463.380, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 498.25it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=21.572, player_2/loss=489.236, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 493.84it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=48.219, player_2/loss=503.725, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 499.70it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=20.973, player_2/loss=551.487, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 496.94it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=16.331, player_2/loss=573.420, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 495.28it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=9.122, player_2/loss=489.415, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 497.98it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=27.297, player_2/loss=284.018, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.89it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=31.588, player_2/loss=302.329, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.46it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=29.871, player_2/loss=291.062, rew=-19.44]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 499.19it/s, env_step=4096, len=16, n/ep=3, n/st=64, player_1/loss=122.596, player_2/loss=260.079, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 501.75it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=187.745, player_2/loss=195.531, rew=-17.86]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 501.82it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=213.873, player_2/loss=172.189, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 502.55it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=209.043, player_2/loss=158.768, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 500.50it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=147.884, player_2/loss=167.124, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 497.13it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=209.754, player_2/loss=144.861, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 502.57it/s, env_step=10240, len=13, n/ep=4, n/st=64, player_1/loss=330.695, player_2/loss=85.941, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 497.67it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=331.172, player_2/loss=59.613, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 499.10it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=217.411, player_2/loss=117.346, rew=-5.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 499.87it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=222.550, player_2/loss=223.939, rew=-5.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 500.11it/s, env_step=14336, len=21, n/ep=3, n/st=64, player_1/loss=284.191, player_2/loss=185.132, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 502.19it/s, env_step=15360, len=19, n/ep=4, n/st=64, player_1/loss=276.396, player_2/loss=122.521, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 501.48it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=273.116, player_2/loss=99.299, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 499.28it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=325.985, player_2/loss=66.745, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 488.74it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=281.505, player_2/loss=74.378, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 500.58it/s, env_step=19456, len=20, n/ep=3, n/st=64, player_1/loss=197.526, player_2/loss=41.743, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 497.00it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=192.421, player_2/loss=63.759, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 499.55it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=109.798, player_2/loss=49.392, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 501.95it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=105.443, player_2/loss=137.676, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 497.79it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=103.953, player_2/loss=126.470, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 501.05it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=92.968, player_2/loss=151.448, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 474.53it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=124.841, player_2/loss=214.036, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 495.91it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=125.149, player_2/loss=239.622, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 502.51it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=108.093, player_2/loss=281.179, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 497.50it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=51.373, player_2/loss=286.128, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 490.75it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=104.919, player_2/loss=309.266, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 494.64it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=147.531, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 492.20it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=65.974, player_2/loss=352.415, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 495.24it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=68.173, player_2/loss=333.588, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 496.58it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=17.147, player_2/loss=306.876, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 494.77it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=30.817, player_2/loss=333.239, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 492.51it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=56.419, player_2/loss=341.265, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 495.60it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=47.178, player_2/loss=371.235, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 493.99it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=28.129, player_2/loss=388.352, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 495.50it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=15.850, player_2/loss=404.408, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 497.11it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=76.716, player_2/loss=294.571, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.07it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=68.193, player_2/loss=262.143, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 497.70it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=124.589, player_2/loss=266.567, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 502.38it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=95.993, player_2/loss=281.715, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 493.61it/s, env_step=5120, len=7, n/ep=7, n/st=64, player_1/loss=33.635, player_2/loss=241.050, rew=-17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 494.11it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=32.798, player_2/loss=205.419, rew=-17.86]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 500.25it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=157.027, player_2/loss=168.666, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 492.84it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=223.122, player_2/loss=188.414, rew=-17.86]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 500.00it/s, env_step=9216, len=9, n/ep=6, n/st=64, player_1/loss=88.925, player_2/loss=227.286, rew=-16.67]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 496.36it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=69.967, player_2/loss=265.268, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 498.80it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=164.318, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 498.08it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=263.904, player_2/loss=140.911, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 497.14it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=254.385, player_2/loss=87.270, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 496.24it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=235.137, player_2/loss=81.798, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 496.60it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=208.751, player_2/loss=82.676, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 496.00it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=194.285, player_2/loss=69.516, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 498.13it/s, env_step=17408, len=19, n/ep=3, n/st=64, player_1/loss=208.184, player_2/loss=47.735, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 500.75it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=191.564, player_2/loss=51.800, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 496.82it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=170.379, player_2/loss=104.565, rew=5.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 500.88it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=131.598, player_2/loss=156.306, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 502.66it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=92.619, player_2/loss=162.008, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 499.19it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=60.286, player_2/loss=257.572, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 496.14it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=124.537, player_2/loss=317.583, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 498.41it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=127.751, player_2/loss=431.677, rew=13.89]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 490.22it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=100.741, player_2/loss=456.728, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 497.89it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=48.316, player_2/loss=417.684, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 493.16it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=94.938, player_2/loss=495.248, rew=19.44]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 494.57it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=127.849, player_2/loss=445.686, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 498.50it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=77.336, player_2/loss=480.320, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 492.37it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=37.355, player_2/loss=406.163, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 500.00it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=16.568, player_2/loss=434.271, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 499.39it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=57.209, player_2/loss=482.624, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 496.46it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=104.603, player_2/loss=513.208, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 497.28it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=67.464, player_2/loss=513.175, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 497.07it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=84.826, player_2/loss=417.023, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 494.40it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=43.530, player_2/loss=451.992, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 498.44it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=49.540, player_2/loss=444.832, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 499.75it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=98.711, player_2/loss=430.661, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 497.17it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=65.287, player_2/loss=369.810, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 499.27it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=49.365, player_2/loss=341.212, rew=-13.89]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 503.48it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=46.492, player_2/loss=304.163, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 497.24it/s, env_step=4096, len=7, n/ep=7, n/st=64, player_1/loss=31.363, player_2/loss=243.643, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 496.14it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=26.734, player_2/loss=204.056, rew=-17.86]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 500.14it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=29.337, player_2/loss=197.081, rew=-18.75]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 493.30it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=44.354, player_2/loss=251.180, rew=-19.44]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 502.06it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=37.137, player_2/loss=264.886, rew=-19.44]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 501.49it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=85.000, player_2/loss=256.257, rew=-19.44]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 497.02it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=90.954, player_2/loss=247.447, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 498.11it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=124.239, player_2/loss=220.350, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 497.78it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=219.152, player_2/loss=168.222, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 500.15it/s, env_step=13312, len=12, n/ep=6, n/st=64, player_1/loss=316.217, player_2/loss=83.829, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 495.52it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=368.999, player_2/loss=37.094, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 499.61it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=278.961, player_2/loss=76.393, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 500.33it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=308.499, player_2/loss=108.329, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 500.34it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=358.501, player_2/loss=76.421, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 495.93it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=356.773, player_2/loss=90.156, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 500.01it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=346.638, player_2/loss=106.455, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 499.62it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=273.700, player_2/loss=82.679, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 491.63it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=230.772, player_2/loss=113.860, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 501.81it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=168.411, player_2/loss=89.918, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 501.68it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=145.379, player_2/loss=25.189, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 496.61it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=126.032, player_2/loss=99.055, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 498.90it/s, env_step=6144, len=32, n/ep=2, n/st=64, player_1/loss=97.631, player_2/loss=167.701, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 493.66it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=85.593, player_2/loss=154.844, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 500.48it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=84.340, player_2/loss=181.539, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 502.63it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=95.373, player_2/loss=172.502, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 496.74it/s, env_step=10240, len=24, n/ep=2, n/st=64, player_1/loss=95.659, player_2/loss=134.921, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 494.91it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=66.425, player_2/loss=145.806, rew=-25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 497.28it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=29.968, player_2/loss=131.856, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 494.75it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=70.610, player_2/loss=132.334, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 499.21it/s, env_step=14336, len=15, n/ep=5, n/st=64, player_1/loss=95.812, player_2/loss=171.341, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 499.04it/s, env_step=15360, len=19, n/ep=4, n/st=64, player_1/loss=122.496, player_2/loss=139.198, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 492.50it/s, env_step=16384, len=23, n/ep=3, n/st=64, player_1/loss=98.042, player_2/loss=140.737, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 498.37it/s, env_step=17408, len=25, n/ep=3, n/st=64, player_1/loss=44.142, player_2/loss=192.020, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 494.70it/s, env_step=18432, len=22, n/ep=3, n/st=64, player_1/loss=33.229, player_2/loss=182.114, rew=8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 497.24it/s, env_step=19456, len=25, n/ep=2, n/st=64, player_1/loss=72.367, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 502.46it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=78.688, player_2/loss=112.554, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.31it/s, env_step=2048, len=24, n/ep=3, n/st=64, player_1/loss=78.667, player_2/loss=125.929, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 499.14it/s, env_step=3072, len=22, n/ep=3, n/st=64, player_1/loss=114.972, player_2/loss=140.512, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 500.22it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=123.342, player_2/loss=141.825, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 496.61it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=101.344, player_2/loss=130.259, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 501.20it/s, env_step=6144, len=24, n/ep=2, n/st=64, player_1/loss=157.094, player_2/loss=122.754, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 494.69it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_2/loss=84.741, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 498.27it/s, env_step=8192, len=24, n/ep=2, n/st=64, player_1/loss=116.794, player_2/loss=53.950, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 498.70it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=126.670, player_2/loss=81.803, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 497.39it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=139.782, player_2/loss=61.484, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 500.48it/s, env_step=11264, len=23, n/ep=3, n/st=64, player_1/loss=114.548, player_2/loss=82.660, rew=-25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 501.52it/s, env_step=12288, len=22, n/ep=3, n/st=64, player_1/loss=82.420, player_2/loss=146.501, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 500.35it/s, env_step=13312, len=24, n/ep=3, n/st=64, player_1/loss=59.409, player_2/loss=108.898, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 497.27it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=82.320, player_2/loss=56.862, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 503.12it/s, env_step=15360, len=20, n/ep=3, n/st=64, player_1/loss=135.249, player_2/loss=33.586, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 502.19it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=113.341, player_2/loss=38.691, rew=-12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 493.92it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=88.201, player_2/loss=47.202, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 501.11it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=119.418, player_2/loss=59.742, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 499.71it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=159.269, player_2/loss=52.213, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 493.31it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=240.077, player_2/loss=330.719, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 494.09it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=162.909, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 498.36it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=52.583, player_2/loss=608.351, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 491.96it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=54.188, player_2/loss=507.589, rew=6.25]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 496.50it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=80.239, player_2/loss=401.971, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 488.83it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=90.819, player_2/loss=446.761, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 493.98it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=69.659, player_2/loss=510.353, rew=13.89]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 494.16it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=46.151, player_2/loss=636.287, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 491.74it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=54.369, player_2/loss=581.956, rew=19.44]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 495.37it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=92.625, player_2/loss=475.956, rew=6.25]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 488.46it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=120.634, player_2/loss=471.941, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 493.21it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=115.864, player_2/loss=506.938, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 497.91it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=65.110, player_2/loss=541.092, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 496.73it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=40.936, player_2/loss=516.122, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 496.32it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=12.716, player_2/loss=589.864, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 491.88it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=83.804, player_2/loss=565.495, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 497.08it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=113.738, player_2/loss=406.765, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 494.18it/s, env_step=18432, len=7, n/ep=7, n/st=64, player_1/loss=69.804, player_2/loss=425.754, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 496.93it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=79.186, player_2/loss=519.124, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 494.61it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=82.460, player_2/loss=394.912, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.00it/s, env_step=2048, len=9, n/ep=9, n/st=64, player_1/loss=65.828, player_2/loss=347.582, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 501.20it/s, env_step=3072, len=30, n/ep=2, n/st=64, player_1/loss=109.509, player_2/loss=267.161, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 501.08it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_1/loss=103.477, player_2/loss=214.874, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 500.28it/s, env_step=5120, len=19, n/ep=4, n/st=64, player_1/loss=81.596, player_2/loss=126.149, rew=12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 499.34it/s, env_step=6144, len=30, n/ep=2, n/st=64, player_1/loss=96.309, player_2/loss=134.557, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 498.46it/s, env_step=7168, len=25, n/ep=2, n/st=64, player_1/loss=138.954, player_2/loss=137.701, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 500.17it/s, env_step=8192, len=25, n/ep=3, n/st=64, player_1/loss=131.075, player_2/loss=163.216, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 485.71it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=104.580, player_2/loss=144.360, rew=-12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 502.52it/s, env_step=10240, len=22, n/ep=2, n/st=64, player_1/loss=112.925, player_2/loss=75.114, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 498.55it/s, env_step=11264, len=25, n/ep=3, n/st=64, player_1/loss=140.971, player_2/loss=112.849, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 497.56it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=184.694, player_2/loss=104.698, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 502.49it/s, env_step=13312, len=22, n/ep=3, n/st=64, player_1/loss=159.841, player_2/loss=109.729, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 500.36it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=121.985, player_2/loss=103.509, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 499.19it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=108.240, player_2/loss=70.711, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 495.99it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=136.715, player_2/loss=59.275, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 497.78it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=133.847, player_2/loss=24.059, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 497.42it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=130.334, player_2/loss=12.461, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 492.73it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=151.485, player_2/loss=31.873, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 500.41it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=127.330, player_2/loss=64.892, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 500.62it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=118.932, player_2/loss=52.198, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 494.62it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=92.904, player_2/loss=60.342, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 496.92it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=79.843, player_2/loss=35.688, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 504.04it/s, env_step=5120, len=12, n/ep=6, n/st=64, player_1/loss=65.223, player_2/loss=43.249, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 494.32it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=83.023, player_2/loss=81.242, rew=-16.67]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 492.85it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=115.081, player_2/loss=115.472, rew=-15.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 499.73it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=167.820, player_2/loss=97.530, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 496.37it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_2/loss=210.644, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:02, 496.84it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=149.932, player_2/loss=318.586, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 497.82it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=93.484, player_2/loss=388.531, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 498.08it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=95.297, player_2/loss=332.572, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 494.64it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=87.527, player_2/loss=275.959, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 497.96it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=92.179, player_2/loss=288.187, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 498.35it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=90.461, player_2/loss=311.865, rew=18.75]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:02, 497.49it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=110.831, player_2/loss=322.934, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 499.97it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=152.194, player_2/loss=313.979, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:02, 497.95it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=87.702, player_2/loss=321.205, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:02, 493.94it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=20.169, player_2/loss=360.205, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 500.98it/s, env_step=1024, len=16, n/ep=5, n/st=64, player_1/loss=47.527, player_2/loss=319.294, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 502.29it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=146.345, player_2/loss=205.738, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 504.90it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=207.600, player_2/loss=117.605, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 495.20it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=175.079, player_2/loss=113.708, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 502.16it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=141.257, player_2/loss=79.474, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 501.12it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=146.593, player_2/loss=26.469, rew=8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 498.59it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=142.677, player_2/loss=32.553, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 498.83it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=121.979, player_2/loss=60.134, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 497.25it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=156.507, player_2/loss=106.594, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 501.64it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=157.165, player_2/loss=124.787, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 502.49it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=126.818, player_2/loss=109.823, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 497.11it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=113.942, player_2/loss=70.616, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 503.14it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=108.423, player_2/loss=19.953, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 500.03it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=112.900, player_2/loss=15.565, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 500.31it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=93.813, player_2/loss=12.944, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 501.70it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=116.233, player_2/loss=67.303, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 495.65it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=131.143, player_2/loss=69.603, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 501.27it/s, env_step=18432, len=21, n/ep=3, n/st=64, player_1/loss=143.578, player_2/loss=85.129, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 503.71it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=161.045, player_2/loss=103.173, rew=0.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 496.49it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=110.075, player_2/loss=209.950, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 500.93it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=163.604, player_2/loss=220.733, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 495.87it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=172.823, player_2/loss=307.768, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 497.04it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_1/loss=135.222, player_2/loss=343.819, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 500.67it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=99.572, player_2/loss=269.210, rew=17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 494.90it/s, env_step=6144, len=9, n/ep=8, n/st=64, player_1/loss=94.148, player_2/loss=207.575, rew=12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 499.02it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=70.988, player_2/loss=229.534, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 496.90it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=38.912, player_2/loss=244.837, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 492.11it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=18.794, player_2/loss=269.715, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 493.08it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=23.008, player_2/loss=268.392, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 497.22it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=51.903, player_2/loss=238.791, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 496.88it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=41.928, player_2/loss=239.588, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 499.73it/s, env_step=13312, len=10, n/ep=7, n/st=64, player_1/loss=11.509, player_2/loss=235.692, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 493.55it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=9.394, player_2/loss=231.245, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 499.04it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=10.362, player_2/loss=230.841, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 500.40it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=71.359, player_2/loss=242.697, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 494.23it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=86.626, player_2/loss=208.327, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 498.16it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=31.344, player_2/loss=232.815, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 494.82it/s, env_step=19456, len=10, n/ep=7, n/st=64, player_1/loss=17.441, player_2/loss=295.237, rew=17.86]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 492.76it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=291.711, player_2/loss=240.074, rew=3.57]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 499.49it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=398.602, player_2/loss=209.518, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 491.53it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=480.662, player_2/loss=127.176, rew=18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 497.00it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=570.451, player_2/loss=81.697, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 498.22it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=666.221, player_2/loss=95.489, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 497.15it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=675.963, player_2/loss=162.842, rew=10.71]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 495.32it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=557.610, player_2/loss=180.027, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 490.45it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=523.019, player_2/loss=98.635, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 497.65it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=695.834, player_2/loss=54.115, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 494.56it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=648.551, player_2/loss=51.084, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 497.97it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=752.024, player_2/loss=72.461, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 495.83it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=696.776, player_2/loss=124.637, rew=10.71]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 491.81it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=503.573, player_2/loss=93.797, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 497.28it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=416.777, player_2/loss=90.944, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 495.03it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=506.193, player_2/loss=65.521, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 492.89it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=593.459, player_2/loss=43.793, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 496.08it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=663.737, player_2/loss=56.854, rew=10.71]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 495.64it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=708.260, player_2/loss=96.633, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 488.41it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=615.642, player_2/loss=100.005, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 494.40it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=690.732, player_2/loss=111.480, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.81it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=481.046, player_2/loss=75.765, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 499.88it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=308.259, player_2/loss=106.799, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 489.89it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=249.376, player_2/loss=103.933, rew=-18.75]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 496.29it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=187.459, player_2/loss=104.143, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 500.07it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=125.039, player_2/loss=206.599, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 492.63it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=62.253, player_2/loss=331.604, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 498.78it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=26.239, player_2/loss=477.450, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 500.06it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=15.582, player_2/loss=466.882, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 488.73it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=11.635, player_2/loss=460.414, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 496.75it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=7.709, player_2/loss=554.025, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 493.33it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=12.025, player_2/loss=471.872, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 495.08it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=13.288, player_2/loss=457.939, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 484.85it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=5.608, player_2/loss=531.935, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 480.41it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=6.757, player_2/loss=490.587, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 496.68it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=7.872, player_2/loss=449.770, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 496.79it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=4.103, player_2/loss=418.924, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 495.67it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=3.866, player_2/loss=456.475, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 494.38it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=6.017, player_2/loss=506.767, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 496.13it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=6.998, player_2/loss=363.728, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.05it/s, env_step=2048, len=42, n/ep=1, n/st=64, player_1/loss=97.770, player_2/loss=252.533, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 499.52it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=185.007, player_2/loss=166.722, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 498.70it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_1/loss=116.239, player_2/loss=143.848, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 493.65it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=64.943, player_2/loss=140.435, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 498.50it/s, env_step=6144, len=21, n/ep=2, n/st=64, player_1/loss=88.109, player_2/loss=97.916, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 498.31it/s, env_step=7168, len=13, n/ep=4, n/st=64, player_1/loss=158.177, rew=-12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 493.64it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=179.605, player_2/loss=81.306, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 497.68it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=343.619, player_2/loss=51.773, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 498.98it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=265.128, player_2/loss=66.269, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 494.74it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=97.051, player_2/loss=101.491, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 500.62it/s, env_step=12288, len=22, n/ep=2, n/st=64, player_1/loss=166.602, player_2/loss=97.356, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 500.63it/s, env_step=13312, len=23, n/ep=3, n/st=64, player_1/loss=309.097, player_2/loss=88.224, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 496.66it/s, env_step=14336, len=23, n/ep=3, n/st=64, player_1/loss=330.721, player_2/loss=54.610, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 493.82it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=243.888, player_2/loss=62.733, rew=8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 499.56it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=163.780, player_2/loss=73.865, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 501.57it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=140.570, player_2/loss=86.135, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 495.14it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=224.853, player_2/loss=39.820, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 498.28it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=236.380, player_2/loss=23.289, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 493.99it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=134.103, player_2/loss=137.874, rew=18.75]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 493.64it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=82.814, player_2/loss=246.314, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 494.58it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=54.678, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 491.46it/s, env_step=4096, len=8, n/ep=9, n/st=64, player_1/loss=38.226, player_2/loss=368.500, rew=13.89]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 493.10it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=33.357, player_2/loss=352.840, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 497.64it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=23.802, player_2/loss=330.766, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 495.89it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=11.792, player_2/loss=341.870, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 491.72it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=4.870, player_2/loss=356.645, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 491.30it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=95.980, player_2/loss=363.541, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 497.08it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=79.052, player_2/loss=356.552, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 497.34it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=16.759, player_2/loss=398.173, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 490.47it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=37.040, player_2/loss=385.711, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 492.24it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=65.825, player_2/loss=422.824, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 495.10it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=65.342, player_2/loss=396.073, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 495.17it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=30.044, player_2/loss=360.058, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 490.42it/s, env_step=16384, len=7, n/ep=10, n/st=64, player_1/loss=4.326, player_2/loss=381.756, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 500.24it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=4.350, player_2/loss=361.351, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 497.14it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=26.698, player_2/loss=357.839, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 482.39it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=64.039, player_2/loss=362.957, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 493.15it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=26.135, player_2/loss=233.763, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 500.50it/s, env_step=2048, len=23, n/ep=3, n/st=64, player_1/loss=142.888, player_2/loss=177.788, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 494.50it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=190.936, player_2/loss=83.871, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 497.00it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=148.607, player_2/loss=69.480, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 500.47it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=141.285, player_2/loss=114.545, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 494.90it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=157.869, player_2/loss=187.179, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 493.01it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=234.894, player_2/loss=194.319, rew=-18.75]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 491.45it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=205.530, player_2/loss=216.353, rew=-12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 499.45it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=142.058, player_2/loss=188.189, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 498.87it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=143.055, player_2/loss=71.936, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 494.67it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=145.074, player_2/loss=99.000, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 503.67it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=177.818, player_2/loss=89.131, rew=0.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 491.18it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=181.007, player_2/loss=59.250, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 498.62it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=131.508, player_2/loss=106.904, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 498.89it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=110.802, player_2/loss=135.986, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 498.70it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=207.581, player_2/loss=97.748, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 497.68it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=200.026, player_2/loss=77.551, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 500.52it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=178.848, player_2/loss=53.501, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 495.82it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=168.284, player_2/loss=51.898, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 494.38it/s, env_step=1024, len=15, n/ep=5, n/st=64, player_1/loss=112.570, player_2/loss=171.081, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 495.73it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=85.051, player_2/loss=178.388, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 493.76it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=52.446, player_2/loss=263.933, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 499.86it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=43.335, player_2/loss=275.295, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 496.21it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=50.817, player_2/loss=225.504, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 496.30it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=58.227, player_2/loss=222.754, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 496.60it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=41.630, player_2/loss=260.733, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 490.14it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=29.976, player_2/loss=322.893, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 496.75it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=18.643, player_2/loss=310.468, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 495.42it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=29.554, player_2/loss=303.422, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 492.40it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=23.824, player_2/loss=303.835, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 492.02it/s, env_step=12288, len=7, n/ep=10, n/st=64, player_1/loss=19.834, player_2/loss=329.374, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 494.35it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=22.620, player_2/loss=327.950, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 497.39it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=8.960, player_2/loss=329.978, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 492.68it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=3.207, player_2/loss=353.751, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 491.81it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=7.736, player_2/loss=345.776, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 493.17it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=9.558, player_2/loss=310.129, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 496.35it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=6.717, player_2/loss=297.786, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 492.75it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=5.925, player_2/loss=288.116, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 499.91it/s, env_step=1024, len=10, n/ep=7, n/st=64, player_1/loss=7.323, player_2/loss=258.011, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 498.87it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=9.027, player_2/loss=293.394, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 492.94it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=20.904, player_2/loss=271.946, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 496.87it/s, env_step=4096, len=9, n/ep=6, n/st=64, player_1/loss=35.706, player_2/loss=202.318, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 498.01it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=84.895, player_2/loss=183.435, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 491.34it/s, env_step=6144, len=9, n/ep=6, n/st=64, player_1/loss=79.828, player_2/loss=165.019, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 498.51it/s, env_step=7168, len=16, n/ep=3, n/st=64, player_1/loss=30.447, player_2/loss=158.606, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 498.82it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=142.892, player_2/loss=139.624, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 504.25it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=225.356, player_2/loss=111.436, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #10: 1025it [00:02, 499.28it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=225.803, player_2/loss=79.224, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #11: 1025it [00:02, 496.79it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=266.995, player_2/loss=85.545, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #12: 1025it [00:02, 493.42it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=375.577, player_2/loss=62.006, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #13: 1025it [00:02, 499.87it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=409.952, player_2/loss=61.726, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #14: 1025it [00:02, 501.14it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=370.132, player_2/loss=69.186, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #15: 1025it [00:02, 496.13it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=282.086, player_2/loss=69.498, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #16: 1025it [00:02, 500.45it/s, env_step=16384, len=17, n/ep=3, n/st=64, player_1/loss=202.581, player_2/loss=54.965, rew=8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #17: 1025it [00:02, 500.75it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=252.374, player_2/loss=39.623, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #18: 1025it [00:02, 493.10it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=350.721, player_2/loss=15.510, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #19: 1025it [00:02, 500.87it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=323.891, player_2/loss=11.373, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #1: 1025it [00:02, 492.94it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=256.844, player_2/loss=24.491, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 499.04it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=165.361, player_2/loss=214.333, rew=5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 495.82it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=104.109, player_2/loss=300.608, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 498.66it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=98.096, player_2/loss=351.427, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 498.77it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=63.696, player_2/loss=334.070, rew=13.89]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 493.68it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=88.115, player_2/loss=331.732, rew=13.89]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 488.34it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=77.081, player_2/loss=305.696, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 493.36it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=89.913, player_2/loss=318.790, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 496.02it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=88.655, player_2/loss=345.176, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 488.77it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=9.969, player_2/loss=329.087, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 491.79it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=29.809, player_2/loss=327.507, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 491.19it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=127.749, player_2/loss=351.855, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 495.17it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=129.882, player_2/loss=393.263, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 495.36it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=71.671, player_2/loss=406.991, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 493.75it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=90.947, player_2/loss=361.454, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 489.47it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=9.194, player_2/loss=353.495, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 490.52it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=19.047, player_2/loss=378.334, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 495.18it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=52.199, player_2/loss=397.651, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 498.50it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=44.096, player_2/loss=449.182, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 489.75it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=14.230, player_2/loss=283.713, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.31it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=40.583, player_2/loss=220.621, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 497.81it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=55.622, player_2/loss=142.474, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 494.57it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=71.701, player_2/loss=159.868, rew=-16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 499.76it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=99.612, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 500.28it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=121.481, player_2/loss=138.398, rew=-15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 493.92it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=88.052, player_2/loss=141.360, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 497.40it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=99.308, player_2/loss=149.890, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 497.38it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=147.726, player_2/loss=114.461, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 496.80it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=168.757, player_2/loss=96.617, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 500.24it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=142.052, player_2/loss=75.098, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 494.75it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=129.694, player_2/loss=81.880, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 495.52it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=202.000, player_2/loss=110.405, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 499.02it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=244.042, player_2/loss=99.576, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 494.20it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=175.639, player_2/loss=83.879, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 496.98it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=136.192, player_2/loss=84.175, rew=-12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 500.09it/s, env_step=17408, len=19, n/ep=3, n/st=64, player_1/loss=162.903, player_2/loss=52.084, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 493.58it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=183.569, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 497.83it/s, env_step=19456, len=17, n/ep=3, n/st=64, player_1/loss=217.011, player_2/loss=44.098, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 497.32it/s, env_step=1024, len=21, n/ep=4, n/st=64, player_1/loss=121.453, player_2/loss=69.337, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.63it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=138.065, player_2/loss=48.187, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 500.25it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=174.687, player_2/loss=36.471, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 497.59it/s, env_step=4096, len=12, n/ep=6, n/st=64, player_1/loss=138.744, player_2/loss=134.404, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 490.61it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=60.927, player_2/loss=199.217, rew=17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 490.30it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=52.832, player_2/loss=286.786, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 493.22it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=25.381, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 492.33it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=32.013, player_2/loss=289.251, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 496.49it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=43.497, player_2/loss=265.019, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 490.06it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=18.368, player_2/loss=302.178, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 490.12it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=9.050, player_2/loss=300.879, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 489.95it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=28.550, player_2/loss=311.074, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 492.76it/s, env_step=13312, len=9, n/ep=5, n/st=64, player_1/loss=20.096, player_2/loss=263.235, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 489.79it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=19.864, player_2/loss=278.523, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 493.09it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=28.449, player_2/loss=284.951, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 493.41it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=28.766, player_2/loss=253.874, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 491.56it/s, env_step=17408, len=7, n/ep=10, n/st=64, player_1/loss=39.342, player_2/loss=238.343, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 494.77it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=32.227, player_2/loss=290.587, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 486.93it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=33.137, player_2/loss=322.118, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 497.52it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=55.154, player_2/loss=214.676, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.95it/s, env_step=2048, len=7, n/ep=10, n/st=64, player_1/loss=53.982, player_2/loss=223.759, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 490.64it/s, env_step=3072, len=13, n/ep=3, n/st=64, player_1/loss=66.088, player_2/loss=180.317, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 498.03it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=84.628, player_2/loss=139.287, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 496.35it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=152.959, player_2/loss=199.191, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 493.06it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=158.727, player_2/loss=256.679, rew=-19.44]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 496.12it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=158.445, player_2/loss=225.701, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 493.35it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=217.589, player_2/loss=173.859, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 493.79it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=274.999, player_2/loss=118.289, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 492.64it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=190.270, player_2/loss=158.121, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 495.57it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=236.030, player_2/loss=227.349, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 493.05it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=305.106, player_2/loss=149.255, rew=15.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 493.50it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=270.334, player_2/loss=98.918, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 499.89it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=247.256, player_2/loss=81.443, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 493.91it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=260.303, player_2/loss=82.340, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 496.18it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=238.814, player_2/loss=75.470, rew=5.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 495.47it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=205.329, player_2/loss=59.777, rew=12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 492.94it/s, env_step=18432, len=12, n/ep=4, n/st=64, player_1/loss=224.024, player_2/loss=71.452, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 494.79it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=295.900, player_2/loss=55.207, rew=15.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 492.86it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=145.376, player_2/loss=180.950, rew=-15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 499.88it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=98.307, player_2/loss=208.103, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 489.05it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=45.485, player_2/loss=336.617, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 493.74it/s, env_step=4096, len=7, n/ep=7, n/st=64, player_2/loss=357.598, rew=10.71]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 492.39it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=15.024, player_2/loss=356.857, rew=13.89]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 491.62it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=79.107, player_2/loss=341.451, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 489.52it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=64.335, player_2/loss=379.202, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 494.16it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=73.512, player_2/loss=361.243, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 490.49it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=97.752, player_2/loss=357.068, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 485.40it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=58.707, player_2/loss=347.518, rew=6.25]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 478.76it/s, env_step=11264, len=8, n/ep=9, n/st=64, player_1/loss=74.184, player_2/loss=300.476, rew=13.89]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 496.67it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_2/loss=269.576, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 488.62it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=41.615, player_2/loss=331.047, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 494.64it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=52.369, player_2/loss=343.201, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 488.83it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=101.135, player_2/loss=325.628, rew=18.75]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 494.05it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=111.558, player_2/loss=329.556, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 494.12it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=104.563, player_2/loss=306.971, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 497.05it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=71.668, player_2/loss=312.338, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 494.86it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=75.304, player_2/loss=354.960, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 491.89it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=6.560, player_2/loss=242.420, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.29it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=17.340, player_2/loss=238.445, rew=-18.75]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.50it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=52.658, player_2/loss=179.388, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 492.65it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=36.452, player_2/loss=147.534, rew=-13.89]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 496.80it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=88.917, player_2/loss=117.608, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 496.46it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=31.972, player_2/loss=124.267, rew=-17.86]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 495.27it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=108.834, player_2/loss=139.914, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 499.23it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=206.168, player_2/loss=137.852, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 495.64it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=251.691, player_2/loss=78.071, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 488.42it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=294.933, player_2/loss=39.652, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 496.93it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=300.195, player_2/loss=22.001, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 500.78it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=276.216, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 494.07it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=264.959, player_2/loss=40.344, rew=16.67]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 496.16it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=272.238, player_2/loss=9.422, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 499.83it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=302.695, player_2/loss=9.017, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 497.95it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=256.391, player_2/loss=153.550, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 486.16it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=188.139, player_2/loss=251.096, rew=-25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 498.46it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=114.351, player_2/loss=161.124, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 502.23it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=94.136, player_2/loss=145.002, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 493.41it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=136.766, player_2/loss=95.869, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 504.14it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=134.760, player_2/loss=133.117, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 494.69it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=98.427, player_2/loss=195.817, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 492.95it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=29.864, rew=17.86]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 492.22it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=28.541, player_2/loss=215.375, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 490.48it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=33.397, player_2/loss=222.398, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 490.38it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=34.152, player_2/loss=260.667, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 492.11it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=21.896, player_2/loss=271.550, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 492.28it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=16.044, player_2/loss=250.127, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 490.14it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=11.127, player_2/loss=273.531, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 489.41it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=7.405, player_2/loss=296.722, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 499.71it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=6.158, player_2/loss=303.449, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 487.25it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=53.850, player_2/loss=294.177, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 494.14it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=54.569, player_2/loss=253.509, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 497.07it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=43.044, player_2/loss=232.271, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 500.32it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=82.424, player_2/loss=247.261, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 497.03it/s, env_step=17408, len=9, n/ep=6, n/st=64, player_1/loss=53.027, player_2/loss=217.722, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 499.54it/s, env_step=18432, len=11, n/ep=7, n/st=64, player_1/loss=68.681, player_2/loss=219.459, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 490.54it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=66.880, player_2/loss=216.079, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 495.46it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=4.232, player_2/loss=155.928, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.99it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=4.921, player_2/loss=125.148, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 488.01it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=36.495, player_2/loss=113.810, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 500.61it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=73.437, player_2/loss=138.061, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 497.54it/s, env_step=5120, len=29, n/ep=2, n/st=64, player_1/loss=99.799, player_2/loss=113.632, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 501.77it/s, env_step=6144, len=15, n/ep=6, n/st=64, player_2/loss=103.122, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 500.07it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=169.252, player_2/loss=123.095, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 495.11it/s, env_step=8192, len=12, n/ep=4, n/st=64, player_1/loss=211.012, player_2/loss=102.285, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 501.64it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=236.273, player_2/loss=62.701, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:02, 498.36it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=249.126, player_2/loss=71.910, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 497.74it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=250.134, player_2/loss=68.178, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 500.89it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=253.155, player_2/loss=71.992, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 499.26it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=243.394, player_2/loss=30.924, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 492.63it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=210.396, player_2/loss=34.172, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 496.51it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=305.120, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:02, 493.40it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=328.474, player_2/loss=43.412, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 501.26it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=275.090, player_2/loss=45.411, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:02, 492.56it/s, env_step=18432, len=12, n/ep=4, n/st=64, player_1/loss=286.589, player_2/loss=38.479, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:02, 487.73it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=227.237, player_2/loss=18.188, rew=5.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 494.31it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=154.179, player_2/loss=27.847, rew=-15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 499.27it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=113.179, player_2/loss=176.136, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 500.22it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=95.029, player_2/loss=318.391, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 485.54it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=102.963, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 497.12it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=78.145, player_2/loss=398.727, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 492.01it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=69.851, player_2/loss=373.130, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 489.52it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=86.295, player_2/loss=332.925, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 493.29it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=89.251, player_2/loss=342.622, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 489.81it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=66.213, player_2/loss=326.426, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 489.54it/s, env_step=10240, len=8, n/ep=9, n/st=64, player_1/loss=69.542, player_2/loss=282.081, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 492.78it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=94.266, player_2/loss=304.341, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 492.13it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=54.529, player_2/loss=270.130, rew=18.75]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 486.30it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=42.579, player_2/loss=293.539, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 493.65it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=39.668, player_2/loss=279.409, rew=19.44]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 493.89it/s, env_step=15360, len=7, n/ep=10, n/st=64, player_1/loss=41.071, player_2/loss=308.570, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 492.33it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=59.504, player_2/loss=309.840, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 494.24it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=65.014, player_2/loss=297.367, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 494.14it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=31.284, player_2/loss=277.765, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 489.03it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=9.359, player_2/loss=282.479, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 495.47it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=119.272, player_2/loss=261.609, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 496.78it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=159.318, player_2/loss=203.106, rew=5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 485.17it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=243.075, player_2/loss=91.220, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 492.69it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=273.197, player_2/loss=87.585, rew=10.71]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 488.27it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=267.762, player_2/loss=119.836, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 493.87it/s, env_step=6144, len=10, n/ep=7, n/st=64, player_1/loss=252.989, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 496.15it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=286.428, player_2/loss=56.164, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 494.45it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=242.251, player_2/loss=117.654, rew=5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 495.46it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=107.452, player_2/loss=177.673, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 494.84it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=227.896, player_2/loss=137.966, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 489.87it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=309.218, player_2/loss=129.965, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 499.35it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=276.638, player_2/loss=124.992, rew=-12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 490.73it/s, env_step=13312, len=15, n/ep=5, n/st=64, player_1/loss=191.566, player_2/loss=134.351, rew=-15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 495.06it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=144.912, player_2/loss=138.637, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 493.15it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=191.303, player_2/loss=146.976, rew=-15.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 492.65it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=119.461, player_2/loss=131.315, rew=-25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 498.59it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=145.257, player_2/loss=155.797, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 498.53it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=226.624, player_2/loss=131.888, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 490.22it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=250.469, player_2/loss=67.855, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 496.56it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=130.403, player_2/loss=102.966, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.68it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=100.571, player_2/loss=126.030, rew=-17.86]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.48it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=125.056, player_2/loss=199.847, rew=3.57]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 497.96it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=157.988, player_2/loss=362.916, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 492.68it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=188.277, player_2/loss=356.563, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 500.53it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=189.274, player_2/loss=190.837, rew=15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 496.82it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=122.650, player_2/loss=168.591, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 496.04it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=89.267, player_2/loss=146.721, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 495.80it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_2/loss=170.951, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 496.99it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=130.831, player_2/loss=156.430, rew=12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 491.16it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=73.148, player_2/loss=161.749, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 493.74it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=56.372, player_2/loss=213.764, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 499.26it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=91.646, player_2/loss=206.539, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 492.35it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=105.932, player_2/loss=172.619, rew=5.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 498.94it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=61.658, player_2/loss=145.111, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 500.52it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=53.501, player_2/loss=157.117, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 494.71it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=58.418, player_2/loss=158.727, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 496.56it/s, env_step=18432, len=15, n/ep=5, n/st=64, player_1/loss=57.887, player_2/loss=178.252, rew=15.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 494.34it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=73.607, player_2/loss=159.059, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 490.31it/s, env_step=1024, len=27, n/ep=3, n/st=64, player_1/loss=96.303, player_2/loss=85.361, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 499.90it/s, env_step=2048, len=26, n/ep=2, n/st=64, player_1/loss=72.316, player_2/loss=69.455, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 499.83it/s, env_step=3072, len=25, n/ep=2, n/st=64, player_1/loss=74.639, player_2/loss=65.962, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 492.78it/s, env_step=4096, len=9, n/ep=5, n/st=64, player_1/loss=148.960, player_2/loss=82.545, rew=-5.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 497.02it/s, env_step=5120, len=27, n/ep=2, n/st=64, player_1/loss=145.930, player_2/loss=98.444, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 497.08it/s, env_step=6144, len=30, n/ep=2, n/st=64, player_1/loss=79.345, player_2/loss=111.302, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 492.02it/s, env_step=7168, len=24, n/ep=2, n/st=64, player_1/loss=41.782, player_2/loss=72.354, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 495.88it/s, env_step=8192, len=33, n/ep=2, n/st=64, player_1/loss=64.168, player_2/loss=83.662, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 490.89it/s, env_step=9216, len=29, n/ep=2, n/st=64, player_1/loss=89.222, player_2/loss=87.983, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 497.43it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=108.405, player_2/loss=67.633, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 499.12it/s, env_step=11264, len=22, n/ep=3, n/st=64, player_1/loss=150.428, player_2/loss=61.799, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 491.68it/s, env_step=12288, len=29, n/ep=2, n/st=64, player_1/loss=142.967, player_2/loss=79.797, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 496.06it/s, env_step=13312, len=18, n/ep=3, n/st=64, player_1/loss=160.453, player_2/loss=69.131, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 497.07it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=176.454, player_2/loss=68.062, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 495.55it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=188.161, player_2/loss=49.775, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 491.42it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=191.775, player_2/loss=35.771, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 500.74it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=150.791, player_2/loss=29.833, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 494.85it/s, env_step=18432, len=19, n/ep=4, n/st=64, player_1/loss=127.308, player_2/loss=57.959, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 496.71it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=173.995, player_2/loss=101.269, rew=5.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 485.18it/s, env_step=1024, len=19, n/ep=4, n/st=64, player_1/loss=154.144, player_2/loss=194.722, rew=12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 494.55it/s, env_step=2048, len=25, n/ep=2, n/st=64, player_1/loss=143.544, player_2/loss=231.230, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.77it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=80.337, player_2/loss=278.985, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 497.33it/s, env_step=4096, len=24, n/ep=3, n/st=64, player_1/loss=79.391, player_2/loss=265.951, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 496.25it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=80.949, player_2/loss=371.469, rew=12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 495.32it/s, env_step=6144, len=24, n/ep=3, n/st=64, player_1/loss=63.506, player_2/loss=291.596, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 495.55it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=62.492, player_2/loss=223.316, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 494.35it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=71.420, player_2/loss=270.906, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 490.51it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=81.451, player_2/loss=293.069, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 497.03it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=66.458, player_2/loss=431.258, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 501.61it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=48.798, player_2/loss=448.249, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 494.80it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=46.853, player_2/loss=508.325, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 492.97it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=47.001, player_2/loss=324.278, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 495.72it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=95.032, player_2/loss=305.589, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 499.17it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=85.032, player_2/loss=298.276, rew=8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 488.80it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=112.910, player_2/loss=329.661, rew=10.71]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 492.24it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=170.499, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 496.26it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=150.157, player_2/loss=340.811, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 492.51it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=98.996, player_2/loss=328.751, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 490.59it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=37.835, player_2/loss=276.873, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.53it/s, env_step=2048, len=23, n/ep=3, n/st=64, player_1/loss=53.772, player_2/loss=186.761, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.91it/s, env_step=3072, len=21, n/ep=2, n/st=64, player_1/loss=63.081, player_2/loss=121.003, rew=0.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 492.25it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=77.802, player_2/loss=101.577, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 495.08it/s, env_step=5120, len=20, n/ep=4, n/st=64, player_1/loss=106.094, player_2/loss=127.495, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 496.80it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=107.607, player_2/loss=109.960, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 493.21it/s, env_step=7168, len=24, n/ep=3, n/st=64, player_1/loss=90.252, player_2/loss=116.071, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 497.04it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=102.839, player_2/loss=117.166, rew=-8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 495.51it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=106.849, player_2/loss=108.235, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 487.47it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=91.998, player_2/loss=78.281, rew=0.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 495.79it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=105.268, player_2/loss=75.352, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 497.89it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_2/loss=61.506, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 491.67it/s, env_step=13312, len=19, n/ep=4, n/st=64, player_1/loss=81.233, player_2/loss=51.430, rew=0.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 495.65it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=93.557, player_2/loss=71.597, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 493.80it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=135.335, player_2/loss=112.169, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 488.22it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=160.496, player_2/loss=109.212, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 498.94it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=196.373, player_2/loss=166.515, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 491.20it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=226.780, player_2/loss=102.686, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 496.60it/s, env_step=19456, len=16, n/ep=3, n/st=64, player_1/loss=197.648, player_2/loss=34.298, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 498.73it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=233.386, player_2/loss=79.535, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 491.32it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=216.549, player_2/loss=117.595, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 493.95it/s, env_step=3072, len=13, n/ep=4, n/st=64, player_1/loss=169.713, player_2/loss=117.259, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 496.13it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=142.867, player_2/loss=153.244, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 487.49it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=140.237, player_2/loss=179.684, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 493.85it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=97.979, player_2/loss=172.855, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 487.75it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=58.342, player_2/loss=150.229, rew=5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 495.50it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=52.688, player_2/loss=179.456, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 493.68it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=82.071, player_2/loss=191.684, rew=16.67]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 493.85it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=86.331, player_2/loss=198.157, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 485.84it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=81.963, player_2/loss=182.193, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 495.15it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=77.277, player_2/loss=201.695, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 495.10it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=117.960, player_2/loss=211.618, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 488.27it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=107.283, player_2/loss=187.125, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 493.87it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=85.246, player_2/loss=181.442, rew=16.67]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 494.01it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=63.241, player_2/loss=167.922, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 489.42it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=60.397, player_2/loss=162.684, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 495.46it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=54.039, player_2/loss=135.806, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 492.11it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=55.072, player_2/loss=145.168, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 491.42it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=126.068, player_2/loss=176.431, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.02it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=110.090, player_2/loss=152.075, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 497.62it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=122.044, player_2/loss=129.567, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 498.24it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=138.889, player_2/loss=118.175, rew=12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 493.06it/s, env_step=5120, len=32, n/ep=2, n/st=64, player_1/loss=127.688, player_2/loss=87.380, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 498.94it/s, env_step=6144, len=20, n/ep=4, n/st=64, player_1/loss=121.491, player_2/loss=76.410, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 497.96it/s, env_step=7168, len=9, n/ep=5, n/st=64, player_1/loss=140.177, player_2/loss=101.986, rew=-15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 496.12it/s, env_step=8192, len=33, n/ep=2, n/st=64, player_1/loss=148.801, player_2/loss=80.758, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 495.95it/s, env_step=9216, len=27, n/ep=2, n/st=64, player_1/loss=126.450, player_2/loss=90.009, rew=-25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 494.63it/s, env_step=10240, len=27, n/ep=2, n/st=64, player_1/loss=107.552, player_2/loss=84.534, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 492.48it/s, env_step=11264, len=24, n/ep=2, n/st=64, player_1/loss=112.391, player_2/loss=116.698, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 499.27it/s, env_step=12288, len=27, n/ep=2, n/st=64, player_1/loss=126.035, player_2/loss=94.320, rew=0.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 491.27it/s, env_step=13312, len=26, n/ep=3, n/st=64, player_1/loss=115.061, player_2/loss=92.501, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 494.85it/s, env_step=14336, len=30, n/ep=2, n/st=64, player_1/loss=137.932, player_2/loss=85.204, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 490.21it/s, env_step=15360, len=22, n/ep=3, n/st=64, player_1/loss=178.079, player_2/loss=95.816, rew=-8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 494.85it/s, env_step=16384, len=26, n/ep=3, n/st=64, player_1/loss=141.047, player_2/loss=96.831, rew=-8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 499.00it/s, env_step=17408, len=27, n/ep=2, n/st=64, player_2/loss=80.590, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 497.14it/s, env_step=18432, len=28, n/ep=2, n/st=64, player_1/loss=108.816, player_2/loss=84.201, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 493.24it/s, env_step=19456, len=27, n/ep=2, n/st=64, player_1/loss=117.911, player_2/loss=85.761, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 498.04it/s, env_step=1024, len=19, n/ep=4, n/st=64, player_1/loss=126.981, player_2/loss=40.456, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.56it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=109.229, player_2/loss=78.386, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 499.05it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=91.605, player_2/loss=91.720, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 500.01it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=66.645, player_2/loss=48.906, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 493.00it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=49.128, player_2/loss=67.109, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 497.79it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=41.234, player_2/loss=78.845, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 494.96it/s, env_step=7168, len=22, n/ep=3, n/st=64, player_1/loss=50.308, player_2/loss=115.371, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 491.53it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=14.679, player_2/loss=138.468, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 495.73it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=57.655, player_2/loss=143.756, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 489.60it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=48.106, player_2/loss=142.052, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 500.93it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=48.042, player_2/loss=152.634, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 496.00it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=19.397, player_2/loss=146.957, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 499.44it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=13.880, player_2/loss=117.673, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 497.38it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=5.358, player_2/loss=114.741, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 496.50it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_2/loss=135.810, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 497.64it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=40.542, player_2/loss=158.168, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 491.31it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=62.707, player_2/loss=149.318, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 496.54it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=58.214, player_2/loss=144.436, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 496.06it/s, env_step=19456, len=10, n/ep=4, n/st=64, player_1/loss=48.308, player_2/loss=143.806, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 488.78it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=16.658, player_2/loss=222.990, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.71it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=56.348, player_2/loss=216.245, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.60it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=100.006, player_2/loss=179.056, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 488.12it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=88.104, player_2/loss=142.577, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 473.78it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=46.912, player_2/loss=99.724, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 486.34it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=28.880, player_2/loss=81.063, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 477.22it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=44.412, player_2/loss=67.098, rew=-15.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 491.86it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=29.286, player_2/loss=41.434, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 498.81it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=11.372, player_2/loss=31.547, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 489.82it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=10.346, player_2/loss=62.607, rew=-15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #11: 1025it [00:02, 494.99it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=62.223, player_2/loss=94.124, rew=-16.67]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #12: 1025it [00:02, 500.93it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=69.899, player_2/loss=56.166, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #13: 1025it [00:02, 491.09it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=76.447, player_2/loss=109.030, rew=-15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #14: 1025it [00:02, 495.64it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=97.522, player_2/loss=99.372, rew=-16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #15: 1025it [00:02, 491.51it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=39.948, player_2/loss=69.227, rew=-15.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #16: 1025it [00:02, 494.59it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=39.887, player_2/loss=65.716, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #17: 1025it [00:02, 497.41it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=39.512, player_2/loss=55.174, rew=-15.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #18: 1025it [00:02, 490.27it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=10.941, player_2/loss=55.807, rew=-16.67]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #19: 1025it [00:02, 497.52it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=87.506, player_2/loss=94.488, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #1: 1025it [00:02, 495.44it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=15.953, player_2/loss=131.370, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 491.28it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=8.926, player_2/loss=132.301, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.75it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=13.577, player_2/loss=107.429, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 491.84it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=31.587, player_2/loss=110.157, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 487.32it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=27.053, player_2/loss=132.083, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 486.19it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=13.032, player_2/loss=148.035, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 490.05it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=35.605, player_2/loss=159.121, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 484.16it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=74.158, player_2/loss=169.600, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 489.35it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=84.698, player_2/loss=133.967, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 484.58it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=18.285, player_2/loss=125.868, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 493.54it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=5.228, player_2/loss=123.504, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 496.27it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=2.736, player_2/loss=120.207, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 485.89it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=3.508, player_2/loss=116.972, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 491.43it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=2.866, player_2/loss=124.923, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 496.21it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=3.546, player_2/loss=109.741, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 488.20it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=39.036, player_2/loss=124.609, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 495.90it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=50.456, player_2/loss=155.414, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 495.54it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=70.498, player_2/loss=159.636, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 487.75it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=49.194, player_2/loss=172.205, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 494.13it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=10.294, player_2/loss=60.396, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.44it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=38.795, player_2/loss=79.514, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.79it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=67.816, player_2/loss=96.464, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 490.20it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=68.799, player_2/loss=96.080, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 493.90it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=41.648, player_2/loss=58.048, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 496.11it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=35.169, player_2/loss=63.443, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 490.20it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=104.144, player_2/loss=113.972, rew=-13.89]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 494.52it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=78.868, player_2/loss=121.559, rew=-12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 488.83it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=69.173, player_2/loss=167.966, rew=-19.44]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 491.72it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=38.217, player_2/loss=133.387, rew=-13.89]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 490.19it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=37.945, player_2/loss=72.700, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 488.47it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=55.913, player_2/loss=62.029, rew=-13.89]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 500.70it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=88.917, player_2/loss=90.912, rew=-13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 497.61it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=88.936, player_2/loss=118.826, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 491.18it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=228.464, player_2/loss=96.983, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 493.43it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=304.074, player_2/loss=121.939, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 498.54it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=255.967, player_2/loss=95.692, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 488.90it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=306.558, player_2/loss=43.265, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 498.39it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=359.476, player_2/loss=105.457, rew=0.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 484.68it/s, env_step=1024, len=16, n/ep=5, n/st=64, player_1/loss=362.350, player_2/loss=131.044, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.18it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=264.061, player_2/loss=79.054, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 498.08it/s, env_step=3072, len=20, n/ep=4, n/st=64, player_1/loss=184.454, player_2/loss=72.025, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 490.80it/s, env_step=4096, len=25, n/ep=2, n/st=64, player_1/loss=181.699, player_2/loss=109.386, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 493.68it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=88.567, player_2/loss=130.966, rew=-8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 494.92it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=50.195, player_2/loss=121.695, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 489.33it/s, env_step=7168, len=23, n/ep=2, n/st=64, player_1/loss=35.168, player_2/loss=119.446, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 496.59it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=37.483, player_2/loss=96.998, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 500.59it/s, env_step=9216, len=24, n/ep=3, n/st=64, player_1/loss=46.790, player_2/loss=60.746, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 486.52it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=29.348, player_2/loss=133.135, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 498.12it/s, env_step=11264, len=12, n/ep=4, n/st=64, player_1/loss=21.487, player_2/loss=152.637, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 497.01it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=29.772, player_2/loss=164.169, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 487.60it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_1/loss=30.736, player_2/loss=208.374, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 496.40it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=41.855, player_2/loss=186.172, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 493.93it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=62.041, player_2/loss=231.185, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 493.28it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=72.349, player_2/loss=244.233, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 493.61it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=58.751, player_2/loss=220.970, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 496.93it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=34.810, player_2/loss=252.108, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 492.21it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=96.389, player_2/loss=270.046, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 496.40it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=172.940, player_2/loss=165.207, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 494.24it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=238.799, player_2/loss=136.359, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 493.03it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=296.318, player_2/loss=85.871, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 494.38it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=241.676, player_2/loss=124.777, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 493.13it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=304.396, player_2/loss=107.840, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 488.02it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=339.663, player_2/loss=108.051, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 493.54it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=313.600, player_2/loss=101.037, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 496.66it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=310.455, player_2/loss=40.465, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 487.23it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=254.526, player_2/loss=60.272, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 493.55it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=261.898, player_2/loss=46.098, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 492.30it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=273.395, player_2/loss=34.510, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 491.32it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=244.895, player_2/loss=60.586, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 489.54it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=294.822, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 494.40it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=340.674, player_2/loss=12.683, rew=-8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 492.64it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=285.143, player_2/loss=41.068, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 496.42it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=258.165, player_2/loss=43.984, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 489.81it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=270.220, player_2/loss=39.191, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 499.02it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=296.359, player_2/loss=61.360, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 492.29it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=327.871, player_2/loss=54.852, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 493.56it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=33.838, player_2/loss=170.579, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.97it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=37.155, player_2/loss=179.289, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 496.36it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=26.301, player_2/loss=190.543, rew=16.67]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 488.54it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=23.127, player_2/loss=349.219, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 491.32it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=43.611, player_2/loss=539.973, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 490.24it/s, env_step=6144, len=10, n/ep=8, n/st=64, player_1/loss=44.161, player_2/loss=546.015, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 490.89it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=28.797, player_2/loss=493.324, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 490.95it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=20.045, player_2/loss=485.329, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 494.79it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=14.760, player_2/loss=486.458, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 487.84it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=17.863, player_2/loss=543.048, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 492.27it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=15.371, player_2/loss=563.417, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 490.97it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=3.089, player_2/loss=542.651, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 485.33it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=8.669, player_2/loss=510.747, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 489.43it/s, env_step=14336, len=7, n/ep=7, n/st=64, player_1/loss=11.223, player_2/loss=515.948, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 485.92it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=5.728, player_2/loss=489.773, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 490.91it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=55.465, player_2/loss=442.088, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 493.15it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=21.637, player_2/loss=482.164, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 486.24it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=20.620, player_2/loss=484.520, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 494.06it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=21.839, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 494.62it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=17.447, player_2/loss=379.162, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 489.86it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=44.126, player_2/loss=317.890, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 492.06it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=75.145, player_2/loss=236.620, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 491.80it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=95.597, player_2/loss=219.103, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 493.28it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=97.696, player_2/loss=207.392, rew=-19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 492.51it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=57.086, player_2/loss=176.361, rew=-18.75]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 494.53it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=87.835, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 495.06it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=8.791, player_2/loss=107.630, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 492.88it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=48.224, player_2/loss=111.275, rew=-19.44]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 494.55it/s, env_step=10240, len=7, n/ep=10, n/st=64, player_1/loss=58.798, player_2/loss=109.897, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 493.95it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=20.781, player_2/loss=100.323, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 500.40it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=66.082, player_2/loss=91.012, rew=0.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 493.78it/s, env_step=13312, len=22, n/ep=3, n/st=64, player_1/loss=127.612, player_2/loss=111.989, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 496.17it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=202.082, player_2/loss=93.798, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #15: 1025it [00:02, 497.33it/s, env_step=15360, len=23, n/ep=3, n/st=64, player_1/loss=225.830, player_2/loss=57.037, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #16: 1025it [00:02, 488.72it/s, env_step=16384, len=27, n/ep=2, n/st=64, player_1/loss=206.291, player_2/loss=43.254, rew=0.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #17: 1025it [00:02, 497.35it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=148.557, player_2/loss=37.354, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #18: 1025it [00:02, 486.12it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=177.272, player_2/loss=65.066, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #19: 1025it [00:02, 497.15it/s, env_step=19456, len=22, n/ep=3, n/st=64, player_1/loss=151.790, player_2/loss=63.075, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #1: 1025it [00:02, 492.90it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=164.721, player_2/loss=36.060, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.93it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=98.056, player_2/loss=37.123, rew=8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 499.01it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=99.605, player_2/loss=158.861, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 496.85it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=126.644, player_2/loss=177.205, rew=8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 484.36it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=117.889, player_2/loss=166.584, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 497.10it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=172.770, player_2/loss=157.105, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 489.62it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=175.811, player_2/loss=206.561, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 494.34it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=149.081, player_2/loss=159.096, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 491.98it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=181.228, player_2/loss=213.228, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 492.92it/s, env_step=10240, len=8, n/ep=9, n/st=64, player_1/loss=111.638, player_2/loss=193.785, rew=13.89]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 487.86it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=27.424, player_2/loss=178.802, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 490.00it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=59.882, player_2/loss=207.281, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 496.14it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=89.071, player_2/loss=246.758, rew=18.75]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 487.66it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=91.119, player_2/loss=204.307, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 494.43it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=69.413, player_2/loss=192.844, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 491.44it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=61.492, player_2/loss=185.304, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 488.64it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=100.543, player_2/loss=222.833, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 497.90it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=138.745, player_2/loss=221.955, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 494.19it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=85.037, player_2/loss=191.963, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 488.04it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=13.779, player_2/loss=132.030, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.63it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=175.055, player_2/loss=107.803, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 498.30it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=289.186, player_2/loss=54.984, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 489.09it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=356.722, player_2/loss=40.104, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 499.94it/s, env_step=5120, len=12, n/ep=6, n/st=64, player_1/loss=401.898, player_2/loss=35.739, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 494.48it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=304.288, player_2/loss=48.972, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 491.48it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=282.781, player_2/loss=59.017, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 495.11it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=283.785, player_2/loss=49.170, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 498.25it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=327.178, player_2/loss=15.057, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 490.15it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=329.731, player_2/loss=11.625, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 495.28it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=286.419, player_2/loss=13.719, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 494.60it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=255.241, player_2/loss=11.687, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 488.19it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=271.952, player_2/loss=35.003, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 496.78it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=259.432, player_2/loss=41.732, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 495.38it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=261.756, player_2/loss=13.084, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 489.54it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=288.674, player_2/loss=6.454, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 494.34it/s, env_step=17408, len=13, n/ep=6, n/st=64, player_1/loss=309.767, player_2/loss=7.972, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 494.28it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=302.454, player_2/loss=32.190, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 494.72it/s, env_step=19456, len=20, n/ep=4, n/st=64, player_1/loss=285.113, player_2/loss=40.440, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 493.84it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=221.995, player_2/loss=55.733, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.19it/s, env_step=2048, len=25, n/ep=2, n/st=64, player_1/loss=215.781, player_2/loss=53.007, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 490.63it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=149.631, player_2/loss=94.690, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 493.26it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=98.580, player_2/loss=155.444, rew=12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 493.01it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=82.283, player_2/loss=207.399, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 487.86it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=48.484, player_2/loss=191.616, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 491.83it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=35.604, player_2/loss=272.787, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 494.33it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=47.079, player_2/loss=277.954, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 486.72it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=43.002, player_2/loss=255.650, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 487.72it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=39.550, player_2/loss=332.642, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 489.96it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=27.265, player_2/loss=411.804, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 478.80it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=24.938, player_2/loss=380.735, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 490.43it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=25.961, player_2/loss=386.204, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 492.74it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=19.950, player_2/loss=399.588, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 486.33it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=9.632, player_2/loss=411.702, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 491.49it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=15.385, player_2/loss=372.927, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 496.54it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=16.677, player_2/loss=344.594, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 483.79it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=19.897, player_2/loss=362.794, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 489.25it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=23.269, player_2/loss=406.541, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 488.33it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=54.904, player_2/loss=107.743, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 494.81it/s, env_step=2048, len=25, n/ep=3, n/st=64, player_1/loss=61.649, player_2/loss=109.978, rew=8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 499.72it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=102.243, player_2/loss=104.516, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 494.21it/s, env_step=4096, len=9, n/ep=8, n/st=64, player_1/loss=153.589, player_2/loss=120.720, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 495.99it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=140.195, player_2/loss=154.140, rew=-5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 500.24it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=120.903, player_2/loss=167.322, rew=-12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 494.60it/s, env_step=7168, len=21, n/ep=4, n/st=64, player_1/loss=133.734, player_2/loss=117.834, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 495.30it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=161.516, player_2/loss=80.366, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 498.98it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=166.376, player_2/loss=85.033, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 502.57it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=170.771, player_2/loss=90.864, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 491.54it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=179.386, player_2/loss=59.259, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 501.49it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=225.808, player_2/loss=107.050, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 493.27it/s, env_step=13312, len=23, n/ep=3, n/st=64, player_1/loss=213.446, player_2/loss=128.365, rew=-8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 500.32it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=147.638, player_2/loss=81.730, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 491.25it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=182.459, player_2/loss=23.508, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 500.99it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=202.875, player_2/loss=48.528, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 497.25it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=172.359, player_2/loss=77.945, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 492.12it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=139.247, player_2/loss=71.350, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 500.42it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=157.230, player_2/loss=73.183, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 491.30it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=134.961, player_2/loss=154.581, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 500.28it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=141.179, player_2/loss=154.049, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 496.07it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=195.279, player_2/loss=130.198, rew=-5.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 491.64it/s, env_step=4096, len=21, n/ep=4, n/st=64, player_1/loss=151.267, player_2/loss=110.274, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 502.83it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=138.860, player_2/loss=235.988, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 501.05it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=106.585, player_2/loss=243.489, rew=-8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 501.14it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=70.085, player_2/loss=182.694, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 493.41it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=75.310, player_2/loss=246.153, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 501.44it/s, env_step=9216, len=13, n/ep=4, n/st=64, player_1/loss=27.631, player_2/loss=258.107, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 492.00it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=15.822, player_2/loss=183.528, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 500.84it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=10.429, player_2/loss=196.944, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 502.09it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=79.684, player_2/loss=233.808, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 490.79it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=124.492, player_2/loss=244.756, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 501.14it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=59.414, player_2/loss=394.399, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 500.96it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=36.933, player_2/loss=561.516, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 493.97it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=68.769, player_2/loss=615.817, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 494.93it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=53.647, player_2/loss=641.345, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 498.31it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=69.347, player_2/loss=594.746, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 497.20it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=110.913, player_2/loss=561.341, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 497.19it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=58.720, player_2/loss=441.283, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 500.10it/s, env_step=2048, len=7, n/ep=10, n/st=64, player_1/loss=44.360, player_2/loss=348.229, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.61it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=135.081, player_2/loss=259.091, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 500.86it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=286.853, player_2/loss=231.855, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 499.80it/s, env_step=5120, len=15, n/ep=5, n/st=64, player_1/loss=299.886, player_2/loss=219.328, rew=-5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 492.27it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=291.081, player_2/loss=167.925, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 497.91it/s, env_step=7168, len=12, n/ep=4, n/st=64, player_1/loss=295.784, player_2/loss=54.945, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 500.96it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=311.200, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 490.35it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=258.058, player_2/loss=64.987, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 500.78it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=223.728, player_2/loss=51.571, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 498.46it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=280.138, player_2/loss=19.541, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 491.26it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=274.723, player_2/loss=13.455, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 498.04it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_2/loss=18.962, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 502.15it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=224.365, player_2/loss=18.466, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 492.92it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=324.785, player_2/loss=7.227, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 496.86it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=375.351, player_2/loss=20.542, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 500.93it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=360.551, player_2/loss=22.231, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 491.51it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=360.382, player_2/loss=30.054, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 501.50it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=290.460, player_2/loss=25.736, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 496.49it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=265.441, player_2/loss=28.246, rew=-5.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 491.12it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=259.396, player_2/loss=17.703, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 494.33it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=201.106, player_2/loss=38.961, rew=5.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 499.38it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=155.711, player_2/loss=202.327, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 486.38it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=169.232, player_2/loss=334.717, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 494.18it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=146.545, player_2/loss=358.829, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 491.51it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=101.285, player_2/loss=370.405, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 486.99it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=118.079, player_2/loss=252.767, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 492.80it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=111.758, player_2/loss=283.635, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 491.02it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=53.053, player_2/loss=358.403, rew=10.71]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 483.80it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=35.208, player_2/loss=462.507, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 496.99it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=25.969, player_2/loss=432.822, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 487.05it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=51.536, player_2/loss=303.489, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 491.57it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_1/loss=55.777, player_2/loss=414.242, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 494.02it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=33.723, player_2/loss=517.250, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 487.64it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=46.919, player_2/loss=504.030, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 496.20it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=70.301, player_2/loss=410.658, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 496.14it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=36.155, player_2/loss=393.588, rew=10.71]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 484.98it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=40.704, player_2/loss=405.205, rew=10.71]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 497.03it/s, env_step=1024, len=9, n/ep=6, n/st=64, player_1/loss=11.116, player_2/loss=313.631, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.27it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=22.838, player_2/loss=257.670, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 501.96it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=51.855, player_2/loss=213.843, rew=-17.86]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 501.20it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=165.531, player_2/loss=288.940, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 491.03it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=388.653, player_2/loss=276.298, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 497.44it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=405.687, player_2/loss=227.950, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 498.30it/s, env_step=7168, len=9, n/ep=6, n/st=64, player_1/loss=467.439, player_2/loss=172.262, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 491.27it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=555.831, player_2/loss=131.149, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 497.80it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=582.698, player_2/loss=80.089, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 497.94it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=631.986, player_2/loss=73.555, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 488.27it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=617.972, player_2/loss=107.167, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 498.96it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=570.786, player_2/loss=104.406, rew=17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 497.29it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=668.779, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 495.24it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=656.704, player_2/loss=95.674, rew=10.71]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 500.65it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=546.876, player_2/loss=103.697, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 501.36it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=568.599, player_2/loss=100.411, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 491.18it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=534.251, player_2/loss=49.162, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 499.59it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=579.404, player_2/loss=15.531, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 498.46it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=768.321, player_2/loss=30.153, rew=18.75]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 490.66it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=270.309, player_2/loss=236.304, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.16it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=114.101, player_2/loss=297.653, rew=5.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 502.45it/s, env_step=3072, len=13, n/ep=6, n/st=64, player_1/loss=55.873, player_2/loss=528.796, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 489.92it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=15.554, player_2/loss=596.094, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 501.12it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=24.637, player_2/loss=593.498, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 499.58it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=34.392, player_2/loss=506.227, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 492.74it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=27.430, player_2/loss=504.763, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 498.90it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=33.826, player_2/loss=473.316, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 495.47it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=72.235, player_2/loss=490.609, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 496.42it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=97.176, player_2/loss=595.331, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 501.85it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=72.999, player_2/loss=642.258, rew=5.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 498.82it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=47.774, player_2/loss=644.639, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 494.89it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=44.764, player_2/loss=474.821, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 500.29it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=62.656, player_2/loss=468.570, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 500.91it/s, env_step=15360, len=12, n/ep=3, n/st=64, player_1/loss=48.962, player_2/loss=645.592, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 494.04it/s, env_step=16384, len=11, n/ep=3, n/st=64, player_1/loss=12.579, player_2/loss=866.011, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 499.34it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=10.584, player_2/loss=691.529, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 498.95it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=9.046, player_2/loss=451.769, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 498.07it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=17.664, player_2/loss=370.004, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 501.45it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=400.607, player_2/loss=128.046, rew=18.75]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 498.34it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=412.354, player_2/loss=89.452, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 492.34it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=406.577, player_2/loss=66.178, rew=18.75]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 496.46it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=459.254, player_2/loss=59.418, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 496.58it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=485.041, player_2/loss=58.936, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 491.28it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=437.173, player_2/loss=96.776, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 492.62it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=465.315, player_2/loss=96.893, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 497.69it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=558.041, player_2/loss=131.794, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 493.06it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=438.954, player_2/loss=89.394, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 498.47it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=447.660, player_2/loss=60.282, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 499.03it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=515.217, player_2/loss=103.254, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 485.03it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_2/loss=96.906, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 498.34it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=453.533, player_2/loss=51.104, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 489.87it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=462.054, player_2/loss=45.535, rew=5.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 499.12it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=342.595, player_2/loss=38.616, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 499.69it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=384.444, player_2/loss=49.894, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 491.85it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=478.631, player_2/loss=46.888, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 498.03it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=505.177, player_2/loss=20.028, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 496.69it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=492.840, player_2/loss=39.828, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 495.27it/s, env_step=1024, len=25, n/ep=3, n/st=64, player_1/loss=147.772, player_2/loss=118.647, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 501.13it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=114.622, player_2/loss=174.978, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 497.45it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=60.440, player_2/loss=280.704, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 490.67it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=41.517, player_2/loss=414.850, rew=5.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 496.16it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=56.375, player_2/loss=499.986, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 495.73it/s, env_step=6144, len=13, n/ep=6, n/st=64, player_1/loss=61.660, player_2/loss=499.694, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 492.78it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=19.904, player_2/loss=485.741, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 488.72it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=52.813, player_2/loss=483.877, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 497.40it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=76.358, player_2/loss=536.983, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 496.79it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=34.853, player_2/loss=515.811, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 469.49it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=38.571, player_2/loss=556.047, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 493.28it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=60.301, player_2/loss=571.533, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 491.61it/s, env_step=13312, len=19, n/ep=3, n/st=64, player_1/loss=119.530, player_2/loss=423.214, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 494.04it/s, env_step=14336, len=21, n/ep=3, n/st=64, player_1/loss=148.226, player_2/loss=251.469, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 496.11it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=64.306, player_2/loss=191.425, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 493.58it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=68.231, player_2/loss=345.622, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 493.02it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=71.354, player_2/loss=388.448, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 486.01it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=38.028, player_2/loss=308.903, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 493.99it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=25.657, player_2/loss=261.186, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 493.30it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=39.633, player_2/loss=213.826, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 493.08it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=122.650, player_2/loss=131.187, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 497.85it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=173.419, player_2/loss=65.252, rew=12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 496.44it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=142.894, player_2/loss=75.270, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 487.56it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=194.069, player_2/loss=84.589, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 496.28it/s, env_step=6144, len=14, n/ep=5, n/st=64, player_1/loss=217.406, rew=-5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 497.41it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=223.717, player_2/loss=139.723, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 488.39it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_1/loss=198.771, player_2/loss=128.773, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 493.44it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=145.069, player_2/loss=86.099, rew=0.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 492.25it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=165.583, player_2/loss=97.071, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 492.64it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=190.959, player_2/loss=62.497, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 496.15it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=174.517, player_2/loss=53.725, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 492.72it/s, env_step=13312, len=18, n/ep=3, n/st=64, player_1/loss=202.257, player_2/loss=48.213, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 490.37it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=221.613, player_2/loss=80.077, rew=12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 494.00it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=170.603, player_2/loss=93.471, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 495.86it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_2/loss=89.322, rew=-12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 491.56it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=173.272, player_2/loss=72.557, rew=-5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 500.46it/s, env_step=18432, len=16, n/ep=3, n/st=64, player_1/loss=141.404, player_2/loss=54.736, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 494.71it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=173.031, player_2/loss=60.496, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 485.30it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=80.970, player_2/loss=403.208, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 494.40it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=44.902, player_2/loss=453.834, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 496.35it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=46.544, player_2/loss=333.852, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 490.95it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=56.783, player_2/loss=276.560, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 489.12it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=25.616, player_2/loss=313.348, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 487.25it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=75.429, player_2/loss=354.667, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 487.97it/s, env_step=7168, len=9, n/ep=6, n/st=64, player_1/loss=75.403, player_2/loss=303.178, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 494.90it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=93.159, player_2/loss=259.751, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 492.73it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=37.438, player_2/loss=268.535, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 492.21it/s, env_step=10240, len=9, n/ep=6, n/st=64, player_1/loss=37.457, player_2/loss=316.729, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 488.66it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=32.079, player_2/loss=284.136, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 493.90it/s, env_step=12288, len=9, n/ep=6, n/st=64, player_1/loss=14.187, player_2/loss=302.714, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 498.11it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=10.677, player_2/loss=270.731, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 486.52it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=10.421, player_2/loss=294.173, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 490.06it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=4.397, player_2/loss=343.433, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 489.02it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=13.481, player_2/loss=296.303, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 492.02it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=18.312, player_2/loss=244.685, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 490.81it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=70.718, player_2/loss=224.059, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 487.60it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=76.774, player_2/loss=226.296, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 493.20it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=97.434, player_2/loss=177.285, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.00it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=103.553, player_2/loss=100.620, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 488.03it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=79.968, player_2/loss=31.738, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 494.15it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=77.832, player_2/loss=46.420, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 494.96it/s, env_step=5120, len=25, n/ep=2, n/st=64, player_1/loss=58.961, player_2/loss=34.680, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 489.62it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=44.214, player_2/loss=20.058, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 495.91it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=97.705, player_2/loss=73.427, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 494.60it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=90.820, player_2/loss=109.550, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 490.74it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=47.931, player_2/loss=73.133, rew=-12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 498.19it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=52.397, player_2/loss=45.707, rew=0.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 494.49it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=39.114, player_2/loss=26.163, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 485.02it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=62.383, player_2/loss=59.221, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 493.89it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=88.361, player_2/loss=48.117, rew=-12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 495.04it/s, env_step=14336, len=19, n/ep=4, n/st=64, player_1/loss=52.161, player_2/loss=14.704, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 487.61it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=58.451, player_2/loss=23.248, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 492.69it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=58.664, player_2/loss=20.825, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 492.31it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=38.632, player_2/loss=53.029, rew=5.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 488.45it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=126.771, player_2/loss=87.426, rew=-12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 494.11it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=126.154, player_2/loss=39.981, rew=-12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 494.18it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=17.125, player_2/loss=14.175, rew=5.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 488.33it/s, env_step=2048, len=9, n/ep=6, n/st=64, player_1/loss=158.751, player_2/loss=156.005, rew=-16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.20it/s, env_step=3072, len=8, n/ep=7, n/st=64, player_1/loss=239.086, player_2/loss=271.818, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 498.17it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=246.024, player_2/loss=341.567, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 494.85it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=233.582, player_2/loss=411.647, rew=-17.86]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 495.57it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=189.275, player_2/loss=449.223, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 492.99it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=179.627, player_2/loss=437.169, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 493.94it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=199.627, player_2/loss=393.501, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 491.58it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=180.005, player_2/loss=344.682, rew=-18.75]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 493.70it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=112.353, player_2/loss=334.510, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 496.52it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=100.321, player_2/loss=323.441, rew=-17.86]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 485.12it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=90.239, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 495.64it/s, env_step=13312, len=7, n/ep=10, n/st=64, player_1/loss=146.107, player_2/loss=348.638, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 491.36it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=258.719, player_2/loss=302.289, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 488.14it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=267.098, player_2/loss=227.041, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 488.91it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=138.199, player_2/loss=192.657, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 495.83it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=132.880, player_2/loss=211.339, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 489.90it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=100.183, player_2/loss=238.192, rew=12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 487.97it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=107.839, player_2/loss=215.170, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 487.68it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=85.403, player_2/loss=274.987, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.96it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=72.827, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.85it/s, env_step=3072, len=7, n/ep=10, n/st=64, player_1/loss=68.385, player_2/loss=197.866, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 494.89it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=124.714, player_2/loss=137.931, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 487.86it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=139.648, player_2/loss=115.885, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 496.54it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=108.824, player_2/loss=131.800, rew=-13.89]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 493.09it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=90.784, player_2/loss=140.975, rew=-13.89]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 490.39it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=122.988, player_2/loss=133.309, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 498.86it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=173.626, player_2/loss=152.960, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 491.82it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=166.145, player_2/loss=141.004, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 499.65it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=167.746, player_2/loss=136.634, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 494.84it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=226.151, player_2/loss=136.082, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 496.56it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=289.167, player_2/loss=120.238, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 501.76it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=315.770, player_2/loss=89.515, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 489.76it/s, env_step=15360, len=12, n/ep=4, n/st=64, player_1/loss=299.490, player_2/loss=104.920, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 499.02it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=238.848, player_2/loss=110.190, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 476.46it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=207.789, player_2/loss=76.154, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 487.33it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=225.964, player_2/loss=38.146, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 498.94it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=245.026, player_2/loss=60.122, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 498.62it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=217.673, player_2/loss=183.565, rew=5.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 491.87it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=167.104, player_2/loss=260.535, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 501.05it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=111.913, player_2/loss=283.460, rew=-5.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 499.62it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=73.942, player_2/loss=251.638, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 493.17it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=91.527, player_2/loss=250.840, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 499.63it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=87.724, player_2/loss=286.878, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 497.07it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=38.765, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 490.65it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=64.583, player_2/loss=287.958, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 495.18it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=74.473, player_2/loss=236.557, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 495.35it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=52.994, player_2/loss=262.529, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 489.50it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=30.707, player_2/loss=313.493, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 495.80it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=21.918, player_2/loss=328.245, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 494.49it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=39.879, player_2/loss=348.190, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 495.67it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=50.106, player_2/loss=368.644, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 487.85it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=28.897, player_2/loss=299.251, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 494.13it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=4.895, player_2/loss=299.694, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 496.69it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=38.021, player_2/loss=279.667, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 489.56it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=39.448, player_2/loss=284.192, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 493.89it/s, env_step=19456, len=10, n/ep=7, n/st=64, player_1/loss=16.113, player_2/loss=260.693, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 500.93it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=30.038, player_2/loss=197.843, rew=-17.86]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.70it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=24.050, player_2/loss=166.163, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 498.74it/s, env_step=3072, len=8, n/ep=7, n/st=64, player_1/loss=10.186, player_2/loss=144.544, rew=-17.86]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 502.37it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=18.886, player_2/loss=116.103, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 496.54it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=39.788, player_2/loss=84.657, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 502.48it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=107.907, player_2/loss=141.021, rew=-17.86]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 498.66it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=113.126, player_2/loss=226.157, rew=-17.86]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 498.54it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=43.857, player_2/loss=172.094, rew=-17.86]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 495.20it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=101.819, player_2/loss=118.764, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 499.28it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=196.701, player_2/loss=77.175, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 496.64it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=259.128, player_2/loss=38.547, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 494.69it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_2/loss=53.824, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 497.32it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=237.480, player_2/loss=43.667, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 500.70it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=227.094, player_2/loss=20.914, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 489.02it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=218.665, player_2/loss=30.464, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 494.64it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=226.052, player_2/loss=48.026, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 494.10it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=254.905, player_2/loss=33.714, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 491.93it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=261.427, player_2/loss=42.040, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 497.33it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=273.770, player_2/loss=49.271, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 476.49it/s, env_step=1024, len=8, n/ep=7, n/st=64, player_1/loss=222.627, player_2/loss=197.372, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 489.04it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=202.921, player_2/loss=178.702, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 489.65it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=190.710, player_2/loss=160.953, rew=-18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 489.32it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=208.127, player_2/loss=371.158, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 486.45it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=203.200, player_2/loss=521.301, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 483.87it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=122.002, player_2/loss=643.307, rew=13.89]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 487.71it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=110.699, player_2/loss=596.634, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 487.59it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=108.735, player_2/loss=584.531, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 481.84it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=92.267, player_2/loss=565.021, rew=18.75]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 485.19it/s, env_step=10240, len=7, n/ep=10, n/st=64, player_1/loss=84.800, player_2/loss=655.643, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 486.78it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=63.454, player_2/loss=597.724, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 485.89it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=55.737, player_2/loss=625.859, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 489.72it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=114.644, player_2/loss=649.823, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 487.35it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=95.152, player_2/loss=669.662, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 479.29it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=88.203, player_2/loss=650.251, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 483.69it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=117.930, player_2/loss=718.766, rew=8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 487.66it/s, env_step=17408, len=8, n/ep=9, n/st=64, player_1/loss=87.306, player_2/loss=746.978, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 477.69it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=35.895, player_2/loss=792.582, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 488.32it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=54.094, player_2/loss=650.130, rew=6.25]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 490.03it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=85.936, player_2/loss=309.527, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 486.93it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=83.614, player_2/loss=264.775, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 494.04it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=89.794, player_2/loss=195.201, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 495.69it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_2/loss=110.521, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 489.62it/s, env_step=5120, len=15, n/ep=3, n/st=64, player_1/loss=71.855, player_2/loss=57.301, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 494.21it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=97.724, player_2/loss=115.865, rew=12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 492.64it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=91.176, player_2/loss=133.499, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 489.61it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=120.166, player_2/loss=99.615, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 494.23it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=81.600, player_2/loss=64.389, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 495.23it/s, env_step=10240, len=14, n/ep=5, n/st=64, player_1/loss=76.684, player_2/loss=62.711, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 488.95it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=77.874, player_2/loss=81.422, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 492.91it/s, env_step=12288, len=17, n/ep=3, n/st=64, player_1/loss=101.320, player_2/loss=51.073, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 494.82it/s, env_step=13312, len=23, n/ep=2, n/st=64, player_1/loss=109.490, player_2/loss=113.804, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 487.99it/s, env_step=14336, len=21, n/ep=3, n/st=64, player_1/loss=186.711, player_2/loss=150.315, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 495.73it/s, env_step=15360, len=21, n/ep=3, n/st=64, player_1/loss=156.788, player_2/loss=107.357, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 492.25it/s, env_step=16384, len=22, n/ep=3, n/st=64, player_1/loss=139.044, player_2/loss=163.503, rew=8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 493.46it/s, env_step=17408, len=25, n/ep=2, n/st=64, player_1/loss=158.463, player_2/loss=142.957, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 495.00it/s, env_step=18432, len=20, n/ep=3, n/st=64, player_1/loss=116.336, player_2/loss=76.943, rew=-8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 496.76it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=142.002, player_2/loss=116.033, rew=12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 479.95it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=93.440, player_2/loss=263.032, rew=19.44]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 476.37it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=101.363, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 487.04it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=91.100, player_2/loss=341.494, rew=13.89]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 487.54it/s, env_step=4096, len=7, n/ep=7, n/st=64, player_1/loss=146.363, player_2/loss=368.563, rew=3.57]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 480.09it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=148.149, player_2/loss=280.095, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 486.70it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=111.286, player_2/loss=291.349, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 488.01it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=45.703, player_2/loss=339.055, rew=6.25]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 479.13it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=47.899, player_2/loss=355.501, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 489.65it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=73.933, player_2/loss=364.461, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 487.36it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=115.467, player_2/loss=369.287, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 482.12it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_2/loss=303.686, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 489.67it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=90.704, player_2/loss=287.856, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 486.69it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=75.625, player_2/loss=256.512, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 487.14it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=54.890, player_2/loss=324.876, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 481.91it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=40.813, player_2/loss=325.214, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 489.51it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=34.137, player_2/loss=293.050, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 487.57it/s, env_step=17408, len=9, n/ep=8, n/st=64, player_1/loss=60.973, player_2/loss=335.554, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 484.17it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=79.091, player_2/loss=322.448, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 484.77it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=43.506, player_2/loss=294.225, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 493.07it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=110.844, player_2/loss=139.033, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 487.82it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=145.979, player_2/loss=95.106, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 494.93it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=160.443, player_2/loss=80.993, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 493.49it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=105.642, player_2/loss=41.661, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 491.03it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=91.281, player_2/loss=35.633, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 496.17it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=109.502, player_2/loss=42.266, rew=-19.44]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 492.25it/s, env_step=7168, len=23, n/ep=3, n/st=64, player_1/loss=95.752, player_2/loss=87.796, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 486.54it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=75.087, player_2/loss=65.174, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 491.62it/s, env_step=9216, len=23, n/ep=3, n/st=64, player_1/loss=118.852, player_2/loss=84.769, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 493.58it/s, env_step=10240, len=21, n/ep=3, n/st=64, player_1/loss=143.744, player_2/loss=92.812, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 486.69it/s, env_step=11264, len=23, n/ep=2, n/st=64, player_1/loss=147.414, player_2/loss=48.726, rew=-25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 498.69it/s, env_step=12288, len=22, n/ep=3, n/st=64, player_1/loss=103.482, player_2/loss=19.984, rew=8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 488.96it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=50.542, player_2/loss=21.101, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 488.01it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=49.832, player_2/loss=23.455, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 494.10it/s, env_step=15360, len=23, n/ep=3, n/st=64, player_1/loss=63.214, player_2/loss=17.271, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 485.49it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=89.759, player_2/loss=49.333, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 495.11it/s, env_step=17408, len=19, n/ep=3, n/st=64, player_1/loss=103.097, player_2/loss=37.963, rew=8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 497.03it/s, env_step=18432, len=21, n/ep=3, n/st=64, player_1/loss=115.246, player_2/loss=21.265, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 490.70it/s, env_step=19456, len=23, n/ep=3, n/st=64, player_1/loss=115.235, player_2/loss=5.261, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 489.19it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=86.349, player_2/loss=61.551, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 489.65it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=78.014, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.52it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=90.457, player_2/loss=120.280, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 489.47it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=106.411, player_2/loss=112.182, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 490.07it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=112.028, player_2/loss=98.451, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 492.28it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=99.348, player_2/loss=87.479, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 485.63it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=122.138, player_2/loss=96.236, rew=15.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 491.73it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=107.947, player_2/loss=78.376, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 492.64it/s, env_step=9216, len=9, n/ep=6, n/st=64, player_1/loss=60.745, player_2/loss=95.809, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 478.49it/s, env_step=10240, len=13, n/ep=4, n/st=64, player_1/loss=31.024, player_2/loss=120.931, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 490.03it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=45.459, player_2/loss=148.448, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 489.51it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=54.152, player_2/loss=144.933, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 488.34it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=89.040, player_2/loss=140.086, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 491.81it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=151.556, player_2/loss=132.863, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 490.38it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=90.863, player_2/loss=114.017, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 493.96it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=28.742, player_2/loss=121.064, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 490.85it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=37.919, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 487.88it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=28.639, player_2/loss=164.244, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 491.66it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=66.338, player_2/loss=92.971, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 491.36it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=40.326, player_2/loss=155.344, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 483.02it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=44.606, player_2/loss=133.443, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 491.78it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=75.171, player_2/loss=141.357, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 490.16it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=129.040, player_2/loss=155.750, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 480.83it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=152.145, player_2/loss=148.218, rew=-17.86]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 491.48it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=175.471, player_2/loss=149.811, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 488.66it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=241.885, player_2/loss=148.487, rew=15.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 482.36it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=249.266, player_2/loss=180.005, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 489.67it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=233.572, player_2/loss=200.629, rew=-19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 489.65it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=268.493, player_2/loss=201.076, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 489.65it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=262.836, player_2/loss=202.728, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 495.47it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=238.485, player_2/loss=131.481, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 491.18it/s, env_step=13312, len=9, n/ep=8, n/st=64, player_1/loss=197.421, player_2/loss=129.105, rew=-12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 484.03it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=263.531, player_2/loss=117.948, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 494.18it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=300.372, player_2/loss=133.806, rew=15.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 488.50it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=255.454, player_2/loss=128.288, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 483.09it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=287.339, player_2/loss=162.814, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 491.92it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=372.866, player_2/loss=127.145, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 488.89it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=375.991, player_2/loss=88.013, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 483.73it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=99.005, player_2/loss=413.595, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.13it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=98.737, player_2/loss=310.086, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 492.27it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=82.231, player_2/loss=281.646, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 489.88it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=70.077, player_2/loss=279.097, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 492.04it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=69.805, player_2/loss=216.809, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 492.70it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=40.899, player_2/loss=202.235, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 486.10it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=39.509, player_2/loss=219.460, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 489.29it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=37.289, player_2/loss=273.739, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 491.53it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=68.362, player_2/loss=317.501, rew=5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 485.08it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=82.992, player_2/loss=359.780, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 492.79it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=33.095, player_2/loss=399.581, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 491.77it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=56.518, player_2/loss=357.230, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 484.26it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=67.861, player_2/loss=249.424, rew=15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 491.44it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=76.268, player_2/loss=247.396, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 494.37it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=66.321, player_2/loss=346.425, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 488.11it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=33.353, player_2/loss=354.672, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 491.80it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=12.725, player_2/loss=327.655, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 490.77it/s, env_step=18432, len=13, n/ep=6, n/st=64, player_1/loss=23.654, player_2/loss=332.964, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 482.11it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=38.531, player_2/loss=360.795, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 492.62it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=78.059, player_2/loss=270.631, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 491.79it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=88.057, player_2/loss=226.631, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 490.01it/s, env_step=3072, len=18, n/ep=3, n/st=64, player_1/loss=65.567, player_2/loss=215.752, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 495.82it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=107.858, player_2/loss=138.054, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 494.68it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=304.231, player_2/loss=121.409, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 485.02it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=262.750, player_2/loss=139.550, rew=8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 493.45it/s, env_step=7168, len=38, n/ep=1, n/st=64, player_1/loss=94.654, player_2/loss=163.033, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 493.34it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=123.205, player_2/loss=131.657, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 489.16it/s, env_step=9216, len=22, n/ep=2, n/st=64, player_1/loss=253.011, player_2/loss=107.891, rew=0.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 488.45it/s, env_step=10240, len=17, n/ep=2, n/st=64, player_1/loss=222.641, player_2/loss=111.621, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 490.02it/s, env_step=11264, len=26, n/ep=3, n/st=64, player_1/loss=78.063, player_2/loss=328.039, rew=16.67]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 490.26it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=101.843, player_2/loss=353.108, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 496.09it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=149.890, player_2/loss=76.359, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 496.60it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=172.712, player_2/loss=84.126, rew=12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 487.43it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=186.810, player_2/loss=83.373, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 495.62it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=213.109, player_2/loss=102.083, rew=-8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 493.27it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=258.020, player_2/loss=98.416, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 492.74it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=237.942, player_2/loss=110.806, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 484.61it/s, env_step=19456, len=7, n/ep=10, n/st=64, player_1/loss=206.633, player_2/loss=98.056, rew=-20.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 492.73it/s, env_step=1024, len=15, n/ep=5, n/st=64, player_1/loss=120.740, player_2/loss=197.433, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.61it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=105.403, player_2/loss=252.228, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 490.58it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=117.054, player_2/loss=219.658, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 495.87it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=128.649, player_2/loss=152.273, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 493.56it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=115.081, player_2/loss=66.841, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 486.90it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=88.292, player_2/loss=56.281, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 495.08it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=119.375, player_2/loss=40.637, rew=0.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 489.80it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=153.581, player_2/loss=77.686, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 487.11it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=131.431, player_2/loss=90.433, rew=12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 490.33it/s, env_step=10240, len=13, n/ep=4, n/st=64, player_1/loss=84.545, player_2/loss=223.874, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 485.43it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=104.902, player_2/loss=369.179, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 488.49it/s, env_step=12288, len=9, n/ep=6, n/st=64, player_1/loss=140.926, player_2/loss=323.760, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 492.95it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=82.383, player_2/loss=246.375, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 489.64it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_1/loss=71.311, player_2/loss=197.226, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 492.65it/s, env_step=15360, len=17, n/ep=3, n/st=64, player_1/loss=66.157, player_2/loss=219.674, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 490.60it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=59.170, player_2/loss=243.721, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 492.89it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=34.518, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 486.47it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=35.940, player_2/loss=216.026, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 491.83it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=35.974, player_2/loss=233.250, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 491.14it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=161.743, player_2/loss=129.897, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.75it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=222.088, player_2/loss=77.418, rew=5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 485.68it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=228.501, player_2/loss=60.825, rew=-17.86]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 489.25it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=280.686, player_2/loss=115.290, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 494.70it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=392.702, player_2/loss=141.653, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 480.84it/s, env_step=6144, len=15, n/ep=5, n/st=64, player_1/loss=295.470, player_2/loss=159.488, rew=5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 491.25it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=222.253, player_2/loss=119.858, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 495.14it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=263.230, player_2/loss=78.995, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 484.28it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=274.775, player_2/loss=69.368, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 492.91it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=270.496, player_2/loss=54.168, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 497.72it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=276.955, player_2/loss=30.532, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 486.71it/s, env_step=12288, len=19, n/ep=4, n/st=64, player_1/loss=217.884, player_2/loss=69.202, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 494.39it/s, env_step=13312, len=23, n/ep=2, n/st=64, player_1/loss=94.205, player_2/loss=100.574, rew=0.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 494.20it/s, env_step=14336, len=19, n/ep=4, n/st=64, player_1/loss=86.940, player_2/loss=124.150, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 479.96it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=44.071, player_2/loss=86.102, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 495.13it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=59.102, player_2/loss=72.108, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 492.02it/s, env_step=17408, len=9, n/ep=6, n/st=64, player_1/loss=242.232, player_2/loss=117.489, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 484.00it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=365.831, player_2/loss=139.824, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 490.64it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=289.349, player_2/loss=117.005, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 490.61it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=181.435, player_2/loss=154.063, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 480.12it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=215.513, player_2/loss=133.843, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 491.08it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=228.844, player_2/loss=118.244, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 490.55it/s, env_step=4096, len=10, n/ep=7, n/st=64, player_1/loss=207.969, player_2/loss=209.970, rew=10.71]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 484.82it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=213.055, player_2/loss=283.429, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 487.40it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=176.023, player_2/loss=258.377, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 487.57it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=136.062, player_2/loss=393.365, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 489.74it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=146.334, player_2/loss=632.949, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 481.95it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=121.087, player_2/loss=711.426, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 487.74it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=117.995, player_2/loss=706.536, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 492.71it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=90.410, player_2/loss=706.283, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 481.23it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=97.482, player_2/loss=783.784, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 490.08it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=67.173, player_2/loss=678.858, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 489.30it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=34.367, player_2/loss=616.021, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 484.97it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=46.804, player_2/loss=652.804, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 482.54it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=50.576, player_2/loss=691.724, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 490.36it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=72.754, player_2/loss=762.749, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 488.58it/s, env_step=18432, len=7, n/ep=7, n/st=64, player_1/loss=88.039, player_2/loss=649.241, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 482.65it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=68.457, player_2/loss=592.867, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 489.38it/s, env_step=1024, len=13, n/ep=6, n/st=64, player_1/loss=138.764, player_2/loss=461.033, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 495.21it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=270.099, player_2/loss=348.403, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 484.59it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=308.173, player_2/loss=212.051, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 497.26it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=255.141, player_2/loss=189.532, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 495.33it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=217.368, player_2/loss=129.384, rew=12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 489.54it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=189.147, player_2/loss=125.884, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 491.62it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=229.790, player_2/loss=116.584, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 490.16it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=278.243, player_2/loss=83.141, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 486.26it/s, env_step=9216, len=10, n/ep=7, n/st=64, player_1/loss=357.251, player_2/loss=35.169, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 493.68it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=445.054, player_2/loss=29.563, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 492.87it/s, env_step=11264, len=10, n/ep=7, n/st=64, player_1/loss=378.705, player_2/loss=40.315, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 489.40it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=383.467, player_2/loss=70.856, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 495.48it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_2/loss=72.039, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 492.72it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=389.222, player_2/loss=54.655, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 490.85it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=405.326, player_2/loss=44.846, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 483.78it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=445.054, player_2/loss=49.757, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 492.49it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=437.598, player_2/loss=61.881, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 492.31it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=431.215, player_2/loss=49.782, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 487.31it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=370.572, player_2/loss=26.314, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 491.80it/s, env_step=1024, len=19, n/ep=4, n/st=64, player_1/loss=205.261, player_2/loss=101.493, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.03it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=154.272, player_2/loss=146.470, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 486.89it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=92.582, player_2/loss=165.930, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 493.71it/s, env_step=4096, len=24, n/ep=3, n/st=64, player_1/loss=103.898, player_2/loss=147.918, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 490.98it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=103.557, player_2/loss=139.744, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 479.51it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=97.147, player_2/loss=117.910, rew=8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 492.69it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=68.951, player_2/loss=140.655, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 490.95it/s, env_step=8192, len=25, n/ep=2, n/st=64, player_1/loss=93.410, player_2/loss=138.179, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 491.64it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=129.553, player_2/loss=147.670, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 490.69it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=89.557, player_2/loss=164.167, rew=12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 493.64it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=76.973, player_2/loss=239.957, rew=10.71]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 482.27it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=108.178, player_2/loss=337.388, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 487.51it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=112.971, player_2/loss=530.513, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 491.21it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=47.705, player_2/loss=425.781, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 481.69it/s, env_step=15360, len=10, n/ep=7, n/st=64, player_1/loss=21.579, player_2/loss=421.999, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 488.72it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=28.495, player_2/loss=453.891, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 486.60it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=21.174, player_2/loss=485.102, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 479.32it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=18.083, player_2/loss=526.621, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 491.27it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=11.783, player_2/loss=472.840, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 492.84it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=52.738, player_2/loss=305.185, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.89it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=46.908, player_2/loss=213.944, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 485.53it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=48.387, player_2/loss=151.541, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 494.41it/s, env_step=4096, len=9, n/ep=5, n/st=64, player_1/loss=133.515, player_2/loss=144.969, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 494.24it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=193.124, player_2/loss=152.713, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 485.71it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=183.203, player_2/loss=191.119, rew=-18.75]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 489.46it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=246.753, player_2/loss=185.723, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 492.61it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=293.098, player_2/loss=124.211, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 490.28it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=392.760, player_2/loss=71.488, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 495.38it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=497.952, player_2/loss=33.425, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 489.77it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=424.283, player_2/loss=61.295, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 482.73it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=308.067, player_2/loss=165.233, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 491.59it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=384.450, player_2/loss=170.799, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 489.61it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=469.768, player_2/loss=54.130, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 479.04it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=443.052, player_2/loss=42.255, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 492.04it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=402.324, player_2/loss=12.673, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 491.52it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=404.586, player_2/loss=14.074, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 491.06it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=394.084, player_2/loss=27.956, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 491.87it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=448.541, player_2/loss=25.108, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 489.23it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=268.429, player_2/loss=50.437, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 489.76it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=209.244, player_2/loss=182.309, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 489.96it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=101.573, player_2/loss=375.369, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 490.83it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=49.054, player_2/loss=466.459, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 492.36it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=103.784, player_2/loss=374.365, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 485.56it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=111.283, player_2/loss=407.607, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 491.12it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=35.255, player_2/loss=430.397, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 491.54it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=12.817, player_2/loss=480.849, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 481.20it/s, env_step=9216, len=9, n/ep=6, n/st=64, player_1/loss=4.702, player_2/loss=496.458, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 486.96it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=43.698, player_2/loss=482.196, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 493.14it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=43.551, player_2/loss=571.754, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 489.28it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=33.697, player_2/loss=546.479, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 483.93it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=42.870, player_2/loss=456.949, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 491.23it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=45.224, player_2/loss=502.456, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 489.43it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=39.286, player_2/loss=509.511, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 481.97it/s, env_step=16384, len=9, n/ep=6, n/st=64, player_1/loss=15.696, player_2/loss=469.691, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 489.59it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=14.478, player_2/loss=436.100, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 477.29it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=3.559, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 484.16it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=13.523, player_2/loss=476.549, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 489.57it/s, env_step=1024, len=8, n/ep=7, n/st=64, player_1/loss=54.796, player_2/loss=275.361, rew=10.71]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 491.22it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=388.999, player_2/loss=222.268, rew=18.75]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 492.55it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=652.931, player_2/loss=127.791, rew=18.75]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 486.51it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=619.702, player_2/loss=93.357, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 490.91it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=631.382, player_2/loss=78.110, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 488.60it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=665.431, player_2/loss=76.203, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 482.91it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=640.323, player_2/loss=93.298, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 493.33it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=741.007, player_2/loss=51.415, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 489.23it/s, env_step=9216, len=8, n/ep=7, n/st=64, player_1/loss=791.898, player_2/loss=13.434, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 484.18it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=652.737, player_2/loss=16.057, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 489.65it/s, env_step=11264, len=9, n/ep=8, n/st=64, player_1/loss=566.553, player_2/loss=101.404, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 492.49it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=608.857, player_2/loss=141.773, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 484.40it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=633.401, player_2/loss=80.583, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 491.63it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=667.573, player_2/loss=56.361, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 488.46it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=708.703, player_2/loss=32.771, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 488.91it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=617.336, player_2/loss=9.944, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 490.16it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=638.386, player_2/loss=49.439, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 489.02it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=584.629, player_2/loss=97.904, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 485.15it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=549.395, player_2/loss=102.667, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 487.77it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=338.630, player_2/loss=372.489, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 488.76it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=182.290, player_2/loss=333.443, rew=5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 490.22it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=56.560, player_2/loss=341.002, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 486.33it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=74.564, player_2/loss=330.698, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 487.72it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=45.437, player_2/loss=275.032, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 489.49it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=41.088, player_2/loss=298.717, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 483.09it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=28.521, player_2/loss=301.187, rew=5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 488.87it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=35.106, player_2/loss=329.381, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 492.72it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=36.309, player_2/loss=305.960, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 489.14it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=34.049, player_2/loss=330.185, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 479.97it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=29.909, player_2/loss=283.460, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 489.86it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=28.088, player_2/loss=269.827, rew=15.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 486.23it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=45.875, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 490.95it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=14.442, player_2/loss=314.175, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 478.78it/s, env_step=15360, len=15, n/ep=5, n/st=64, player_1/loss=16.564, player_2/loss=313.277, rew=5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 488.08it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=15.857, player_2/loss=325.338, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 486.53it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=24.931, player_2/loss=337.389, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 482.41it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=24.623, player_2/loss=276.908, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 489.82it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=25.391, player_2/loss=263.835, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 490.83it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=9.419, player_2/loss=250.008, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 485.80it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=43.411, player_2/loss=162.613, rew=-12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 490.39it/s, env_step=3072, len=18, n/ep=3, n/st=64, player_1/loss=102.503, player_2/loss=81.704, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 490.01it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=111.557, player_2/loss=80.345, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 485.68it/s, env_step=5120, len=17, n/ep=3, n/st=64, player_1/loss=83.585, player_2/loss=130.824, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 491.40it/s, env_step=6144, len=20, n/ep=4, n/st=64, player_1/loss=88.315, player_2/loss=119.134, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 496.72it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=118.847, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 487.27it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=151.301, player_2/loss=82.154, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 491.94it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=144.242, player_2/loss=53.773, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 490.12it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=134.218, player_2/loss=100.025, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 486.86it/s, env_step=11264, len=19, n/ep=4, n/st=64, player_1/loss=154.625, player_2/loss=91.823, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 494.47it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=158.365, player_2/loss=57.068, rew=12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 492.26it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=154.403, player_2/loss=74.115, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 485.99it/s, env_step=14336, len=18, n/ep=4, n/st=64, player_1/loss=123.851, player_2/loss=99.559, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 494.75it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=121.749, player_2/loss=80.320, rew=-8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 497.32it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=145.873, player_2/loss=62.729, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 483.51it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=120.222, player_2/loss=28.107, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 495.01it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=108.102, player_2/loss=31.213, rew=12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 492.37it/s, env_step=19456, len=20, n/ep=3, n/st=64, player_1/loss=118.909, player_2/loss=52.453, rew=-8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 481.98it/s, env_step=1024, len=15, n/ep=5, n/st=64, player_1/loss=75.297, player_2/loss=497.791, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 491.67it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=103.341, player_2/loss=373.707, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 494.87it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=88.937, player_2/loss=257.753, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 486.92it/s, env_step=4096, len=16, n/ep=3, n/st=64, player_1/loss=90.754, player_2/loss=180.518, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 489.97it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=96.038, player_2/loss=180.269, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 489.09it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=53.780, player_2/loss=197.094, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 490.03it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=86.080, player_2/loss=180.369, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 487.87it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=61.026, player_2/loss=150.296, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 496.87it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=23.387, player_2/loss=140.368, rew=15.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 489.08it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=48.437, player_2/loss=179.481, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 480.49it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=33.372, player_2/loss=222.266, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 492.92it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=74.674, player_2/loss=219.992, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 492.26it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=84.246, player_2/loss=218.583, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 487.65it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=18.657, player_2/loss=168.964, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 492.89it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=6.949, player_2/loss=193.489, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 493.96it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=17.425, player_2/loss=195.383, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 490.74it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=20.321, player_2/loss=189.445, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 485.29it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=9.209, player_2/loss=173.171, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 489.74it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=16.812, player_2/loss=215.735, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 484.78it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=38.585, player_2/loss=275.833, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 481.19it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=101.822, player_2/loss=274.323, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 488.61it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=153.167, player_2/loss=227.813, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 491.55it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=99.816, player_2/loss=173.419, rew=-18.75]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 482.28it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=52.800, player_2/loss=130.635, rew=-17.86]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 490.09it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=56.714, player_2/loss=100.697, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 489.51it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=53.590, player_2/loss=100.873, rew=-19.44]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 490.57it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=47.059, player_2/loss=108.875, rew=-13.89]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 482.51it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=67.091, player_2/loss=79.542, rew=-13.89]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 480.93it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=105.209, player_2/loss=131.070, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 488.47it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=159.398, player_2/loss=206.712, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 481.13it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=176.428, player_2/loss=204.751, rew=-13.89]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 491.53it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=134.033, player_2/loss=136.993, rew=-13.89]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 487.38it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=113.585, player_2/loss=159.177, rew=-13.89]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 483.50it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=83.464, player_2/loss=129.818, rew=-19.44]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 491.42it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=113.278, player_2/loss=63.549, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 490.35it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=99.630, player_2/loss=47.714, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 490.52it/s, env_step=18432, len=7, n/ep=7, n/st=64, player_1/loss=87.037, player_2/loss=52.024, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 484.58it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=199.343, player_2/loss=109.587, rew=-17.86]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 487.33it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=91.562, player_2/loss=127.557, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 488.28it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=100.577, player_2/loss=125.452, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 485.52it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=138.977, player_2/loss=98.974, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 481.60it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=132.994, player_2/loss=73.259, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 467.51it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=68.963, player_2/loss=63.632, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 483.42it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=65.258, player_2/loss=75.223, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 483.24it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=98.585, player_2/loss=141.512, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 484.29it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=118.635, player_2/loss=159.357, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 488.23it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=130.788, player_2/loss=99.440, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 489.42it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=127.542, player_2/loss=79.825, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 483.38it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=77.339, player_2/loss=130.364, rew=13.89]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 486.11it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=34.621, player_2/loss=150.061, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 487.36it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=46.698, player_2/loss=160.543, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 486.47it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=26.484, player_2/loss=175.448, rew=6.25]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 482.71it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=6.936, player_2/loss=128.028, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 483.43it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=83.671, player_2/loss=74.831, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 483.98it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=75.353, player_2/loss=112.160, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 485.90it/s, env_step=18432, len=7, n/ep=7, n/st=64, player_1/loss=77.429, player_2/loss=158.873, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 484.73it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=87.505, player_2/loss=193.793, rew=10.71]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 484.17it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=34.340, player_2/loss=121.823, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 483.96it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=55.469, player_2/loss=142.154, rew=-13.89]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 482.05it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=92.744, player_2/loss=132.463, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 486.80it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=81.241, player_2/loss=116.901, rew=-17.86]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 489.02it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=37.664, player_2/loss=100.587, rew=-17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 483.70it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=31.059, player_2/loss=80.145, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 487.27it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=44.262, player_2/loss=69.695, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 487.34it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=73.686, player_2/loss=71.871, rew=-18.75]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 494.54it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=115.894, player_2/loss=68.861, rew=-3.57]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 479.79it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=101.380, player_2/loss=92.878, rew=-17.86]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 487.41it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=58.298, player_2/loss=101.524, rew=-13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 484.80it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=75.082, player_2/loss=108.845, rew=-19.44]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 485.45it/s, env_step=13312, len=7, n/ep=7, n/st=64, player_1/loss=56.469, player_2/loss=88.392, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 492.11it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=49.998, player_2/loss=123.344, rew=-12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 486.36it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=97.565, player_2/loss=196.336, rew=12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 483.63it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=146.761, player_2/loss=202.897, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 491.53it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=186.680, player_2/loss=202.725, rew=5.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 487.33it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=222.391, player_2/loss=186.506, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 481.97it/s, env_step=19456, len=12, n/ep=4, n/st=64, player_1/loss=203.973, player_2/loss=172.147, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 487.89it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=129.688, player_2/loss=194.403, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 484.12it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=144.195, player_2/loss=154.011, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 485.07it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_2/loss=192.901, rew=-12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 490.11it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=198.972, player_2/loss=236.693, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 489.95it/s, env_step=5120, len=9, n/ep=6, n/st=64, player_2/loss=279.206, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 486.70it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=162.648, player_2/loss=345.100, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 466.06it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=145.987, player_2/loss=341.412, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 484.51it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=101.598, player_2/loss=305.177, rew=6.25]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 473.63it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=77.528, player_2/loss=301.290, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 466.32it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=83.601, player_2/loss=325.448, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 488.77it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=58.675, player_2/loss=327.413, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 486.16it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=51.440, player_2/loss=273.758, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 478.23it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=168.320, player_2/loss=252.603, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 485.72it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=173.431, player_2/loss=236.184, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 487.04it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=78.426, player_2/loss=244.152, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 481.02it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=61.309, player_2/loss=310.446, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 488.07it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=87.172, player_2/loss=334.293, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 484.24it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=102.512, player_2/loss=311.475, rew=6.25]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 485.77it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=80.864, player_2/loss=306.774, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 480.99it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=95.570, player_2/loss=270.439, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.07it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=72.081, player_2/loss=247.380, rew=-13.89]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 486.37it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=80.724, player_2/loss=216.317, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 480.55it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=89.776, player_2/loss=181.450, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 488.37it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=67.758, player_2/loss=149.405, rew=-19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 489.89it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=74.875, player_2/loss=183.848, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 480.59it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=148.054, player_2/loss=168.125, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 483.88it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=108.268, player_2/loss=163.317, rew=-8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 491.32it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=70.581, player_2/loss=131.290, rew=-19.44]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 485.85it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=122.988, player_2/loss=158.667, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 483.95it/s, env_step=11264, len=20, n/ep=4, n/st=64, player_1/loss=129.476, player_2/loss=197.049, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 487.28it/s, env_step=12288, len=28, n/ep=3, n/st=64, player_1/loss=169.219, player_2/loss=194.908, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 488.10it/s, env_step=13312, len=23, n/ep=3, n/st=64, player_1/loss=175.047, player_2/loss=123.288, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 482.05it/s, env_step=14336, len=30, n/ep=2, n/st=64, player_1/loss=96.034, player_2/loss=73.364, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 489.61it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=70.913, player_2/loss=94.140, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 488.51it/s, env_step=16384, len=22, n/ep=3, n/st=64, player_1/loss=120.255, player_2/loss=100.310, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 491.53it/s, env_step=17408, len=25, n/ep=3, n/st=64, player_1/loss=175.210, player_2/loss=89.971, rew=-8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 486.30it/s, env_step=18432, len=23, n/ep=3, n/st=64, player_1/loss=160.562, player_2/loss=93.709, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 487.97it/s, env_step=19456, len=25, n/ep=2, n/st=64, player_1/loss=111.327, player_2/loss=113.296, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 493.29it/s, env_step=1024, len=26, n/ep=3, n/st=64, player_1/loss=73.182, player_2/loss=131.666, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 488.88it/s, env_step=2048, len=25, n/ep=3, n/st=64, player_1/loss=68.305, player_2/loss=80.224, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 491.30it/s, env_step=3072, len=22, n/ep=3, n/st=64, player_1/loss=92.336, player_2/loss=54.339, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 492.21it/s, env_step=4096, len=30, n/ep=2, n/st=64, player_1/loss=67.986, player_2/loss=46.061, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 483.91it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=37.557, player_2/loss=72.917, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 491.24it/s, env_step=6144, len=24, n/ep=3, n/st=64, player_1/loss=111.135, player_2/loss=84.981, rew=8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 488.73it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=128.588, player_2/loss=170.382, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 494.70it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=80.438, player_2/loss=204.454, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 481.50it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=63.582, player_2/loss=217.237, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 488.56it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=79.501, player_2/loss=208.863, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 491.45it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=58.046, player_2/loss=204.432, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 483.37it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=32.573, player_2/loss=226.742, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 487.54it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=34.947, player_2/loss=293.418, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 486.78it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=59.623, player_2/loss=277.775, rew=2.78]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 480.28it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=75.056, player_2/loss=221.288, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 490.27it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=99.300, player_2/loss=254.430, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 489.22it/s, env_step=17408, len=7, n/ep=10, n/st=64, player_1/loss=67.052, player_2/loss=262.905, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 481.93it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=126.009, player_2/loss=241.304, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 482.46it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=37.204, player_2/loss=240.143, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 487.53it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=6.486, player_2/loss=246.136, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 488.28it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=69.382, player_2/loss=222.942, rew=-12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 481.61it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=109.181, player_2/loss=231.321, rew=-19.44]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 489.76it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=182.859, player_2/loss=208.791, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 493.58it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=142.054, player_2/loss=164.546, rew=-6.25]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 492.95it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=52.264, player_2/loss=148.239, rew=-5.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 479.72it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=54.694, player_2/loss=110.588, rew=-18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 491.04it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=52.437, player_2/loss=80.123, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 492.00it/s, env_step=9216, len=7, n/ep=7, n/st=64, player_1/loss=93.244, player_2/loss=77.626, rew=-17.86]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 485.28it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=91.101, player_2/loss=90.929, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 492.87it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=47.657, player_2/loss=84.720, rew=-18.75]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 491.09it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=88.032, player_2/loss=72.251, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 495.17it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=245.353, player_2/loss=47.257, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 484.70it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=309.557, player_2/loss=23.453, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 488.09it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=282.023, player_2/loss=59.149, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 488.49it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_2/loss=52.673, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 487.77it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=258.257, player_2/loss=22.050, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 491.85it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=314.980, player_2/loss=53.990, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 491.05it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=266.437, player_2/loss=71.874, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 491.74it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=174.125, player_2/loss=28.811, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 483.04it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=150.474, player_2/loss=33.249, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 492.64it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=130.284, player_2/loss=52.098, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 490.25it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=88.305, player_2/loss=108.615, rew=-12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 484.18it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=58.318, player_2/loss=197.748, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 489.53it/s, env_step=6144, len=16, n/ep=3, n/st=64, player_1/loss=68.277, player_2/loss=206.024, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 485.66it/s, env_step=7168, len=13, n/ep=4, n/st=64, player_1/loss=65.533, player_2/loss=270.922, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 478.44it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=36.699, player_2/loss=310.877, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 488.94it/s, env_step=9216, len=10, n/ep=7, n/st=64, player_1/loss=32.283, player_2/loss=264.356, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 488.77it/s, env_step=10240, len=15, n/ep=5, n/st=64, player_1/loss=37.989, player_2/loss=250.084, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 492.37it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=25.173, player_2/loss=274.051, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 488.40it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=64.566, player_2/loss=264.251, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 487.32it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=163.830, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 485.78it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=116.111, player_2/loss=314.115, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 481.98it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=44.714, player_2/loss=332.276, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 486.47it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=103.744, player_2/loss=313.795, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 489.42it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=69.767, player_2/loss=304.156, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 479.59it/s, env_step=18432, len=10, n/ep=7, n/st=64, player_1/loss=71.295, player_2/loss=328.024, rew=10.71]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 484.75it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=120.865, player_2/loss=334.820, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 493.76it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=122.883, player_2/loss=361.495, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 484.14it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=142.659, player_2/loss=231.972, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.88it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=190.380, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 494.00it/s, env_step=4096, len=21, n/ep=2, n/st=64, player_1/loss=159.322, player_2/loss=137.464, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 486.31it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=115.324, player_2/loss=124.921, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 490.45it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=151.424, player_2/loss=80.927, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 494.39it/s, env_step=7168, len=25, n/ep=2, n/st=64, player_1/loss=164.339, player_2/loss=64.597, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 490.36it/s, env_step=8192, len=13, n/ep=4, n/st=64, player_1/loss=160.701, player_2/loss=55.773, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 486.94it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=187.578, player_2/loss=76.695, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 492.31it/s, env_step=10240, len=14, n/ep=5, n/st=64, player_1/loss=211.234, player_2/loss=53.781, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 495.74it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=234.655, player_2/loss=44.511, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 484.23it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=250.040, player_2/loss=37.294, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 492.66it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=218.886, player_2/loss=61.025, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 490.60it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=208.252, player_2/loss=50.289, rew=5.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 494.81it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=195.137, player_2/loss=56.944, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 487.15it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=226.602, player_2/loss=36.086, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 490.81it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=222.261, player_2/loss=35.130, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 491.92it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=180.446, player_2/loss=47.313, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 488.00it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=139.525, player_2/loss=29.212, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 485.52it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=129.914, player_2/loss=334.136, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 488.88it/s, env_step=2048, len=26, n/ep=3, n/st=64, player_1/loss=117.864, player_2/loss=295.125, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 488.92it/s, env_step=3072, len=18, n/ep=3, n/st=64, player_1/loss=97.651, player_2/loss=210.372, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 481.69it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=116.179, player_2/loss=163.472, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 487.47it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=146.060, player_2/loss=196.460, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 487.67it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=188.380, player_2/loss=218.265, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 486.04it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=168.310, player_2/loss=242.526, rew=19.44]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 478.92it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=162.918, player_2/loss=296.761, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 486.17it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=131.478, player_2/loss=352.733, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 482.89it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=38.008, player_2/loss=368.167, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 481.03it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=37.201, player_2/loss=383.538, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 485.19it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=76.043, player_2/loss=385.006, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 485.74it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=134.671, player_2/loss=305.862, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 483.92it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=107.499, player_2/loss=283.334, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 486.69it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=14.344, player_2/loss=362.431, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 490.23it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=11.648, player_2/loss=331.539, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 484.98it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=22.097, player_2/loss=307.673, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 491.23it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=60.644, player_2/loss=325.100, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 485.74it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=50.168, player_2/loss=330.959, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 488.29it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=31.774, player_2/loss=202.040, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 486.47it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=67.099, player_2/loss=184.470, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 488.31it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=56.474, player_2/loss=151.337, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 488.43it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=11.231, player_2/loss=137.166, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 479.52it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=89.170, player_2/loss=131.222, rew=-16.67]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 487.69it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=64.918, player_2/loss=107.941, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 488.09it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=36.321, player_2/loss=75.536, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 479.41it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=64.468, player_2/loss=54.272, rew=-5.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 490.66it/s, env_step=9216, len=7, n/ep=6, n/st=64, player_1/loss=163.667, player_2/loss=81.051, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #10: 1025it [00:02, 486.00it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=257.595, player_2/loss=137.708, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #11: 1025it [00:02, 483.36it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=293.649, player_2/loss=108.965, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #12: 1025it [00:02, 489.30it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_2/loss=199.197, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #13: 1025it [00:02, 486.33it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=312.243, player_2/loss=205.956, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #14: 1025it [00:02, 474.75it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=322.868, player_2/loss=127.557, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #15: 1025it [00:02, 466.73it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=341.062, player_2/loss=93.060, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #16: 1025it [00:02, 482.14it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=376.660, player_2/loss=39.566, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #17: 1025it [00:02, 480.77it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=355.871, player_2/loss=62.490, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #18: 1025it [00:02, 489.73it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=381.038, player_2/loss=71.484, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #19: 1025it [00:02, 488.53it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=383.357, player_2/loss=103.906, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #1: 1025it [00:02, 483.90it/s, env_step=1024, len=8, n/ep=7, n/st=64, player_1/loss=227.381, player_2/loss=39.145, rew=-17.86]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 446.62it/s, env_step=2048, len=27, n/ep=3, n/st=64, player_1/loss=159.794, player_2/loss=67.712, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 488.23it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=89.185, player_2/loss=147.768, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 476.19it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=89.149, player_2/loss=194.769, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 484.76it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=57.756, player_2/loss=155.429, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 500.74it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=51.668, player_2/loss=131.593, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 472.03it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_2/loss=222.441, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 489.45it/s, env_step=8192, len=22, n/ep=2, n/st=64, player_1/loss=30.887, player_2/loss=227.414, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 472.20it/s, env_step=9216, len=13, n/ep=4, n/st=64, player_1/loss=68.418, player_2/loss=166.790, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 472.91it/s, env_step=10240, len=15, n/ep=3, n/st=64, player_1/loss=82.489, player_2/loss=169.265, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 485.72it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=38.563, player_2/loss=284.371, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 486.58it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=57.491, player_2/loss=481.604, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 486.29it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=69.081, player_2/loss=562.002, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 489.82it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=39.483, player_2/loss=612.375, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 488.18it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=11.264, player_2/loss=712.587, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 478.63it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=75.039, player_2/loss=750.497, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 484.52it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=111.534, player_2/loss=521.659, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 489.05it/s, env_step=18432, len=9, n/ep=6, n/st=64, player_1/loss=35.257, player_2/loss=479.008, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 484.26it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=22.127, player_2/loss=489.557, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 488.29it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=158.172, player_2/loss=365.323, rew=-16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 485.55it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=165.553, player_2/loss=326.426, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 491.18it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=114.209, player_2/loss=254.254, rew=-5.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 487.32it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=63.698, player_2/loss=201.020, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 491.40it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=84.713, player_2/loss=145.548, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 486.35it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=71.082, player_2/loss=139.546, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 491.64it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=40.374, player_2/loss=126.389, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 485.21it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=42.788, player_2/loss=118.311, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 492.01it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=35.280, player_2/loss=43.507, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 490.95it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=25.668, player_2/loss=29.132, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 489.49it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=21.836, player_2/loss=31.147, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 485.81it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=24.460, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 495.33it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=50.658, player_2/loss=55.190, rew=-15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 487.59it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=92.470, player_2/loss=126.259, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 486.96it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=106.391, player_2/loss=143.936, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 489.29it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=151.090, player_2/loss=122.761, rew=0.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 491.37it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=95.003, player_2/loss=106.767, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 483.79it/s, env_step=18432, len=9, n/ep=6, n/st=64, player_1/loss=102.836, player_2/loss=92.078, rew=-16.67]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 485.87it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=79.679, player_2/loss=93.999, rew=-15.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 485.37it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=25.091, player_2/loss=55.772, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 478.90it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=101.772, player_2/loss=64.591, rew=16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 482.21it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=119.827, player_2/loss=63.508, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 487.80it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=69.479, player_2/loss=89.198, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 491.58it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=43.298, player_2/loss=102.367, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 481.04it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=12.457, player_2/loss=59.535, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 493.24it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=20.977, player_2/loss=50.889, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 488.57it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=51.168, player_2/loss=54.497, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 485.54it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=59.600, player_2/loss=77.314, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 480.78it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=48.303, player_2/loss=81.152, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 488.50it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=71.574, player_2/loss=85.266, rew=16.67]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 489.53it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=103.878, player_2/loss=120.105, rew=17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 487.79it/s, env_step=13312, len=9, n/ep=8, n/st=64, player_1/loss=165.833, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 487.36it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=125.004, player_2/loss=158.606, rew=3.57]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 489.84it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=118.393, player_2/loss=129.850, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 484.58it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=146.740, player_2/loss=127.582, rew=10.71]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 482.36it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=132.974, player_2/loss=112.759, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 486.95it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=101.286, player_2/loss=150.151, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 485.51it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=48.738, player_2/loss=107.279, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 482.89it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=156.848, player_2/loss=110.719, rew=3.57]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 489.39it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=111.720, player_2/loss=130.237, rew=-17.86]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 494.06it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=44.252, player_2/loss=143.276, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 494.86it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=36.237, player_2/loss=130.897, rew=-10.71]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 485.42it/s, env_step=5120, len=9, n/ep=6, n/st=64, player_1/loss=84.482, player_2/loss=138.384, rew=-16.67]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 495.84it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=135.811, player_2/loss=168.961, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 489.40it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=225.887, player_2/loss=167.018, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 488.75it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=222.381, player_2/loss=172.965, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 492.13it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=227.245, player_2/loss=172.276, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 485.20it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=309.730, player_2/loss=132.261, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 489.88it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=324.682, player_2/loss=95.865, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 487.43it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=214.507, player_2/loss=75.929, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 492.31it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_2/loss=51.392, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 483.56it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=220.843, player_2/loss=53.615, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 493.22it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=144.310, player_2/loss=82.165, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 490.45it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=147.271, player_2/loss=82.081, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 483.30it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=235.470, player_2/loss=23.820, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 490.71it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=294.672, player_2/loss=16.806, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 487.91it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=285.294, player_2/loss=55.656, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 483.75it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=170.809, player_2/loss=30.301, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 488.04it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=149.886, player_2/loss=136.447, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 489.01it/s, env_step=3072, len=13, n/ep=4, n/st=64, player_1/loss=105.950, player_2/loss=317.579, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 483.32it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=60.432, player_2/loss=348.156, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 492.60it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=37.331, player_2/loss=432.246, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 487.16it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=51.669, player_2/loss=588.960, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 489.70it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=58.987, player_2/loss=578.435, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 484.79it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=46.230, player_2/loss=529.354, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 492.89it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=85.885, player_2/loss=461.376, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 491.73it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=95.944, player_2/loss=407.845, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 481.75it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=73.009, player_2/loss=398.852, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 492.49it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=45.343, player_2/loss=429.364, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 489.88it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=36.822, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 482.14it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=31.409, player_2/loss=478.698, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 490.75it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=9.334, player_2/loss=490.451, rew=5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 492.20it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=8.088, player_2/loss=537.616, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 487.96it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=11.677, player_2/loss=549.582, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 490.82it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=12.503, player_2/loss=463.485, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 496.77it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=20.137, player_2/loss=447.987, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 492.35it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=9.082, player_2/loss=318.060, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 486.97it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=8.797, player_2/loss=274.857, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 492.53it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=8.875, player_2/loss=212.433, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 493.04it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=28.794, player_2/loss=184.094, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 493.95it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=104.096, player_2/loss=201.930, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 489.25it/s, env_step=6144, len=10, n/ep=7, n/st=64, player_1/loss=133.702, player_2/loss=225.023, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 493.35it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=164.339, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 495.33it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=171.223, player_2/loss=160.166, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 492.28it/s, env_step=9216, len=24, n/ep=2, n/st=64, player_1/loss=168.269, player_2/loss=120.143, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 489.82it/s, env_step=10240, len=17, n/ep=3, n/st=64, player_1/loss=188.284, player_2/loss=124.809, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 490.38it/s, env_step=11264, len=24, n/ep=3, n/st=64, player_1/loss=155.207, player_2/loss=95.561, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 491.59it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=171.520, player_2/loss=110.311, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 486.77it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=217.659, player_2/loss=78.752, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 490.73it/s, env_step=14336, len=22, n/ep=2, n/st=64, player_1/loss=217.852, player_2/loss=46.207, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 493.64it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=235.443, player_2/loss=46.535, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 493.26it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=282.537, player_2/loss=91.212, rew=0.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 487.30it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_2/loss=83.983, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 493.28it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=215.923, player_2/loss=54.083, rew=-17.86]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 491.90it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=251.434, player_2/loss=57.997, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 484.55it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=169.306, player_2/loss=142.962, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 499.34it/s, env_step=2048, len=25, n/ep=2, n/st=64, player_1/loss=157.612, player_2/loss=124.181, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.41it/s, env_step=3072, len=22, n/ep=3, n/st=64, player_1/loss=111.035, player_2/loss=139.587, rew=8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 487.62it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_1/loss=64.245, player_2/loss=143.452, rew=-12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 497.89it/s, env_step=5120, len=28, n/ep=2, n/st=64, player_1/loss=70.272, player_2/loss=124.716, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 492.24it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=82.596, player_2/loss=111.182, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 495.63it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=122.242, player_2/loss=185.146, rew=-5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 494.52it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=132.628, player_2/loss=186.024, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 496.11it/s, env_step=9216, len=16, n/ep=5, n/st=64, player_1/loss=146.732, player_2/loss=125.307, rew=-15.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 497.38it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=116.702, player_2/loss=150.929, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 492.24it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=98.112, player_2/loss=142.484, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 492.97it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=119.468, player_2/loss=182.740, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 498.52it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=105.943, player_2/loss=365.923, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 489.70it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=80.173, player_2/loss=444.205, rew=-5.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 492.57it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=128.325, player_2/loss=277.358, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 494.46it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=192.103, player_2/loss=215.720, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 498.19it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=142.788, player_2/loss=216.852, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 489.55it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=126.027, player_2/loss=159.846, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 497.64it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=106.288, player_2/loss=130.315, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 493.36it/s, env_step=1024, len=23, n/ep=2, n/st=64, player_1/loss=146.467, player_2/loss=71.653, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 485.25it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=110.114, player_2/loss=45.455, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 490.62it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=161.103, player_2/loss=22.592, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 496.07it/s, env_step=4096, len=23, n/ep=3, n/st=64, player_1/loss=165.578, player_2/loss=24.683, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 494.88it/s, env_step=5120, len=26, n/ep=3, n/st=64, player_1/loss=204.580, player_2/loss=54.509, rew=-8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 488.23it/s, env_step=6144, len=25, n/ep=2, n/st=64, player_1/loss=158.149, player_2/loss=53.924, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 493.33it/s, env_step=7168, len=28, n/ep=2, n/st=64, player_1/loss=134.435, player_2/loss=25.928, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 498.15it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=153.807, player_2/loss=103.503, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 492.80it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=180.252, player_2/loss=165.017, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 499.95it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=259.820, player_2/loss=147.240, rew=10.71]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 494.61it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=319.662, player_2/loss=152.691, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 495.09it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=480.218, player_2/loss=121.496, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 489.85it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=411.213, player_2/loss=59.322, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 497.53it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=378.729, player_2/loss=62.383, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 494.69it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=491.700, player_2/loss=102.890, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 490.12it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=496.000, player_2/loss=88.407, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 495.24it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=551.266, player_2/loss=43.597, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 496.24it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=402.267, player_2/loss=18.986, rew=5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 498.72it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=403.273, player_2/loss=61.125, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 491.76it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=360.297, player_2/loss=47.284, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 498.58it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=313.998, player_2/loss=45.913, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 501.22it/s, env_step=3072, len=17, n/ep=5, n/st=64, player_1/loss=234.838, player_2/loss=63.170, rew=-5.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 500.14it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=156.363, player_2/loss=83.054, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 489.82it/s, env_step=5120, len=17, n/ep=3, n/st=64, player_1/loss=94.617, player_2/loss=138.647, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 497.85it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=56.355, player_2/loss=199.666, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 499.80it/s, env_step=7168, len=24, n/ep=3, n/st=64, player_1/loss=39.394, player_2/loss=204.225, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 497.50it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=28.499, player_2/loss=163.665, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 492.98it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=15.459, player_2/loss=183.013, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 494.85it/s, env_step=10240, len=23, n/ep=3, n/st=64, player_1/loss=23.084, player_2/loss=176.132, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 497.63it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=30.835, player_2/loss=210.591, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 482.68it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=17.714, player_2/loss=221.159, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 496.21it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=14.925, player_2/loss=179.205, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 499.17it/s, env_step=14336, len=19, n/ep=4, n/st=64, player_1/loss=20.304, player_2/loss=177.163, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 496.53it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=20.613, player_2/loss=229.269, rew=8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 490.38it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=13.318, player_2/loss=247.548, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 497.49it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=6.640, player_2/loss=265.593, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 496.74it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=4.820, player_2/loss=214.629, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 497.24it/s, env_step=19456, len=17, n/ep=3, n/st=64, player_1/loss=6.166, player_2/loss=197.209, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 480.70it/s, env_step=1024, len=17, n/ep=3, n/st=64, player_1/loss=4.185, player_2/loss=129.932, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.24it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=10.956, player_2/loss=97.053, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 501.39it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=10.765, player_2/loss=71.498, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 486.88it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=19.521, player_2/loss=67.676, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 501.04it/s, env_step=5120, len=23, n/ep=3, n/st=64, player_1/loss=22.871, player_2/loss=53.434, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 498.50it/s, env_step=6144, len=19, n/ep=4, n/st=64, player_1/loss=7.955, player_2/loss=44.831, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 497.14it/s, env_step=7168, len=19, n/ep=4, n/st=64, player_1/loss=8.343, player_2/loss=43.148, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 489.42it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=12.658, player_2/loss=28.669, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 498.32it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=122.949, player_2/loss=182.835, rew=5.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 496.61it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=153.686, player_2/loss=220.287, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #11: 1025it [00:02, 498.28it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=187.336, player_2/loss=108.934, rew=15.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #12: 1025it [00:02, 486.75it/s, env_step=12288, len=18, n/ep=4, n/st=64, player_1/loss=137.317, player_2/loss=61.610, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #13: 1025it [00:02, 498.06it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=111.408, player_2/loss=77.648, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #14: 1025it [00:02, 499.86it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=113.558, player_2/loss=107.998, rew=0.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #15: 1025it [00:02, 502.05it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=74.564, player_2/loss=92.138, rew=12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #16: 1025it [00:02, 489.99it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=95.239, player_2/loss=180.230, rew=15.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #17: 1025it [00:02, 497.22it/s, env_step=17408, len=21, n/ep=4, n/st=64, player_1/loss=198.134, player_2/loss=173.321, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #18: 1025it [00:02, 496.78it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=250.144, player_2/loss=71.870, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #19: 1025it [00:02, 490.48it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=210.223, player_2/loss=65.607, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #1: 1025it [00:02, 495.72it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=122.432, player_2/loss=108.651, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 494.14it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=142.898, player_2/loss=134.491, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 496.97it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=141.415, player_2/loss=211.471, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 485.90it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=96.682, player_2/loss=285.930, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 491.35it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=76.638, player_2/loss=276.497, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 492.80it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=58.499, player_2/loss=280.490, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 488.92it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=62.106, player_2/loss=324.635, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 492.71it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=59.766, player_2/loss=373.516, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 497.51it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=39.212, player_2/loss=374.578, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 493.87it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=106.224, player_2/loss=300.140, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 485.24it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=71.512, player_2/loss=304.844, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 493.18it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=48.158, player_2/loss=303.311, rew=18.75]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 495.18it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=33.285, player_2/loss=299.197, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 494.14it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=33.961, player_2/loss=302.415, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 488.26it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=55.543, player_2/loss=379.213, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 492.02it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=61.902, player_2/loss=346.079, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 495.62it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=62.206, player_2/loss=312.699, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 488.25it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=18.940, player_2/loss=296.184, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 493.33it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=10.915, player_2/loss=360.023, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 492.31it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=7.253, player_2/loss=277.664, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 497.83it/s, env_step=2048, len=22, n/ep=2, n/st=64, player_1/loss=78.341, player_2/loss=186.911, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 486.14it/s, env_step=3072, len=20, n/ep=4, n/st=64, player_1/loss=206.683, player_2/loss=88.364, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 494.72it/s, env_step=4096, len=20, n/ep=4, n/st=64, player_1/loss=205.782, player_2/loss=52.638, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 498.57it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=159.878, player_2/loss=27.657, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 494.94it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=148.080, player_2/loss=39.880, rew=12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 492.48it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=153.188, player_2/loss=73.836, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 492.20it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=182.525, player_2/loss=101.454, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 495.43it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=170.590, player_2/loss=128.930, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 488.79it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=139.694, player_2/loss=86.052, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 494.95it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=149.783, player_2/loss=91.667, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 496.54it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=158.804, player_2/loss=124.718, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 496.68it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=162.474, player_2/loss=79.379, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 486.34it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=116.512, player_2/loss=63.179, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 494.49it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=100.088, player_2/loss=100.729, rew=-12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 493.29it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=99.530, player_2/loss=125.160, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 497.60it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=146.734, player_2/loss=74.314, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 484.09it/s, env_step=18432, len=24, n/ep=3, n/st=64, player_1/loss=175.354, player_2/loss=69.788, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 496.89it/s, env_step=19456, len=18, n/ep=3, n/st=64, player_1/loss=193.244, player_2/loss=64.677, rew=-8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 495.26it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=105.951, player_2/loss=31.180, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.92it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=107.013, player_2/loss=40.696, rew=-12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 476.97it/s, env_step=3072, len=29, n/ep=2, n/st=64, player_1/loss=95.476, player_2/loss=192.599, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 495.95it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=86.282, player_2/loss=249.361, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 497.94it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=61.955, player_2/loss=258.245, rew=5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 485.46it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=21.513, player_2/loss=158.763, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 498.38it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=18.284, player_2/loss=120.758, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 495.06it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=21.430, player_2/loss=146.394, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 493.07it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=86.090, player_2/loss=157.393, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 487.29it/s, env_step=10240, len=15, n/ep=5, n/st=64, player_1/loss=86.307, player_2/loss=166.017, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 474.58it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=39.064, player_2/loss=210.974, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 489.77it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=29.388, player_2/loss=263.705, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 495.48it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=10.355, player_2/loss=255.892, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 486.90it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=93.737, player_2/loss=178.601, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 497.63it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=127.838, player_2/loss=235.399, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 497.91it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=121.278, player_2/loss=270.712, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 491.42it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=64.451, player_2/loss=219.647, rew=8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 497.30it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=87.944, player_2/loss=194.965, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 498.22it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=86.097, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 495.09it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=91.772, player_2/loss=205.886, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.63it/s, env_step=2048, len=19, n/ep=4, n/st=64, player_1/loss=174.036, player_2/loss=139.411, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 494.95it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_2/loss=103.047, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 496.56it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=125.621, player_2/loss=94.501, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 496.97it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=166.130, player_2/loss=26.544, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 489.87it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=133.005, player_2/loss=70.170, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 498.11it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=33.516, player_2/loss=129.478, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 496.75it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=26.968, player_2/loss=153.329, rew=-16.67]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 497.37it/s, env_step=9216, len=27, n/ep=2, n/st=64, player_1/loss=23.794, player_2/loss=136.983, rew=0.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 488.51it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=45.288, player_2/loss=118.305, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 497.70it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=341.345, player_2/loss=133.103, rew=-16.67]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 499.38it/s, env_step=12288, len=27, n/ep=3, n/st=64, player_1/loss=295.761, player_2/loss=126.659, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 495.42it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=112.411, player_2/loss=117.992, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 492.47it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=58.103, player_2/loss=109.748, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 495.23it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=117.847, player_2/loss=125.335, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 496.82it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=251.573, player_2/loss=144.512, rew=15.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 499.46it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=230.689, player_2/loss=183.880, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 487.14it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=203.528, player_2/loss=126.704, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 500.83it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=252.057, player_2/loss=79.607, rew=12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 494.73it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=175.049, player_2/loss=121.977, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 500.14it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=176.092, player_2/loss=183.458, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 488.77it/s, env_step=3072, len=10, n/ep=8, n/st=64, player_1/loss=158.712, player_2/loss=214.341, rew=18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 495.13it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=121.625, player_2/loss=249.805, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 494.85it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=87.810, player_2/loss=314.651, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 499.02it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=78.506, player_2/loss=314.117, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 491.71it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=81.159, player_2/loss=311.429, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 493.17it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=83.583, player_2/loss=293.025, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 498.27it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=67.108, player_2/loss=283.888, rew=2.78]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 490.65it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=117.877, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 484.86it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=118.791, player_2/loss=266.101, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 493.18it/s, env_step=12288, len=8, n/ep=6, n/st=64, player_1/loss=69.463, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 472.29it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=25.532, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 487.53it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=26.757, player_2/loss=237.084, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 494.70it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=23.933, player_2/loss=236.604, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 496.05it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=17.027, player_2/loss=238.878, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 487.10it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=23.165, player_2/loss=248.636, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 493.40it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=7.412, player_2/loss=264.753, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 497.99it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=23.133, player_2/loss=261.908, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 494.16it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=45.524, player_2/loss=220.635, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 485.13it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=113.806, player_2/loss=163.989, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 494.83it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=158.340, player_2/loss=79.103, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 494.96it/s, env_step=4096, len=15, n/ep=5, n/st=64, player_1/loss=140.497, player_2/loss=42.955, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 490.41it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=138.088, player_2/loss=33.800, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 490.10it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=148.212, player_2/loss=20.807, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 492.41it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=167.063, player_2/loss=62.532, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 489.24it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=167.060, player_2/loss=53.355, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 499.28it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=132.759, player_2/loss=55.778, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 495.92it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=144.277, player_2/loss=28.766, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 492.82it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=166.995, player_2/loss=21.088, rew=5.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 486.37it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=166.567, player_2/loss=7.548, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 496.87it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=156.165, player_2/loss=8.434, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 495.24it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=141.371, player_2/loss=17.362, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 496.61it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=127.303, player_2/loss=58.004, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 489.56it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=149.530, player_2/loss=51.436, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 497.90it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=157.192, player_2/loss=27.519, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 493.12it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_2/loss=21.489, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 491.94it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=145.577, player_2/loss=13.167, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 494.24it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=104.981, player_2/loss=39.460, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 488.18it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=94.345, player_2/loss=200.640, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 493.54it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=55.560, player_2/loss=275.445, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 500.83it/s, env_step=4096, len=20, n/ep=4, n/st=64, player_1/loss=52.787, player_2/loss=238.693, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 495.76it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=81.275, player_2/loss=178.809, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 497.45it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=81.099, player_2/loss=227.229, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 487.17it/s, env_step=7168, len=21, n/ep=4, n/st=64, player_1/loss=73.221, player_2/loss=252.777, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 497.25it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=44.156, player_2/loss=227.263, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 498.05it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=22.817, player_2/loss=177.797, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 493.18it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=33.029, player_2/loss=246.605, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 496.05it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=33.170, player_2/loss=280.080, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 493.73it/s, env_step=12288, len=15, n/ep=5, n/st=64, player_1/loss=35.991, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 488.65it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=19.310, player_2/loss=205.794, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 494.49it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=37.707, player_2/loss=263.629, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 494.24it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=51.150, player_2/loss=351.679, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 493.54it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=49.202, player_2/loss=287.569, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 496.77it/s, env_step=17408, len=24, n/ep=2, n/st=64, player_1/loss=93.953, player_2/loss=198.185, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 487.38it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=87.496, player_2/loss=196.627, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 494.20it/s, env_step=19456, len=15, n/ep=5, n/st=64, player_1/loss=34.374, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 496.59it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=8.182, player_2/loss=183.318, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.78it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=8.708, player_2/loss=146.285, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 494.66it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=17.930, player_2/loss=81.692, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 490.02it/s, env_step=4096, len=17, n/ep=3, n/st=64, player_1/loss=39.109, player_2/loss=89.516, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 498.22it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=31.230, player_2/loss=114.146, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 498.41it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=12.141, player_2/loss=132.327, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 496.65it/s, env_step=7168, len=23, n/ep=2, n/st=64, player_1/loss=35.132, player_2/loss=125.459, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 496.61it/s, env_step=8192, len=18, n/ep=3, n/st=64, player_1/loss=47.363, player_2/loss=83.347, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 491.29it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=53.432, player_2/loss=98.517, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 493.93it/s, env_step=10240, len=31, n/ep=2, n/st=64, player_1/loss=47.384, player_2/loss=70.762, rew=0.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 500.41it/s, env_step=11264, len=21, n/ep=4, n/st=64, player_1/loss=23.189, player_2/loss=58.762, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 495.47it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=50.999, player_2/loss=63.714, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 497.54it/s, env_step=13312, len=30, n/ep=2, n/st=64, player_1/loss=67.401, player_2/loss=67.833, rew=0.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 486.33it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=73.897, player_2/loss=52.046, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 494.89it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=90.599, player_2/loss=90.924, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 496.50it/s, env_step=16384, len=11, n/ep=7, n/st=64, player_1/loss=100.207, player_2/loss=112.779, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 498.98it/s, env_step=17408, len=26, n/ep=3, n/st=64, player_1/loss=81.726, player_2/loss=106.068, rew=-8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 495.57it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=71.511, player_2/loss=86.175, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 496.15it/s, env_step=19456, len=9, n/ep=6, n/st=64, player_1/loss=123.027, player_2/loss=128.447, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 487.81it/s, env_step=1024, len=9, n/ep=6, n/st=64, player_1/loss=132.537, player_2/loss=136.151, rew=16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.56it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=117.239, player_2/loss=179.487, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 497.24it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=157.674, player_2/loss=220.858, rew=6.25]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 496.11it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=88.151, player_2/loss=236.214, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 498.16it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=122.810, player_2/loss=244.761, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 484.77it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=139.025, player_2/loss=241.044, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 495.15it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=135.241, player_2/loss=232.018, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 496.45it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=88.695, player_2/loss=249.214, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 492.05it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=114.603, player_2/loss=256.607, rew=10.71]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 494.21it/s, env_step=10240, len=11, n/ep=4, n/st=64, player_1/loss=198.505, player_2/loss=244.797, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 490.02it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=168.625, player_2/loss=248.876, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 495.58it/s, env_step=12288, len=9, n/ep=9, n/st=64, player_1/loss=79.306, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 498.24it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=54.717, player_2/loss=186.611, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 490.11it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=74.660, player_2/loss=210.899, rew=10.71]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 494.83it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=89.465, player_2/loss=202.265, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 486.10it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=19.649, player_2/loss=229.478, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 500.30it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=23.969, player_2/loss=212.983, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 493.67it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=35.251, player_2/loss=234.915, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 494.54it/s, env_step=19456, len=10, n/ep=5, n/st=64, player_1/loss=43.140, player_2/loss=223.836, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 492.43it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=161.835, player_2/loss=211.542, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.04it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=192.841, player_2/loss=213.866, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 484.81it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=190.558, player_2/loss=173.907, rew=-17.86]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 493.33it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_2/loss=145.600, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 497.06it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=147.434, player_2/loss=125.344, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 498.10it/s, env_step=6144, len=24, n/ep=3, n/st=64, player_1/loss=134.107, player_2/loss=101.886, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 495.88it/s, env_step=7168, len=20, n/ep=4, n/st=64, player_1/loss=118.455, player_2/loss=75.707, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 485.74it/s, env_step=8192, len=25, n/ep=3, n/st=64, player_1/loss=142.507, player_2/loss=122.261, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 495.81it/s, env_step=9216, len=25, n/ep=2, n/st=64, player_1/loss=146.392, player_2/loss=116.345, rew=0.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 492.55it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=120.055, player_2/loss=82.020, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 488.82it/s, env_step=11264, len=26, n/ep=2, n/st=64, player_1/loss=149.230, player_2/loss=111.189, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 497.98it/s, env_step=12288, len=24, n/ep=2, n/st=64, player_1/loss=167.722, player_2/loss=129.438, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 491.39it/s, env_step=13312, len=22, n/ep=3, n/st=64, player_1/loss=126.206, player_2/loss=98.085, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 490.75it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=116.863, player_2/loss=72.633, rew=8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 492.03it/s, env_step=15360, len=22, n/ep=3, n/st=64, player_2/loss=46.633, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 497.82it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=102.802, player_2/loss=51.091, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 493.84it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=185.599, player_2/loss=87.453, rew=17.86]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 491.50it/s, env_step=18432, len=24, n/ep=3, n/st=64, player_1/loss=252.617, player_2/loss=120.877, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 495.11it/s, env_step=19456, len=24, n/ep=3, n/st=64, player_1/loss=177.815, player_2/loss=98.832, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 496.68it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=54.726, player_2/loss=46.510, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 496.20it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=79.047, player_2/loss=152.405, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 497.41it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=80.315, player_2/loss=227.200, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 486.10it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=38.124, player_2/loss=206.169, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 495.66it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=38.965, player_2/loss=212.239, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 492.82it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=92.050, player_2/loss=179.022, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 499.69it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=89.925, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 499.03it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=32.617, player_2/loss=162.549, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 487.86it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=14.724, player_2/loss=189.834, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 499.01it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=13.080, player_2/loss=240.187, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 494.68it/s, env_step=11264, len=13, n/ep=6, n/st=64, player_1/loss=26.498, player_2/loss=218.167, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 495.59it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=31.388, player_2/loss=197.451, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 494.87it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=18.981, player_2/loss=202.047, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 492.79it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=17.504, player_2/loss=200.416, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 495.00it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=15.388, player_2/loss=169.569, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 496.02it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=2.283, player_2/loss=198.267, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 493.55it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=55.686, player_2/loss=208.509, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 494.21it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=80.755, player_2/loss=189.807, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 495.59it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=20.665, player_2/loss=176.747, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 483.63it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=7.228, player_2/loss=149.430, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.16it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=11.141, player_2/loss=127.975, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 497.97it/s, env_step=3072, len=13, n/ep=6, n/st=64, player_1/loss=15.005, player_2/loss=113.150, rew=-16.67]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 497.98it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=12.552, player_2/loss=92.479, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 501.73it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=6.864, player_2/loss=75.864, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 485.56it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=8.762, player_2/loss=76.945, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 493.01it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=9.474, player_2/loss=63.096, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 499.03it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=39.556, player_2/loss=98.033, rew=5.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 494.37it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=127.150, player_2/loss=158.476, rew=-3.57]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 489.55it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=327.215, player_2/loss=182.912, rew=8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 485.06it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=507.034, player_2/loss=142.015, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #12: 1025it [00:02, 491.89it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=442.352, player_2/loss=129.441, rew=17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #13: 1025it [00:02, 490.38it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=373.917, player_2/loss=110.809, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #14: 1025it [00:02, 495.04it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=375.450, player_2/loss=132.631, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #15: 1025it [00:02, 491.65it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=433.304, player_2/loss=101.247, rew=12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #16: 1025it [00:02, 485.58it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=492.602, player_2/loss=37.153, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #17: 1025it [00:02, 489.58it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=495.239, player_2/loss=58.002, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #18: 1025it [00:02, 494.68it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=502.592, player_2/loss=118.378, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #19: 1025it [00:02, 485.17it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=497.548, player_2/loss=118.775, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #1: 1025it [00:02, 489.39it/s, env_step=1024, len=9, n/ep=6, n/st=64, player_1/loss=400.310, player_2/loss=55.734, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.40it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=266.813, player_2/loss=177.568, rew=18.75]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 459.90it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=85.719, player_2/loss=340.931, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 485.00it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=46.754, player_2/loss=419.229, rew=13.89]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 483.08it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=69.828, player_2/loss=482.910, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 490.81it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=51.015, player_2/loss=496.279, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 491.14it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=46.971, player_2/loss=567.426, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 483.08it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=39.409, player_2/loss=505.174, rew=13.89]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 491.92it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=112.911, player_2/loss=371.646, rew=13.89]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 490.68it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=134.611, player_2/loss=420.635, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 492.46it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=73.567, player_2/loss=521.761, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 489.51it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=111.521, player_2/loss=474.276, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 480.73it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=113.561, player_2/loss=454.240, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 489.21it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=36.632, player_2/loss=601.728, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 487.72it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=14.569, player_2/loss=539.118, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 491.84it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=13.796, player_2/loss=483.190, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 490.18it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=53.936, player_2/loss=436.252, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 483.67it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=58.317, player_2/loss=492.241, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 490.98it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=12.286, player_2/loss=531.738, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 497.32it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=18.425, player_2/loss=371.745, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.18it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=137.597, player_2/loss=270.682, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 499.49it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=366.456, player_2/loss=197.005, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 498.78it/s, env_step=4096, len=15, n/ep=5, n/st=64, player_1/loss=516.979, player_2/loss=135.282, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 492.56it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=624.481, player_2/loss=87.271, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 496.60it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=589.970, player_2/loss=42.263, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 496.59it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=568.232, player_2/loss=79.811, rew=-5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 498.81it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=524.005, player_2/loss=102.162, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 497.56it/s, env_step=9216, len=24, n/ep=3, n/st=64, player_1/loss=275.416, player_2/loss=180.281, rew=50.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 482.71it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=265.450, player_2/loss=167.860, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 498.62it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=516.329, player_2/loss=183.054, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 497.92it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=528.457, player_2/loss=71.159, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 499.11it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=557.335, player_2/loss=43.971, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 502.40it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=540.599, player_2/loss=86.553, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 490.93it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=465.627, player_2/loss=75.532, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 498.50it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=529.842, player_2/loss=37.137, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 497.64it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=597.353, player_2/loss=28.635, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 497.01it/s, env_step=18432, len=10, n/ep=5, n/st=64, player_1/loss=465.066, player_2/loss=63.101, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 501.51it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=351.600, player_2/loss=92.283, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 486.23it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=296.424, player_2/loss=40.741, rew=-16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.08it/s, env_step=2048, len=9, n/ep=6, n/st=64, player_1/loss=229.489, player_2/loss=110.392, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 497.97it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=122.811, player_2/loss=234.315, rew=17.86]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 494.63it/s, env_step=4096, len=9, n/ep=6, n/st=64, player_1/loss=95.266, player_2/loss=253.716, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 502.11it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=56.852, player_2/loss=248.868, rew=17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 486.85it/s, env_step=6144, len=10, n/ep=7, n/st=64, player_1/loss=72.094, player_2/loss=229.154, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 494.72it/s, env_step=7168, len=9, n/ep=6, n/st=64, player_1/loss=119.098, player_2/loss=251.232, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 496.66it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=69.056, player_2/loss=284.784, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 493.56it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=65.809, player_2/loss=302.278, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 494.83it/s, env_step=10240, len=9, n/ep=6, n/st=64, player_1/loss=69.788, player_2/loss=272.129, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 486.55it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=72.929, player_2/loss=247.730, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 492.99it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=74.438, player_2/loss=242.215, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 493.76it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=59.300, player_2/loss=240.488, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 494.80it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=42.766, player_2/loss=251.637, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 494.85it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=64.342, player_2/loss=231.749, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 487.27it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=107.061, player_2/loss=241.651, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 491.37it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=48.535, player_2/loss=256.501, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 497.88it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=39.137, player_2/loss=234.921, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 497.80it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=45.835, player_2/loss=222.456, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 494.97it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=35.014, player_2/loss=223.962, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 491.60it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=24.884, player_2/loss=182.119, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 497.35it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=45.489, player_2/loss=129.749, rew=-16.67]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 498.14it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_2/loss=76.486, rew=-5.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 497.60it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=103.806, player_2/loss=99.790, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 495.70it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=166.446, player_2/loss=88.634, rew=8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 490.81it/s, env_step=7168, len=24, n/ep=3, n/st=64, player_1/loss=193.865, player_2/loss=61.770, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 497.03it/s, env_step=8192, len=26, n/ep=2, n/st=64, player_1/loss=153.165, player_2/loss=93.073, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 499.19it/s, env_step=9216, len=26, n/ep=2, n/st=64, player_1/loss=117.765, player_2/loss=98.634, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 495.23it/s, env_step=10240, len=28, n/ep=2, n/st=64, player_1/loss=251.530, player_2/loss=155.479, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 496.69it/s, env_step=11264, len=27, n/ep=2, n/st=64, player_1/loss=303.738, player_2/loss=157.402, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 489.11it/s, env_step=12288, len=23, n/ep=3, n/st=64, player_1/loss=136.763, player_2/loss=60.949, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 495.04it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=108.636, player_2/loss=69.994, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 501.45it/s, env_step=14336, len=24, n/ep=3, n/st=64, player_1/loss=86.419, player_2/loss=120.971, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 494.02it/s, env_step=15360, len=23, n/ep=3, n/st=64, player_1/loss=82.899, player_2/loss=125.627, rew=-25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 497.24it/s, env_step=16384, len=25, n/ep=3, n/st=64, player_1/loss=122.498, player_2/loss=103.945, rew=-25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 488.54it/s, env_step=17408, len=25, n/ep=3, n/st=64, player_1/loss=116.997, player_2/loss=88.842, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 494.01it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=101.127, player_2/loss=81.987, rew=-8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 498.04it/s, env_step=19456, len=16, n/ep=5, n/st=64, player_1/loss=156.301, player_2/loss=92.488, rew=-15.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 488.53it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=42.932, player_2/loss=137.552, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 494.71it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=40.136, player_2/loss=167.969, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 483.97it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=27.894, player_2/loss=167.913, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 494.43it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=23.043, player_2/loss=144.328, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 495.18it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=40.901, player_2/loss=145.979, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 493.22it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=39.077, player_2/loss=152.199, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 494.04it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=17.770, player_2/loss=145.355, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 484.80it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=10.427, player_2/loss=136.897, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 494.92it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=4.449, player_2/loss=159.702, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 493.73it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=6.448, player_2/loss=170.492, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 487.70it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=24.111, player_2/loss=163.064, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 489.51it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=25.476, player_2/loss=164.321, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 482.43it/s, env_step=13312, len=9, n/ep=6, n/st=64, player_1/loss=16.538, player_2/loss=162.454, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 487.77it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=26.341, player_2/loss=161.762, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 490.16it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=18.303, player_2/loss=136.836, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 495.46it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=8.347, player_2/loss=147.579, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 495.90it/s, env_step=17408, len=9, n/ep=6, n/st=64, player_1/loss=21.496, player_2/loss=143.387, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 485.85it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=24.930, player_2/loss=112.553, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 497.81it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_2/loss=123.070, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 492.22it/s, env_step=1024, len=15, n/ep=5, n/st=64, player_1/loss=19.664, player_2/loss=90.983, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.95it/s, env_step=2048, len=9, n/ep=6, n/st=64, player_1/loss=28.330, player_2/loss=81.728, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.85it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=83.156, player_2/loss=88.925, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 492.48it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=130.560, player_2/loss=112.119, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 501.87it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=92.268, player_2/loss=96.153, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 494.30it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=90.274, player_2/loss=159.571, rew=-15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 495.18it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=93.058, player_2/loss=152.340, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 499.97it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=119.180, player_2/loss=106.897, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 487.73it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=321.568, player_2/loss=123.171, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 491.75it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=458.223, player_2/loss=119.626, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 494.02it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=410.232, player_2/loss=71.427, rew=17.86]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 494.97it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=365.780, player_2/loss=82.099, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 496.87it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=417.682, player_2/loss=69.471, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 483.96it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=421.969, player_2/loss=64.862, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 494.92it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=332.590, player_2/loss=79.684, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 492.17it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=333.254, player_2/loss=68.744, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 496.14it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=392.527, player_2/loss=40.260, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 496.78it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=395.861, player_2/loss=17.373, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 488.06it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=504.834, player_2/loss=3.465, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 489.64it/s, env_step=1024, len=8, n/ep=7, n/st=64, player_1/loss=269.831, player_2/loss=125.072, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.38it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=233.192, player_2/loss=79.082, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 494.55it/s, env_step=3072, len=8, n/ep=7, n/st=64, player_1/loss=210.994, player_2/loss=37.475, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 496.36it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=174.245, player_2/loss=44.634, rew=-18.75]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 489.98it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=147.815, player_2/loss=84.223, rew=-17.86]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 492.99it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=203.364, player_2/loss=220.496, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 492.80it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=256.187, rew=-10.71]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 493.95it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=158.340, player_2/loss=497.715, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 494.45it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=102.256, player_2/loss=541.161, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 481.25it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=79.273, player_2/loss=566.779, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 489.61it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=96.130, player_2/loss=595.633, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 495.73it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=98.029, player_2/loss=555.302, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 492.81it/s, env_step=13312, len=9, n/ep=9, n/st=64, player_1/loss=86.444, player_2/loss=520.744, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 493.71it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=53.685, player_2/loss=532.899, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 480.48it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=46.535, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 486.71it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=71.259, player_2/loss=622.677, rew=13.89]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 496.64it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=15.641, player_2/loss=625.179, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 495.64it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=93.223, player_2/loss=598.142, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 491.56it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=111.369, player_2/loss=518.821, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 484.40it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=27.840, player_2/loss=345.900, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 499.87it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=72.339, player_2/loss=351.282, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 501.01it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=92.787, player_2/loss=294.452, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 499.47it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=135.555, player_2/loss=187.744, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 496.20it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=232.268, player_2/loss=167.833, rew=15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 486.89it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=274.074, player_2/loss=124.747, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 500.36it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=316.590, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 498.67it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=374.073, player_2/loss=59.212, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 499.56it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=257.796, player_2/loss=60.444, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 498.12it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=316.019, player_2/loss=24.234, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 488.71it/s, env_step=11264, len=10, n/ep=5, n/st=64, player_1/loss=385.059, player_2/loss=23.891, rew=5.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 495.96it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=318.317, player_2/loss=32.200, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 487.62it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=310.324, player_2/loss=25.193, rew=15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 495.40it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=346.163, player_2/loss=13.641, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 497.38it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=309.272, player_2/loss=7.176, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 492.70it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=311.351, player_2/loss=13.444, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 498.99it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=320.778, player_2/loss=11.882, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 497.74it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=338.337, player_2/loss=7.321, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 498.19it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=318.698, player_2/loss=19.287, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 489.58it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=86.926, player_2/loss=122.641, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 485.21it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=50.038, player_2/loss=169.775, rew=12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 497.88it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=42.449, player_2/loss=241.088, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 497.27it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_2/loss=313.285, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 497.56it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=60.670, player_2/loss=261.216, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 494.60it/s, env_step=6144, len=16, n/ep=3, n/st=64, player_1/loss=63.901, player_2/loss=217.084, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 486.51it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=59.549, player_2/loss=226.660, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 490.71it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_1/loss=43.186, player_2/loss=236.710, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 493.25it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=13.007, player_2/loss=291.298, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 488.49it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=47.392, player_2/loss=312.474, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 492.07it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=48.966, player_2/loss=311.131, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 485.27it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=10.731, player_2/loss=261.522, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 492.26it/s, env_step=13312, len=17, n/ep=3, n/st=64, player_1/loss=7.070, player_2/loss=255.848, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 492.86it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=5.707, player_2/loss=224.116, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 495.25it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=4.425, player_2/loss=222.168, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 494.45it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=5.362, player_2/loss=275.853, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 486.71it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=4.086, player_2/loss=259.068, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 494.58it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=33.684, player_2/loss=239.062, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 489.54it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=41.034, player_2/loss=220.450, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 489.97it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=95.906, player_2/loss=131.484, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 494.13it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=229.421, player_2/loss=104.270, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 486.53it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=295.562, player_2/loss=77.263, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 499.28it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=233.868, player_2/loss=61.155, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 497.09it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=138.170, player_2/loss=59.731, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 500.33it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=192.092, player_2/loss=33.997, rew=-8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 494.29it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=257.966, player_2/loss=44.654, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 490.01it/s, env_step=8192, len=18, n/ep=3, n/st=64, player_1/loss=212.743, player_2/loss=58.002, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 494.21it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=224.018, player_2/loss=68.292, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 497.81it/s, env_step=10240, len=11, n/ep=4, n/st=64, player_1/loss=227.214, player_2/loss=48.542, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 495.30it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=187.801, player_2/loss=27.099, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 495.50it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=169.830, player_2/loss=37.388, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 487.88it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=202.565, player_2/loss=68.462, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 497.22it/s, env_step=14336, len=12, n/ep=4, n/st=64, player_1/loss=198.940, player_2/loss=108.119, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 499.02it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=198.430, player_2/loss=75.819, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 494.06it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=254.638, player_2/loss=43.401, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 497.81it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=294.160, player_2/loss=55.030, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 489.11it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=296.860, player_2/loss=36.891, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 499.46it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=252.520, player_2/loss=61.500, rew=15.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 488.85it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=182.336, player_2/loss=115.417, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 487.48it/s, env_step=2048, len=27, n/ep=3, n/st=64, player_1/loss=177.446, player_2/loss=108.773, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 491.48it/s, env_step=3072, len=33, n/ep=1, n/st=64, player_1/loss=136.969, player_2/loss=133.105, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 481.23it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=79.050, player_2/loss=99.844, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 487.38it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=61.228, player_2/loss=68.564, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 488.40it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=61.495, player_2/loss=90.085, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 490.49it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=64.719, player_2/loss=121.464, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 490.60it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=59.678, player_2/loss=130.255, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 491.19it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=114.107, player_2/loss=166.716, rew=13.89]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 490.33it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=144.590, player_2/loss=181.114, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 486.30it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=86.827, player_2/loss=259.209, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 491.61it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=89.234, player_2/loss=298.439, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 490.10it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=107.328, player_2/loss=353.806, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 486.32it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=112.001, player_2/loss=359.564, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 489.13it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=54.221, player_2/loss=365.717, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 487.77it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=34.885, player_2/loss=322.968, rew=13.89]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 489.90it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=33.775, player_2/loss=352.132, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 491.18it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=61.173, player_2/loss=378.296, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 479.85it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=54.714, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 492.32it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=60.120, player_2/loss=330.981, rew=-6.25]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.88it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=28.860, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.29it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=65.039, player_2/loss=242.673, rew=-18.75]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 490.70it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=71.415, player_2/loss=245.635, rew=-18.75]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 488.70it/s, env_step=5120, len=7, n/ep=10, n/st=64, player_1/loss=55.931, player_2/loss=214.412, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 492.30it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=43.786, player_2/loss=194.210, rew=-18.75]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 493.26it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=36.539, player_2/loss=121.600, rew=-10.71]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 492.57it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=54.333, player_2/loss=172.316, rew=-13.89]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 489.93it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=102.080, player_2/loss=225.801, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 484.65it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=90.530, player_2/loss=178.903, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 488.26it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=67.455, player_2/loss=196.506, rew=-3.57]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 488.31it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=68.339, player_2/loss=228.744, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 499.84it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=185.179, player_2/loss=184.705, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 496.65it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=254.906, player_2/loss=127.015, rew=15.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 477.99it/s, env_step=15360, len=12, n/ep=4, n/st=64, player_1/loss=242.279, player_2/loss=81.845, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 489.16it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=234.704, player_2/loss=81.118, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 489.22it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=253.809, player_2/loss=134.448, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 496.67it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=296.379, player_2/loss=163.255, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 495.00it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=265.862, player_2/loss=157.829, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 483.01it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=63.016, player_2/loss=212.220, rew=16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 491.37it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=43.953, player_2/loss=187.872, rew=16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 489.73it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=75.709, player_2/loss=220.163, rew=16.67]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 488.55it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=123.655, player_2/loss=251.563, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 492.61it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=176.469, player_2/loss=297.340, rew=-18.75]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 482.20it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=246.060, player_2/loss=285.068, rew=-10.71]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 487.32it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=182.124, player_2/loss=295.487, rew=-16.67]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 493.45it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=172.034, player_2/loss=254.156, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 495.59it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=151.423, player_2/loss=252.557, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 483.94it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_2/loss=255.178, rew=13.89]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 484.64it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=88.076, player_2/loss=282.099, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 480.98it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=46.569, player_2/loss=286.808, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 492.14it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=95.194, player_2/loss=316.751, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 487.62it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=96.728, player_2/loss=279.765, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 491.40it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=28.208, player_2/loss=271.518, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 479.23it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=29.794, player_2/loss=245.944, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 475.92it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=19.151, player_2/loss=242.467, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 484.68it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=72.626, player_2/loss=280.420, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 480.19it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=102.495, player_2/loss=268.735, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 484.88it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=28.006, player_2/loss=191.507, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 451.86it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=27.520, player_2/loss=171.493, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 492.37it/s, env_step=3072, len=27, n/ep=2, n/st=64, player_1/loss=191.768, player_2/loss=139.069, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 491.24it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=337.685, player_2/loss=116.843, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 497.76it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=209.619, player_2/loss=115.355, rew=-8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 498.05it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=246.194, player_2/loss=148.006, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 488.29it/s, env_step=7168, len=20, n/ep=4, n/st=64, player_1/loss=276.119, player_2/loss=200.261, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 495.79it/s, env_step=8192, len=26, n/ep=2, n/st=64, player_1/loss=257.238, player_2/loss=152.453, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 498.40it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=182.063, player_2/loss=98.673, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 497.75it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=253.623, player_2/loss=121.962, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 494.46it/s, env_step=11264, len=9, n/ep=5, n/st=64, player_1/loss=371.424, player_2/loss=136.063, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 494.84it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=317.514, player_2/loss=96.878, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 496.95it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=329.073, player_2/loss=96.461, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 492.47it/s, env_step=14336, len=21, n/ep=3, n/st=64, player_1/loss=352.481, player_2/loss=123.956, rew=-8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 495.49it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=340.375, player_2/loss=113.791, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 494.61it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=329.743, player_2/loss=93.153, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 488.45it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=262.379, player_2/loss=72.567, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 488.51it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=258.430, player_2/loss=62.978, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 493.45it/s, env_step=19456, len=18, n/ep=3, n/st=64, player_1/loss=248.621, player_2/loss=81.589, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 491.26it/s, env_step=1024, len=16, n/ep=5, n/st=64, player_1/loss=353.899, player_2/loss=120.315, rew=5.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 491.54it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=296.741, player_2/loss=128.734, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 493.41it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=168.066, player_2/loss=147.425, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 494.85it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=129.774, player_2/loss=206.920, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 493.79it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=200.068, player_2/loss=211.335, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 498.77it/s, env_step=6144, len=15, n/ep=5, n/st=64, player_1/loss=199.217, player_2/loss=220.557, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 492.28it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=184.518, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 493.69it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=133.498, player_2/loss=261.187, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 486.83it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=62.582, player_2/loss=302.859, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 494.76it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=36.262, player_2/loss=276.434, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 490.58it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=48.366, player_2/loss=255.740, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 495.46it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=84.797, player_2/loss=284.567, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 495.08it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=121.994, player_2/loss=296.954, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 485.72it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=63.403, player_2/loss=281.622, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 495.06it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=20.571, player_2/loss=262.219, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 492.59it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=26.387, player_2/loss=280.276, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 492.41it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=25.713, player_2/loss=269.659, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 489.13it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=9.282, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 485.99it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=21.939, player_2/loss=330.777, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 494.85it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=20.432, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 485.88it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=63.496, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 486.68it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=181.342, player_2/loss=181.948, rew=18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 492.43it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=259.395, player_2/loss=131.316, rew=10.71]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 478.03it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=231.292, player_2/loss=66.559, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 490.30it/s, env_step=6144, len=8, n/ep=9, n/st=64, player_1/loss=219.134, player_2/loss=50.108, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 491.14it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=198.011, player_2/loss=58.206, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 490.59it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=230.408, player_2/loss=44.065, rew=18.75]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 494.51it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=257.726, player_2/loss=71.115, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 483.65it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=252.768, player_2/loss=55.249, rew=6.25]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 497.33it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=235.189, player_2/loss=40.420, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 494.58it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=257.183, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 496.05it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=254.278, player_2/loss=99.439, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 494.84it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=257.464, player_2/loss=82.802, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 485.44it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=263.014, player_2/loss=134.725, rew=18.75]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 495.57it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=238.420, player_2/loss=110.286, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 489.71it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=233.164, player_2/loss=37.045, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 494.05it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=281.574, player_2/loss=64.867, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 494.92it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=263.338, player_2/loss=82.681, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 483.47it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=177.058, player_2/loss=46.820, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.66it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=155.066, player_2/loss=48.974, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.92it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=153.165, player_2/loss=171.100, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 497.07it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=131.388, player_2/loss=348.392, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 492.77it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=77.117, player_2/loss=354.000, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 486.80it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=61.038, player_2/loss=361.578, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 496.08it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=42.874, player_2/loss=424.230, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 492.86it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=30.212, player_2/loss=377.241, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 494.29it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=43.126, player_2/loss=373.438, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 490.70it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=33.426, player_2/loss=394.930, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 489.06it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=10.922, player_2/loss=415.904, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 491.92it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=51.244, player_2/loss=380.609, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 491.53it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=55.757, player_2/loss=419.518, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 490.39it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=12.119, player_2/loss=400.017, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 496.48it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=10.552, player_2/loss=370.149, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 483.82it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=7.282, player_2/loss=404.557, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 489.81it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=29.044, player_2/loss=401.420, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 490.81it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=15.510, player_2/loss=390.375, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 489.04it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=47.042, player_2/loss=362.971, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 489.29it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=11.001, player_2/loss=300.492, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 487.84it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=30.275, player_2/loss=235.660, rew=-18.75]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 489.13it/s, env_step=3072, len=20, n/ep=4, n/st=64, player_1/loss=170.035, player_2/loss=169.826, rew=-12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 495.76it/s, env_step=4096, len=26, n/ep=2, n/st=64, player_1/loss=303.059, player_2/loss=126.445, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 492.60it/s, env_step=5120, len=16, n/ep=3, n/st=64, player_1/loss=248.911, player_2/loss=89.493, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 493.19it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=216.256, player_2/loss=105.563, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 483.17it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=206.703, player_2/loss=91.420, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 496.04it/s, env_step=8192, len=15, n/ep=5, n/st=64, player_1/loss=167.569, player_2/loss=83.527, rew=5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 489.33it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=180.320, player_2/loss=83.919, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 497.30it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=215.262, player_2/loss=86.808, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 496.13it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=230.714, player_2/loss=38.640, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 483.09it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=188.633, player_2/loss=42.776, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 458.95it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=173.308, player_2/loss=111.709, rew=-12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 491.53it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=166.664, player_2/loss=121.324, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 493.93it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=202.979, player_2/loss=61.754, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 490.18it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=187.815, player_2/loss=73.496, rew=12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 480.78it/s, env_step=17408, len=20, n/ep=3, n/st=64, player_1/loss=146.088, player_2/loss=136.694, rew=-8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 494.83it/s, env_step=18432, len=29, n/ep=2, n/st=64, player_2/loss=129.709, rew=-25.00]


Epoch #18: test_reward: 100.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #18


Epoch #19: 1025it [00:02, 497.95it/s, env_step=19456, len=20, n/ep=3, n/st=64, player_1/loss=105.107, player_2/loss=68.521, rew=-8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #18


Epoch #1: 1025it [00:02, 496.15it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=107.153, player_2/loss=78.602, rew=5.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 495.16it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=101.446, player_2/loss=129.954, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 482.11it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=66.683, player_2/loss=183.758, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 494.66it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=51.622, player_2/loss=194.763, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 498.07it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=77.045, player_2/loss=177.430, rew=16.67]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 497.38it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=160.991, player_2/loss=178.632, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 495.64it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=163.440, player_2/loss=237.450, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 487.25it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=99.725, player_2/loss=244.013, rew=-5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 494.96it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=48.870, player_2/loss=204.479, rew=5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 493.77it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=50.920, player_2/loss=245.693, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 493.96it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=56.660, player_2/loss=203.599, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 494.70it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=77.215, player_2/loss=208.732, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 492.46it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=91.624, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 483.77it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=50.446, player_2/loss=202.623, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 492.96it/s, env_step=15360, len=23, n/ep=3, n/st=64, player_1/loss=53.827, player_2/loss=226.917, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 490.45it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=45.037, player_2/loss=231.524, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 490.98it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=26.612, player_2/loss=259.313, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 496.29it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=59.273, player_2/loss=238.726, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 489.42it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=55.462, player_2/loss=194.481, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 496.05it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=69.518, player_2/loss=223.393, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.71it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=81.647, player_2/loss=208.443, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 496.41it/s, env_step=3072, len=28, n/ep=3, n/st=64, player_1/loss=124.298, player_2/loss=177.509, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 494.29it/s, env_step=4096, len=27, n/ep=2, n/st=64, player_1/loss=119.098, player_2/loss=153.658, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 489.02it/s, env_step=5120, len=23, n/ep=3, n/st=64, player_1/loss=141.723, player_2/loss=118.755, rew=-8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 495.51it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=204.286, player_2/loss=105.278, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 499.50it/s, env_step=7168, len=23, n/ep=2, n/st=64, player_1/loss=227.002, player_2/loss=87.759, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 495.41it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=143.380, player_2/loss=46.284, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 498.56it/s, env_step=9216, len=25, n/ep=3, n/st=64, player_1/loss=124.682, player_2/loss=52.378, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 489.93it/s, env_step=10240, len=24, n/ep=3, n/st=64, player_1/loss=198.248, player_2/loss=33.534, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 496.61it/s, env_step=11264, len=21, n/ep=4, n/st=64, player_1/loss=230.031, player_2/loss=13.142, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 500.74it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=273.189, player_2/loss=19.089, rew=8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 497.91it/s, env_step=13312, len=18, n/ep=3, n/st=64, player_1/loss=225.733, player_2/loss=23.712, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 497.41it/s, env_step=14336, len=24, n/ep=2, n/st=64, player_1/loss=108.931, player_2/loss=36.343, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 487.14it/s, env_step=15360, len=28, n/ep=2, n/st=64, player_1/loss=165.580, player_2/loss=52.203, rew=0.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 499.43it/s, env_step=16384, len=22, n/ep=3, n/st=64, player_1/loss=187.940, player_2/loss=74.506, rew=-8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 491.36it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_2/loss=135.557, rew=-15.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 496.11it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=165.619, player_2/loss=167.404, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 486.94it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=181.425, player_2/loss=166.409, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 490.04it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=230.374, player_2/loss=590.307, rew=19.44]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.19it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=170.321, player_2/loss=588.826, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 492.16it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=90.728, player_2/loss=553.689, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 499.07it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=70.785, player_2/loss=527.982, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 492.53it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=72.054, player_2/loss=512.150, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 487.36it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=94.214, player_2/loss=512.778, rew=6.25]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 495.91it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=115.821, player_2/loss=510.693, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 494.21it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=94.073, player_2/loss=490.375, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 495.79it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=73.100, player_2/loss=584.816, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 485.83it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=45.065, player_2/loss=540.498, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 485.13it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=48.239, player_2/loss=569.750, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 492.77it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=66.505, player_2/loss=575.834, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 494.61it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=60.247, player_2/loss=521.949, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 491.64it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=70.376, player_2/loss=559.209, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 499.31it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=71.677, player_2/loss=497.471, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 486.01it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=58.014, player_2/loss=503.969, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 491.38it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=65.025, player_2/loss=555.634, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 494.78it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=76.223, player_2/loss=581.793, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 493.88it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=29.552, player_2/loss=542.622, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 496.33it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=22.475, player_2/loss=352.814, rew=-17.86]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 489.80it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=106.501, player_2/loss=315.060, rew=-17.86]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 499.25it/s, env_step=3072, len=18, n/ep=3, n/st=64, player_1/loss=159.679, player_2/loss=229.774, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 489.89it/s, env_step=4096, len=22, n/ep=2, n/st=64, player_1/loss=123.978, player_2/loss=203.898, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 496.76it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=90.048, player_2/loss=147.606, rew=-8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 496.44it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=86.863, player_2/loss=93.647, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 489.52it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=84.633, player_2/loss=94.111, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 497.09it/s, env_step=8192, len=18, n/ep=3, n/st=64, player_1/loss=102.898, player_2/loss=87.718, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 500.18it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=115.681, player_2/loss=25.940, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 494.97it/s, env_step=10240, len=16, n/ep=3, n/st=64, player_1/loss=88.581, player_2/loss=27.326, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 498.24it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=90.741, player_2/loss=46.842, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 484.09it/s, env_step=12288, len=16, n/ep=3, n/st=64, player_1/loss=78.157, player_2/loss=57.201, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 496.00it/s, env_step=13312, len=15, n/ep=5, n/st=64, player_1/loss=67.362, player_2/loss=36.603, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 496.35it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=62.999, player_2/loss=20.218, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 496.34it/s, env_step=15360, len=21, n/ep=3, n/st=64, player_1/loss=65.171, player_2/loss=14.968, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 495.35it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=78.573, player_2/loss=12.593, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 487.48it/s, env_step=17408, len=17, n/ep=3, n/st=64, player_1/loss=92.764, player_2/loss=31.741, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 492.80it/s, env_step=18432, len=16, n/ep=3, n/st=64, player_1/loss=98.743, player_2/loss=31.349, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 497.23it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=97.201, player_2/loss=29.249, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 492.73it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=75.646, player_2/loss=80.461, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 491.81it/s, env_step=2048, len=19, n/ep=4, n/st=64, player_1/loss=65.723, player_2/loss=44.203, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 484.89it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=44.300, player_2/loss=10.119, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 489.26it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=31.614, player_2/loss=7.470, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 493.36it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=60.937, player_2/loss=74.221, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 492.53it/s, env_step=6144, len=19, n/ep=4, n/st=64, player_1/loss=73.556, player_2/loss=81.185, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 493.57it/s, env_step=7168, len=23, n/ep=3, n/st=64, player_1/loss=45.851, player_2/loss=30.737, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 481.68it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=29.570, player_2/loss=29.387, rew=-12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 487.66it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=66.316, player_2/loss=132.950, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 488.49it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=95.533, player_2/loss=295.886, rew=13.89]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 477.74it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=147.485, player_2/loss=326.820, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 490.18it/s, env_step=12288, len=8, n/ep=6, n/st=64, player_1/loss=124.635, player_2/loss=334.219, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 481.37it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=102.529, player_2/loss=366.690, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 497.25it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=91.933, player_2/loss=339.818, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 492.16it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=64.030, player_2/loss=391.662, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 491.59it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=58.585, player_2/loss=431.171, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 489.18it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=58.651, player_2/loss=418.162, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 474.92it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=18.030, player_2/loss=356.040, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 489.09it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=9.708, player_2/loss=365.540, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 493.50it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=25.442, player_2/loss=337.300, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 500.28it/s, env_step=2048, len=19, n/ep=4, n/st=64, player_1/loss=96.736, player_2/loss=199.911, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 494.97it/s, env_step=3072, len=27, n/ep=2, n/st=64, player_1/loss=133.444, player_2/loss=119.585, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 490.59it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_1/loss=144.047, player_2/loss=71.219, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 496.43it/s, env_step=5120, len=15, n/ep=5, n/st=64, player_1/loss=106.808, player_2/loss=53.924, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 496.27it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=98.491, player_2/loss=58.717, rew=-8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 493.84it/s, env_step=7168, len=23, n/ep=2, n/st=64, player_1/loss=119.812, player_2/loss=85.913, rew=0.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 497.22it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=123.777, player_2/loss=116.068, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 492.73it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=139.790, player_2/loss=133.978, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 497.95it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=138.254, player_2/loss=111.216, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 497.43it/s, env_step=11264, len=17, n/ep=3, n/st=64, player_1/loss=149.010, player_2/loss=80.590, rew=8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 496.98it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=150.116, player_2/loss=122.265, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 501.36it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=140.754, player_2/loss=153.923, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 492.88it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=177.001, player_2/loss=109.948, rew=15.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 492.59it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=259.041, player_2/loss=67.855, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 500.65it/s, env_step=16384, len=15, n/ep=3, n/st=64, player_1/loss=285.857, player_2/loss=65.796, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 497.12it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=338.507, player_2/loss=123.609, rew=12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 495.44it/s, env_step=18432, len=10, n/ep=5, n/st=64, player_1/loss=496.890, player_2/loss=131.567, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 501.94it/s, env_step=19456, len=20, n/ep=3, n/st=64, player_1/loss=339.199, player_2/loss=156.861, rew=-8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 490.27it/s, env_step=1024, len=25, n/ep=3, n/st=64, player_1/loss=88.958, player_2/loss=105.231, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 500.14it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=103.245, player_2/loss=109.610, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 497.97it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=113.773, player_2/loss=101.127, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 496.24it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=118.694, player_2/loss=124.258, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 498.01it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=135.719, player_2/loss=166.117, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 486.93it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=129.861, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 494.87it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=94.040, player_2/loss=226.641, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 499.01it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=86.989, player_2/loss=243.921, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 495.41it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=98.226, player_2/loss=204.286, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 496.61it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=119.455, player_2/loss=218.142, rew=5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 485.68it/s, env_step=11264, len=10, n/ep=7, n/st=64, player_1/loss=149.053, player_2/loss=234.502, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 496.03it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=155.117, player_2/loss=211.444, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 499.03it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=101.497, player_2/loss=254.934, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 494.46it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=45.026, player_2/loss=280.896, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 497.03it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=35.788, player_2/loss=280.179, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 486.86it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=16.681, player_2/loss=280.956, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 497.17it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=36.354, player_2/loss=270.886, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 498.02it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=49.569, player_2/loss=227.148, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 499.51it/s, env_step=19456, len=9, n/ep=6, n/st=64, player_1/loss=36.841, player_2/loss=236.856, rew=16.67]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 492.03it/s, env_step=1024, len=9, n/ep=8, n/st=64, player_1/loss=327.492, player_2/loss=241.204, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 487.22it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=611.271, player_2/loss=217.767, rew=10.71]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 496.85it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=687.840, player_2/loss=140.509, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 494.15it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=607.002, player_2/loss=62.250, rew=17.86]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 495.44it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=653.661, player_2/loss=84.636, rew=17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 495.30it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=529.250, player_2/loss=183.958, rew=17.86]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 489.06it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=471.256, player_2/loss=182.095, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 498.57it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=528.736, player_2/loss=129.401, rew=18.75]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 495.67it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=603.863, player_2/loss=90.482, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 497.58it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=665.311, player_2/loss=67.726, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 495.34it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=546.346, player_2/loss=99.983, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 484.25it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=473.420, player_2/loss=108.283, rew=18.75]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 494.40it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=540.431, player_2/loss=32.468, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 496.34it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=558.231, player_2/loss=64.889, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 498.44it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=579.345, player_2/loss=108.859, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 494.54it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=626.842, player_2/loss=81.123, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 489.25it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=578.159, player_2/loss=40.186, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 494.48it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=546.138, player_2/loss=87.988, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 497.56it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=629.242, player_2/loss=119.691, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 493.02it/s, env_step=1024, len=8, n/ep=7, n/st=64, player_1/loss=412.450, player_2/loss=21.258, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.23it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=323.395, player_2/loss=65.943, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 499.54it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=228.245, player_2/loss=116.881, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 492.26it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=200.453, player_2/loss=287.525, rew=17.86]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 496.49it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=122.896, player_2/loss=493.937, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 495.73it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=98.951, player_2/loss=589.670, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 492.51it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=113.321, player_2/loss=667.016, rew=13.89]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 496.78it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=188.018, player_2/loss=706.842, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 488.18it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=156.941, player_2/loss=818.744, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 498.41it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=107.927, player_2/loss=781.656, rew=13.89]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 493.11it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=116.852, player_2/loss=675.124, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 493.16it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=89.419, player_2/loss=591.131, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 494.94it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=49.286, player_2/loss=787.020, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 484.22it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=50.365, player_2/loss=849.064, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 496.54it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=49.491, player_2/loss=779.806, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 494.17it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=95.412, player_2/loss=663.040, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 494.03it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=101.599, player_2/loss=546.799, rew=-3.57]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 498.97it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=29.825, player_2/loss=691.983, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 486.30it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=18.132, player_2/loss=668.541, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 495.06it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=39.631, player_2/loss=540.926, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 492.84it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=104.110, player_2/loss=427.247, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.81it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=174.224, player_2/loss=297.342, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 495.95it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=119.979, player_2/loss=262.121, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 483.33it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=70.157, player_2/loss=246.693, rew=-18.75]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 491.67it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=125.278, player_2/loss=241.624, rew=-13.89]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 496.46it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=97.627, player_2/loss=252.780, rew=-19.44]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 494.91it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=116.866, player_2/loss=228.520, rew=-19.44]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 497.13it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=148.537, player_2/loss=199.292, rew=-18.75]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 482.52it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=106.761, player_2/loss=155.186, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 493.59it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=109.674, player_2/loss=140.198, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 496.63it/s, env_step=12288, len=23, n/ep=3, n/st=64, player_1/loss=163.621, player_2/loss=155.995, rew=8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 496.59it/s, env_step=13312, len=19, n/ep=4, n/st=64, player_1/loss=219.931, player_2/loss=113.528, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 498.36it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=314.855, player_2/loss=82.266, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #15: 1025it [00:02, 482.65it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=384.324, player_2/loss=71.110, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #16: 1025it [00:02, 499.27it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=377.755, player_2/loss=31.681, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #17: 1025it [00:02, 491.06it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=300.330, player_2/loss=21.735, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #18: 1025it [00:02, 497.45it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=487.904, player_2/loss=14.050, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #19: 1025it [00:02, 495.44it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=494.438, player_2/loss=7.508, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #1: 1025it [00:02, 482.46it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=353.233, player_2/loss=37.409, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.00it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=208.897, player_2/loss=52.418, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 496.69it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=96.649, player_2/loss=69.919, rew=-15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 496.86it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=82.216, player_2/loss=64.335, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 495.28it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=101.792, player_2/loss=106.467, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 484.90it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=98.925, player_2/loss=182.287, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 497.63it/s, env_step=7168, len=10, n/ep=7, n/st=64, player_1/loss=58.348, player_2/loss=341.186, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 498.07it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=70.026, player_2/loss=383.773, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 494.43it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=104.646, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 493.23it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=79.208, player_2/loss=353.845, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 484.60it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=43.558, player_2/loss=397.165, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 489.78it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=56.010, player_2/loss=439.120, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 493.69it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=95.596, player_2/loss=436.967, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 494.41it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=109.105, player_2/loss=352.529, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 492.39it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=51.956, player_2/loss=353.828, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 493.41it/s, env_step=16384, len=10, n/ep=7, n/st=64, player_1/loss=65.047, player_2/loss=344.217, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 488.25it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=94.063, player_2/loss=319.283, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 492.71it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=43.114, player_2/loss=380.740, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 492.56it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=83.835, player_2/loss=448.212, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 496.05it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=87.284, player_2/loss=253.056, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.75it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=106.806, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 484.50it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=88.103, player_2/loss=262.259, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 500.23it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=77.689, player_2/loss=219.660, rew=-10.71]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 493.91it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=84.279, player_2/loss=165.391, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 496.38it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=73.612, player_2/loss=152.495, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 497.78it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=63.477, player_2/loss=161.395, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 489.39it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=100.001, player_2/loss=179.777, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 501.59it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=89.652, player_2/loss=152.647, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 498.31it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=84.039, player_2/loss=102.312, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 494.87it/s, env_step=11264, len=22, n/ep=2, n/st=64, player_1/loss=111.765, player_2/loss=113.317, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 494.70it/s, env_step=12288, len=16, n/ep=5, n/st=64, player_1/loss=110.463, player_2/loss=87.375, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 486.97it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=106.178, player_2/loss=74.928, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 497.38it/s, env_step=14336, len=21, n/ep=3, n/st=64, player_1/loss=120.490, player_2/loss=86.172, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 496.53it/s, env_step=15360, len=15, n/ep=5, n/st=64, player_1/loss=103.263, player_2/loss=85.880, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 494.18it/s, env_step=16384, len=23, n/ep=3, n/st=64, player_1/loss=91.121, player_2/loss=90.464, rew=-8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 494.39it/s, env_step=17408, len=9, n/ep=6, n/st=64, player_1/loss=119.250, player_2/loss=172.444, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 486.40it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=138.928, player_2/loss=245.998, rew=-19.44]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 493.45it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=163.602, player_2/loss=231.717, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 491.32it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=73.564, player_2/loss=236.682, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 474.40it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=61.073, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 482.30it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=41.223, player_2/loss=162.945, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 488.75it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=30.811, player_2/loss=141.492, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 498.39it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=43.596, player_2/loss=129.396, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 496.76it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=32.942, player_2/loss=113.547, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 492.08it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=9.043, player_2/loss=105.413, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 498.76it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=42.396, player_2/loss=113.623, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 484.18it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=44.433, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 491.68it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=14.313, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 491.52it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=46.889, player_2/loss=117.746, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 493.68it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=69.270, player_2/loss=132.487, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 496.86it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=59.386, player_2/loss=109.810, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 487.67it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=24.660, player_2/loss=100.267, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 494.18it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=14.502, player_2/loss=103.689, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 491.77it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=13.223, player_2/loss=101.375, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 495.62it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=43.337, player_2/loss=113.203, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 494.67it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=47.529, player_2/loss=126.166, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 487.09it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=7.525, player_2/loss=150.919, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 494.79it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=51.438, player_2/loss=166.967, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.32it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=29.543, player_2/loss=132.353, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 497.50it/s, env_step=3072, len=7, n/ep=7, n/st=64, player_1/loss=32.317, player_2/loss=119.639, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 498.05it/s, env_step=4096, len=9, n/ep=6, n/st=64, player_1/loss=31.778, player_2/loss=115.181, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 488.94it/s, env_step=5120, len=25, n/ep=3, n/st=64, player_1/loss=60.435, player_2/loss=88.911, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 497.16it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=113.510, player_2/loss=53.540, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 500.39it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=148.154, player_2/loss=96.623, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 497.18it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=109.351, player_2/loss=115.326, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 501.40it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=57.433, player_2/loss=109.530, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 487.56it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=94.265, player_2/loss=65.788, rew=-12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 500.14it/s, env_step=11264, len=19, n/ep=4, n/st=64, player_1/loss=88.996, player_2/loss=31.954, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 499.85it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=80.931, player_2/loss=29.309, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 499.52it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=119.194, player_2/loss=37.021, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 498.43it/s, env_step=14336, len=26, n/ep=3, n/st=64, player_1/loss=174.075, player_2/loss=40.163, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 486.47it/s, env_step=15360, len=24, n/ep=3, n/st=64, player_1/loss=169.569, player_2/loss=56.359, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 500.00it/s, env_step=16384, len=20, n/ep=4, n/st=64, player_1/loss=137.924, player_2/loss=49.562, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 502.98it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=173.511, player_2/loss=23.577, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 499.45it/s, env_step=18432, len=21, n/ep=3, n/st=64, player_1/loss=193.214, player_2/loss=60.518, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 494.20it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=137.437, player_2/loss=60.579, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 485.51it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=90.843, player_2/loss=29.576, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.86it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=108.516, player_2/loss=58.085, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 497.47it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=143.718, player_2/loss=93.961, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 500.76it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=122.990, player_2/loss=119.293, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 496.42it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=95.272, player_2/loss=138.427, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 489.56it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=89.854, player_2/loss=99.976, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 498.83it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=59.600, player_2/loss=38.887, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 496.67it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_1/loss=36.386, player_2/loss=24.059, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 498.81it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=28.864, player_2/loss=20.858, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 498.06it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=14.271, player_2/loss=26.926, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 494.21it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=23.308, player_2/loss=32.940, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 498.22it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=77.703, player_2/loss=41.862, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 497.47it/s, env_step=13312, len=19, n/ep=3, n/st=64, player_1/loss=73.915, player_2/loss=46.637, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 496.25it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=17.541, player_2/loss=50.109, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 495.34it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=17.037, player_2/loss=30.647, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 498.81it/s, env_step=16384, len=15, n/ep=5, n/st=64, player_1/loss=16.987, player_2/loss=32.372, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 493.37it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=10.581, player_2/loss=26.758, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 498.62it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=10.338, player_2/loss=24.810, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 497.08it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=15.392, player_2/loss=38.407, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 495.62it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=15.672, player_2/loss=48.109, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 494.84it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=17.364, player_2/loss=27.882, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 491.69it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=17.038, player_2/loss=10.910, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 500.89it/s, env_step=4096, len=15, n/ep=3, n/st=64, player_1/loss=26.061, player_2/loss=27.114, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 498.46it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=22.452, player_2/loss=23.423, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 501.25it/s, env_step=6144, len=25, n/ep=2, n/st=64, player_1/loss=73.042, player_2/loss=40.361, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 495.48it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_2/loss=100.265, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 485.46it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=121.743, player_2/loss=104.892, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 500.56it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=71.329, player_2/loss=88.941, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 496.79it/s, env_step=10240, len=20, n/ep=2, n/st=64, player_2/loss=94.359, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 497.04it/s, env_step=11264, len=16, n/ep=5, n/st=64, player_1/loss=169.329, player_2/loss=122.235, rew=-15.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 500.88it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=171.348, player_2/loss=98.451, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 492.37it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=140.212, player_2/loss=67.663, rew=-15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 501.14it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=201.901, player_2/loss=54.736, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 496.24it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=201.053, player_2/loss=42.052, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 500.47it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=162.222, player_2/loss=49.728, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 498.14it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=200.874, player_2/loss=138.180, rew=-5.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 499.42it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=202.209, player_2/loss=165.691, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 488.02it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=180.565, player_2/loss=123.302, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 498.07it/s, env_step=1024, len=15, n/ep=5, n/st=64, player_1/loss=64.396, player_2/loss=109.415, rew=5.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.34it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=76.877, player_2/loss=117.228, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.87it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=79.296, player_2/loss=123.735, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 493.64it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=76.480, player_2/loss=141.091, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 487.61it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=97.027, player_2/loss=154.601, rew=-6.25]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 494.36it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=133.523, player_2/loss=189.783, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 491.79it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=134.326, player_2/loss=206.836, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 497.34it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=137.374, player_2/loss=216.992, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 496.37it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=110.384, player_2/loss=209.589, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 485.75it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=60.525, player_2/loss=201.845, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 492.55it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=44.536, player_2/loss=217.833, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 493.92it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=48.847, player_2/loss=259.509, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 493.27it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=84.953, player_2/loss=259.947, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 493.29it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=51.888, player_2/loss=260.810, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 483.75it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=18.191, player_2/loss=226.096, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 494.61it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=65.307, player_2/loss=234.771, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 496.14it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=76.420, player_2/loss=274.216, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 494.02it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=71.342, player_2/loss=248.130, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 494.23it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=74.800, player_2/loss=242.556, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 483.04it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=98.089, player_2/loss=264.251, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.89it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=100.268, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.99it/s, env_step=3072, len=7, n/ep=10, n/st=64, player_1/loss=49.760, player_2/loss=211.724, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 503.72it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=47.772, player_2/loss=173.502, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 497.74it/s, env_step=5120, len=22, n/ep=3, n/st=64, player_1/loss=31.837, player_2/loss=167.238, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 493.04it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=112.810, player_2/loss=161.807, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 497.44it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=214.261, player_2/loss=129.787, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 499.96it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=343.073, player_2/loss=90.581, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 498.56it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=416.777, player_2/loss=47.815, rew=5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 500.37it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=417.818, player_2/loss=28.064, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 486.34it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=357.152, player_2/loss=38.675, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 488.48it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=415.409, player_2/loss=27.450, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 478.54it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=478.661, rew=5.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 496.26it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=405.606, player_2/loss=17.820, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 497.71it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=367.591, player_2/loss=28.378, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 489.21it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=453.713, player_2/loss=50.716, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 500.29it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=451.745, player_2/loss=55.128, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 496.61it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=377.675, player_2/loss=32.470, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 500.25it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=407.461, player_2/loss=16.808, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 493.19it/s, env_step=1024, len=20, n/ep=2, n/st=64, player_1/loss=149.697, player_2/loss=160.333, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 483.70it/s, env_step=2048, len=19, n/ep=4, n/st=64, player_1/loss=49.179, player_2/loss=271.716, rew=12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 490.69it/s, env_step=3072, len=28, n/ep=2, n/st=64, player_1/loss=85.993, player_2/loss=270.226, rew=0.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 495.04it/s, env_step=4096, len=25, n/ep=3, n/st=64, player_1/loss=114.975, player_2/loss=611.845, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 497.15it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=84.189, player_2/loss=547.751, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 494.27it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=64.720, player_2/loss=92.888, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 495.89it/s, env_step=7168, len=26, n/ep=3, n/st=64, player_1/loss=89.903, player_2/loss=127.779, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 483.93it/s, env_step=8192, len=21, n/ep=2, n/st=64, player_1/loss=86.060, player_2/loss=96.848, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 494.54it/s, env_step=9216, len=20, n/ep=4, n/st=64, player_1/loss=53.108, player_2/loss=110.912, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 495.08it/s, env_step=10240, len=21, n/ep=3, n/st=64, player_1/loss=63.525, player_2/loss=177.692, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 496.79it/s, env_step=11264, len=23, n/ep=3, n/st=64, player_1/loss=65.317, player_2/loss=184.669, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 493.17it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=110.293, player_2/loss=190.621, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 486.40it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=128.515, player_2/loss=210.039, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 495.52it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=123.543, player_2/loss=343.023, rew=-17.86]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 496.94it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=102.129, player_2/loss=393.921, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 494.55it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=59.078, player_2/loss=399.259, rew=17.86]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 491.36it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=88.484, player_2/loss=430.262, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 485.79it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=101.178, player_2/loss=469.078, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 495.21it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=94.837, player_2/loss=491.881, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 495.57it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=107.692, player_2/loss=309.866, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.62it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=103.696, player_2/loss=268.980, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.75it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=97.338, player_2/loss=202.213, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 489.16it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_1/loss=100.564, player_2/loss=152.702, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 495.98it/s, env_step=5120, len=25, n/ep=3, n/st=64, player_1/loss=102.754, player_2/loss=143.014, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 494.77it/s, env_step=6144, len=26, n/ep=2, n/st=64, player_1/loss=126.150, player_2/loss=134.640, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 501.14it/s, env_step=7168, len=17, n/ep=3, n/st=64, player_1/loss=137.454, player_2/loss=155.676, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 488.29it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=106.106, player_2/loss=145.415, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 498.71it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=93.616, player_2/loss=128.576, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 493.46it/s, env_step=10240, len=19, n/ep=4, n/st=64, player_1/loss=66.111, player_2/loss=120.369, rew=-12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 497.01it/s, env_step=11264, len=17, n/ep=3, n/st=64, player_1/loss=82.122, player_2/loss=75.980, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 486.08it/s, env_step=12288, len=35, n/ep=2, n/st=64, player_1/loss=122.131, player_2/loss=88.700, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 490.74it/s, env_step=13312, len=25, n/ep=2, n/st=64, player_1/loss=103.199, player_2/loss=130.402, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 496.98it/s, env_step=14336, len=28, n/ep=3, n/st=64, player_1/loss=83.934, player_2/loss=147.201, rew=-8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 495.20it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=177.176, player_2/loss=149.102, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 494.91it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=195.400, player_2/loss=192.553, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 494.52it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=185.530, player_2/loss=210.113, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 493.98it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=249.825, player_2/loss=120.041, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 485.13it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=296.776, player_2/loss=77.043, rew=-13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 495.56it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=319.591, player_2/loss=96.666, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.99it/s, env_step=2048, len=9, n/ep=6, n/st=64, player_1/loss=255.639, player_2/loss=100.069, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 496.54it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=208.695, player_2/loss=92.965, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 502.45it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=178.911, player_2/loss=130.748, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 488.10it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=147.077, player_2/loss=105.862, rew=-16.67]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 497.81it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=129.871, player_2/loss=84.511, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 497.36it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=113.480, player_2/loss=99.693, rew=-5.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 497.30it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=136.173, player_2/loss=193.928, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 500.27it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=114.534, player_2/loss=255.074, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:02, 487.83it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=81.719, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 495.26it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=114.569, player_2/loss=238.862, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 493.26it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=72.174, player_2/loss=287.938, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 492.90it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=55.846, player_2/loss=287.375, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 495.34it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=51.453, player_2/loss=343.796, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 489.51it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=45.019, player_2/loss=342.739, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:02, 497.61it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=24.654, player_2/loss=329.230, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 497.52it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=19.458, player_2/loss=387.153, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:02, 497.56it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=14.371, player_2/loss=370.078, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:02, 499.76it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=19.223, player_2/loss=359.742, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 490.33it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=69.029, player_2/loss=275.852, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 501.09it/s, env_step=2048, len=30, n/ep=2, n/st=64, player_1/loss=116.650, player_2/loss=253.014, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 496.78it/s, env_step=3072, len=20, n/ep=4, n/st=64, player_1/loss=103.931, player_2/loss=231.183, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 498.76it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=94.080, player_2/loss=194.794, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 501.00it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=94.249, player_2/loss=147.768, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 486.63it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=79.924, player_2/loss=128.165, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 497.81it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=93.939, player_2/loss=99.852, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 498.29it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=125.596, player_2/loss=99.755, rew=-5.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 493.47it/s, env_step=9216, len=30, n/ep=3, n/st=64, player_1/loss=85.654, player_2/loss=96.525, rew=33.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 495.80it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=135.205, player_2/loss=102.413, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #11: 1025it [00:02, 488.82it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=155.623, player_2/loss=122.138, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #12: 1025it [00:02, 498.99it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=176.055, player_2/loss=138.876, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #13: 1025it [00:02, 495.30it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=197.342, player_2/loss=142.183, rew=15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #14: 1025it [00:02, 495.28it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=155.427, player_2/loss=103.139, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #15: 1025it [00:02, 502.90it/s, env_step=15360, len=13, n/ep=4, n/st=64, player_1/loss=132.853, player_2/loss=114.452, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #16: 1025it [00:02, 495.70it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=193.546, player_2/loss=133.623, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #17: 1025it [00:02, 488.04it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=205.263, player_2/loss=158.892, rew=15.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #18: 1025it [00:02, 496.60it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=147.652, player_2/loss=119.412, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #19: 1025it [00:02, 498.30it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=115.767, player_2/loss=61.422, rew=5.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #1: 1025it [00:02, 493.66it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=124.162, player_2/loss=121.106, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 494.98it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=90.401, player_2/loss=113.129, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 486.04it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=52.865, player_2/loss=152.410, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 497.04it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=105.421, player_2/loss=213.065, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 494.21it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=94.352, player_2/loss=245.346, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 493.98it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_2/loss=235.077, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 493.98it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=53.365, player_2/loss=229.078, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 483.01it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=45.168, player_2/loss=214.971, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 496.86it/s, env_step=9216, len=7, n/ep=7, n/st=64, player_1/loss=28.058, player_2/loss=228.026, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 494.69it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=18.197, player_2/loss=242.515, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 492.66it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=19.891, player_2/loss=268.245, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 494.79it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=15.900, player_2/loss=245.411, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 483.73it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=23.283, player_2/loss=262.515, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 493.94it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=21.320, player_2/loss=230.727, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 496.62it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=4.675, player_2/loss=208.508, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 497.72it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=2.948, player_2/loss=238.151, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 498.87it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=8.803, player_2/loss=214.732, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 480.99it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=25.003, player_2/loss=230.104, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 493.01it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=12.940, player_2/loss=235.460, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 495.78it/s, env_step=1024, len=12, n/ep=4, n/st=64, player_1/loss=134.862, player_2/loss=218.839, rew=-12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 501.10it/s, env_step=2048, len=28, n/ep=3, n/st=64, player_1/loss=132.103, player_2/loss=221.119, rew=-8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 500.28it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=124.441, player_2/loss=134.330, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 493.69it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=111.194, player_2/loss=66.210, rew=-16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 501.95it/s, env_step=5120, len=17, n/ep=3, n/st=64, player_1/loss=135.390, player_2/loss=115.579, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 498.09it/s, env_step=6144, len=25, n/ep=3, n/st=64, player_1/loss=143.969, player_2/loss=139.993, rew=8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 499.73it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=91.024, player_2/loss=123.299, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 499.41it/s, env_step=8192, len=25, n/ep=2, n/st=64, player_1/loss=59.338, player_2/loss=90.374, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 487.98it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=79.710, player_2/loss=100.839, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 500.08it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=109.468, player_2/loss=147.108, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 498.74it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=146.239, player_2/loss=132.803, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 496.91it/s, env_step=12288, len=23, n/ep=3, n/st=64, player_1/loss=167.967, player_2/loss=69.784, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 497.57it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=105.307, player_2/loss=68.792, rew=-25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 492.94it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=96.709, player_2/loss=111.149, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 500.77it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=186.655, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 500.34it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=158.131, player_2/loss=130.504, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 498.80it/s, env_step=17408, len=20, n/ep=3, n/st=64, player_1/loss=118.707, player_2/loss=70.762, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 494.24it/s, env_step=18432, len=20, n/ep=3, n/st=64, player_1/loss=156.450, player_2/loss=51.183, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 491.92it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=149.741, player_2/loss=63.289, rew=12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 498.03it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=135.809, player_2/loss=125.769, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.88it/s, env_step=2048, len=23, n/ep=3, n/st=64, player_1/loss=100.523, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 498.43it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=83.757, player_2/loss=61.269, rew=8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 498.09it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=79.919, player_2/loss=85.592, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 499.47it/s, env_step=5120, len=30, n/ep=2, n/st=64, player_1/loss=88.878, player_2/loss=95.108, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 499.71it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=84.272, player_2/loss=108.135, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 494.93it/s, env_step=7168, len=20, n/ep=4, n/st=64, player_1/loss=158.434, player_2/loss=128.782, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 498.46it/s, env_step=8192, len=28, n/ep=2, n/st=64, player_1/loss=187.927, player_2/loss=103.077, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 499.49it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=115.369, player_2/loss=104.868, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 493.22it/s, env_step=10240, len=22, n/ep=4, n/st=64, player_1/loss=91.128, player_2/loss=72.253, rew=-12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 498.51it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=101.921, player_2/loss=87.425, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 501.25it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=117.112, player_2/loss=109.178, rew=0.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 496.30it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=82.024, player_2/loss=110.653, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 497.73it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=104.487, player_2/loss=136.607, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 486.69it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=76.699, player_2/loss=177.372, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 497.10it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=40.920, player_2/loss=202.917, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 495.97it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=38.256, player_2/loss=214.461, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 502.84it/s, env_step=18432, len=9, n/ep=6, n/st=64, player_1/loss=67.817, player_2/loss=252.896, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 497.77it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=67.246, player_2/loss=240.347, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 487.02it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=104.978, player_2/loss=186.522, rew=-10.71]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.80it/s, env_step=2048, len=10, n/ep=7, n/st=64, player_1/loss=85.126, player_2/loss=151.958, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 498.80it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=51.192, player_2/loss=123.124, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 496.34it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=46.473, player_2/loss=121.561, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 497.96it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=146.070, player_2/loss=160.662, rew=10.71]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 498.01it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=247.570, player_2/loss=201.930, rew=-19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 488.79it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=225.670, player_2/loss=174.534, rew=10.71]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 494.69it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=240.607, player_2/loss=115.020, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 498.66it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=262.064, player_2/loss=87.036, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 496.71it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=266.568, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 498.32it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=317.981, player_2/loss=30.291, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 487.50it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=323.239, player_2/loss=23.696, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 501.96it/s, env_step=13312, len=10, n/ep=7, n/st=64, player_1/loss=329.215, player_2/loss=104.438, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 496.53it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=328.039, player_2/loss=105.813, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 498.50it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=337.061, player_2/loss=23.818, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 498.41it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=302.745, player_2/loss=28.278, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 483.07it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=264.845, player_2/loss=22.924, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 500.43it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=326.731, player_2/loss=10.188, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 493.92it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=344.118, player_2/loss=8.425, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 493.54it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=206.643, player_2/loss=10.153, rew=-16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.40it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=206.950, player_2/loss=130.939, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 482.76it/s, env_step=3072, len=9, n/ep=6, n/st=64, player_1/loss=161.912, player_2/loss=305.385, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 493.04it/s, env_step=4096, len=12, n/ep=7, n/st=64, player_1/loss=175.382, player_2/loss=497.721, rew=-10.71]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 501.39it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=187.058, player_2/loss=594.101, rew=17.86]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 494.83it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=131.231, player_2/loss=563.727, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 497.84it/s, env_step=7168, len=10, n/ep=7, n/st=64, player_1/loss=232.854, player_2/loss=358.117, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 490.48it/s, env_step=8192, len=23, n/ep=3, n/st=64, player_1/loss=270.452, player_2/loss=197.219, rew=-8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 496.14it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=186.457, player_2/loss=370.224, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 493.64it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=151.716, player_2/loss=603.029, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 494.66it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=62.914, player_2/loss=624.273, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 494.99it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=90.485, player_2/loss=536.940, rew=12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 492.44it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=84.395, player_2/loss=521.958, rew=13.89]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 482.94it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=51.345, player_2/loss=665.715, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 495.47it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=75.712, player_2/loss=668.586, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 493.53it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=105.930, player_2/loss=609.635, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 496.63it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=73.157, player_2/loss=455.321, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 495.15it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=64.531, player_2/loss=590.234, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 485.05it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=106.800, player_2/loss=663.113, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 493.74it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=148.711, player_2/loss=352.596, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 497.00it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=214.334, player_2/loss=180.488, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 482.23it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=345.631, player_2/loss=64.049, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 494.93it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=344.623, player_2/loss=65.176, rew=5.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 484.62it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=304.452, player_2/loss=74.260, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 494.92it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=340.325, player_2/loss=45.283, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 493.49it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=306.979, player_2/loss=23.107, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 493.50it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=348.982, player_2/loss=48.028, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 496.74it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=443.248, player_2/loss=63.523, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 486.02it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=438.028, player_2/loss=28.710, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 494.03it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=439.443, player_2/loss=7.266, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 498.32it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=447.315, player_2/loss=6.442, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 494.01it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_2/loss=8.228, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 496.34it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=550.052, player_2/loss=60.906, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 497.76it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=469.543, player_2/loss=64.476, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 486.96it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=487.424, player_2/loss=41.875, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 498.94it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=480.065, player_2/loss=7.247, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 494.23it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=380.301, player_2/loss=6.554, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 494.02it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=398.102, player_2/loss=6.142, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 495.87it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=216.253, player_2/loss=2.042, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.75it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=212.616, player_2/loss=37.134, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 500.65it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=190.094, player_2/loss=171.303, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 499.20it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=188.940, player_2/loss=221.770, rew=-5.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 497.78it/s, env_step=5120, len=12, n/ep=6, n/st=64, player_1/loss=147.934, player_2/loss=270.934, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 497.21it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=132.860, player_2/loss=263.268, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 486.90it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=189.568, player_2/loss=311.770, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 497.83it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=176.014, player_2/loss=392.560, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 497.07it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=181.952, player_2/loss=395.663, rew=12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 496.91it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=179.730, player_2/loss=248.939, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 500.19it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=117.479, player_2/loss=156.768, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 486.25it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=94.115, player_2/loss=269.955, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 493.81it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=58.569, player_2/loss=336.969, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 495.64it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=62.979, player_2/loss=309.087, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 495.37it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=53.914, player_2/loss=369.292, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 500.23it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=28.031, player_2/loss=497.836, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 490.27it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=14.148, player_2/loss=501.695, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 493.73it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=14.760, player_2/loss=396.096, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 491.80it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=6.733, player_2/loss=306.469, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 491.32it/s, env_step=1024, len=8, n/ep=7, n/st=64, player_1/loss=332.993, player_2/loss=147.089, rew=10.71]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 495.72it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=383.315, player_2/loss=122.856, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 495.33it/s, env_step=3072, len=9, n/ep=6, n/st=64, player_1/loss=430.692, player_2/loss=101.423, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 486.08it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=402.434, player_2/loss=85.516, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 498.47it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=361.178, player_2/loss=119.981, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 495.50it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=382.687, player_2/loss=105.393, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 493.14it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=373.547, player_2/loss=62.131, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 496.12it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=455.725, player_2/loss=37.618, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 489.55it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=444.576, player_2/loss=92.002, rew=18.75]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 496.62it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=411.893, player_2/loss=102.604, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 495.65it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=512.792, player_2/loss=55.109, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 496.77it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=455.805, player_2/loss=91.333, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 493.24it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=427.371, player_2/loss=100.052, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 487.02it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=405.620, player_2/loss=57.253, rew=12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 494.73it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=423.546, player_2/loss=34.056, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 492.20it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=424.224, player_2/loss=13.013, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 496.33it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=333.692, player_2/loss=16.623, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 494.68it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=380.756, player_2/loss=8.503, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 495.83it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=454.603, player_2/loss=27.273, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 490.70it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=261.959, player_2/loss=78.592, rew=-10.71]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.19it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=166.354, player_2/loss=59.976, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 490.67it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=191.431, player_2/loss=128.090, rew=18.75]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 492.33it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=174.837, player_2/loss=373.265, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 489.96it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=102.690, player_2/loss=555.948, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 480.54it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=66.679, player_2/loss=610.587, rew=13.89]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 488.81it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=58.159, player_2/loss=625.567, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 489.04it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=130.955, player_2/loss=560.889, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 493.48it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=114.176, player_2/loss=505.469, rew=13.89]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 490.57it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=55.424, player_2/loss=473.162, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 483.77it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_2/loss=486.522, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 493.49it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=104.047, player_2/loss=494.091, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 495.11it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=87.392, player_2/loss=378.833, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 497.37it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=59.180, player_2/loss=419.025, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 492.25it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=62.121, player_2/loss=418.505, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 488.97it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=36.488, player_2/loss=474.059, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 494.78it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=45.759, player_2/loss=394.214, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 495.05it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=57.203, player_2/loss=354.711, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 496.38it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=38.147, player_2/loss=410.769, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 490.33it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=56.882, player_2/loss=415.960, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.19it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=47.368, player_2/loss=357.653, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.73it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=53.864, player_2/loss=278.826, rew=-19.44]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 493.97it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=94.112, player_2/loss=212.913, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 495.42it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=106.577, player_2/loss=167.318, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 496.88it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=97.448, player_2/loss=112.564, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 497.21it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=92.023, player_2/loss=103.249, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 485.22it/s, env_step=8192, len=24, n/ep=3, n/st=64, player_1/loss=135.754, player_2/loss=86.981, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 495.81it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=156.964, player_2/loss=66.698, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:02, 496.22it/s, env_step=10240, len=23, n/ep=3, n/st=64, player_1/loss=148.363, player_2/loss=34.585, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 495.67it/s, env_step=11264, len=22, n/ep=2, n/st=64, player_1/loss=184.037, player_2/loss=24.315, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 495.02it/s, env_step=12288, len=24, n/ep=3, n/st=64, player_1/loss=164.902, player_2/loss=59.273, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 488.12it/s, env_step=13312, len=25, n/ep=3, n/st=64, player_1/loss=183.080, player_2/loss=75.710, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 495.47it/s, env_step=14336, len=26, n/ep=3, n/st=64, player_1/loss=199.223, player_2/loss=65.068, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 493.72it/s, env_step=15360, len=19, n/ep=4, n/st=64, player_1/loss=166.355, player_2/loss=41.243, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:02, 492.94it/s, env_step=16384, len=23, n/ep=2, n/st=64, player_1/loss=197.850, player_2/loss=49.579, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 495.99it/s, env_step=17408, len=20, n/ep=3, n/st=64, player_1/loss=162.946, player_2/loss=36.689, rew=8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:02, 488.46it/s, env_step=18432, len=27, n/ep=2, n/st=64, player_1/loss=124.026, player_2/loss=47.764, rew=-25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:02, 497.10it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=153.828, player_2/loss=75.990, rew=-5.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 492.07it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=100.950, player_2/loss=195.195, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 492.37it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=74.355, player_2/loss=176.052, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 496.14it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=81.142, player_2/loss=177.580, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 494.43it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=122.569, player_2/loss=224.799, rew=16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 497.59it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=60.410, player_2/loss=220.617, rew=5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 496.06it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=80.378, player_2/loss=199.122, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 496.40it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=94.797, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 496.30it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=86.074, player_2/loss=212.154, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 497.30it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=119.637, player_2/loss=181.286, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 485.34it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=133.592, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 494.29it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=120.788, player_2/loss=257.998, rew=16.67]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 497.75it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=68.457, player_2/loss=287.249, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 497.10it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=31.516, player_2/loss=298.202, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 496.40it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=42.050, player_2/loss=262.950, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 487.44it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=46.174, player_2/loss=216.668, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 494.17it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=75.307, player_2/loss=198.109, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 491.84it/s, env_step=17408, len=9, n/ep=6, n/st=64, player_1/loss=66.799, player_2/loss=226.084, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 495.56it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=36.434, player_2/loss=233.258, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 494.92it/s, env_step=19456, len=9, n/ep=8, n/st=64, player_1/loss=17.773, player_2/loss=232.748, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 484.65it/s, env_step=1024, len=19, n/ep=4, n/st=64, player_1/loss=107.169, player_2/loss=206.516, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 497.12it/s, env_step=2048, len=20, n/ep=2, n/st=64, player_1/loss=163.875, player_2/loss=200.400, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 501.00it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=189.105, player_2/loss=161.953, rew=5.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 495.62it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=175.075, player_2/loss=57.980, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 494.89it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=186.402, player_2/loss=128.082, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 494.96it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=216.627, player_2/loss=116.311, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 491.55it/s, env_step=7168, len=17, n/ep=3, n/st=64, player_1/loss=216.176, player_2/loss=48.759, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 500.51it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=193.711, player_2/loss=38.676, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 495.20it/s, env_step=9216, len=21, n/ep=3, n/st=64, player_2/loss=82.033, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 496.13it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=202.880, player_2/loss=89.639, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 496.70it/s, env_step=11264, len=21, n/ep=3, n/st=64, player_1/loss=193.009, player_2/loss=79.101, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 490.52it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=176.325, player_2/loss=48.103, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 495.70it/s, env_step=13312, len=22, n/ep=3, n/st=64, player_1/loss=162.017, player_2/loss=42.369, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 495.36it/s, env_step=14336, len=17, n/ep=3, n/st=64, player_1/loss=193.102, player_2/loss=21.578, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 498.70it/s, env_step=15360, len=19, n/ep=4, n/st=64, player_1/loss=176.434, player_2/loss=20.970, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 495.85it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=155.228, player_2/loss=19.515, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 495.41it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=150.536, player_2/loss=17.565, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 494.23it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=162.088, player_2/loss=6.968, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 496.38it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=198.216, player_2/loss=8.392, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 495.69it/s, env_step=1024, len=22, n/ep=2, n/st=64, player_1/loss=119.203, player_2/loss=69.973, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 498.87it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=100.182, player_2/loss=41.362, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.74it/s, env_step=3072, len=16, n/ep=3, n/st=64, player_1/loss=83.572, player_2/loss=9.282, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 487.66it/s, env_step=4096, len=15, n/ep=5, n/st=64, player_1/loss=75.038, player_2/loss=5.090, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 498.55it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=43.144, player_2/loss=8.137, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 498.56it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=49.383, player_2/loss=13.773, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 494.28it/s, env_step=7168, len=19, n/ep=4, n/st=64, player_2/loss=22.174, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 498.38it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=60.102, player_2/loss=46.584, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 489.49it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=48.767, player_2/loss=34.573, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 497.31it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=70.954, player_2/loss=41.140, rew=0.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 497.37it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=109.751, player_2/loss=80.666, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 499.54it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=165.944, player_2/loss=186.131, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 485.58it/s, env_step=13312, len=23, n/ep=2, n/st=64, player_1/loss=135.635, player_2/loss=258.643, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 488.28it/s, env_step=14336, len=25, n/ep=2, n/st=64, player_1/loss=85.536, player_2/loss=208.634, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #15: 1025it [00:02, 498.63it/s, env_step=15360, len=18, n/ep=4, n/st=64, player_1/loss=100.281, rew=-12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #16: 1025it [00:02, 491.85it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=92.547, player_2/loss=112.533, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #17: 1025it [00:02, 496.10it/s, env_step=17408, len=23, n/ep=3, n/st=64, player_1/loss=83.159, player_2/loss=118.170, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #18: 1025it [00:02, 494.92it/s, env_step=18432, len=22, n/ep=3, n/st=64, player_1/loss=106.764, player_2/loss=203.973, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #19: 1025it [00:02, 496.39it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=123.685, player_2/loss=256.875, rew=-15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #1: 1025it [00:02, 486.94it/s, env_step=1024, len=22, n/ep=2, n/st=64, player_1/loss=86.181, player_2/loss=122.933, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.90it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=74.134, player_2/loss=133.899, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 499.05it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=95.216, player_2/loss=149.867, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 497.48it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=134.400, player_2/loss=163.708, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 506.69it/s, env_step=5120, len=24, n/ep=3, n/st=64, player_1/loss=122.425, player_2/loss=152.377, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 492.35it/s, env_step=6144, len=29, n/ep=2, n/st=64, player_1/loss=115.705, player_2/loss=75.641, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 498.96it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=118.958, player_2/loss=76.102, rew=0.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 497.60it/s, env_step=8192, len=10, n/ep=7, n/st=64, player_1/loss=148.200, player_2/loss=103.269, rew=-3.57]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 501.00it/s, env_step=9216, len=20, n/ep=4, n/st=64, player_1/loss=133.740, player_2/loss=159.699, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 499.52it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=80.712, player_2/loss=144.845, rew=0.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 490.14it/s, env_step=11264, len=19, n/ep=4, n/st=64, player_1/loss=146.719, player_2/loss=130.993, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 490.92it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=141.605, player_2/loss=103.411, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 497.10it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=97.190, player_2/loss=61.374, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 496.10it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=107.843, player_2/loss=77.658, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 498.82it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=176.852, player_2/loss=76.720, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 497.10it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=176.673, player_2/loss=113.521, rew=-25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 492.88it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=83.585, player_2/loss=139.952, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 496.40it/s, env_step=18432, len=23, n/ep=3, n/st=64, player_1/loss=91.171, player_2/loss=167.007, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 501.65it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=51.738, player_2/loss=101.799, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 496.59it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=38.180, player_2/loss=41.962, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.10it/s, env_step=2048, len=18, n/ep=4, n/st=64, player_1/loss=30.481, player_2/loss=119.371, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 487.89it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=93.689, player_2/loss=192.363, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 493.96it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=96.351, player_2/loss=216.802, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 494.01it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=99.040, player_2/loss=243.692, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 495.49it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=106.619, player_2/loss=214.388, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 492.32it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=61.299, player_2/loss=203.533, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 491.19it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=57.554, player_2/loss=184.843, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 491.96it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=80.942, player_2/loss=174.952, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 494.20it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=99.336, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 493.55it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=75.284, player_2/loss=250.013, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 494.00it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=66.793, player_2/loss=235.241, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 492.71it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=69.265, player_2/loss=164.411, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 487.36it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=50.107, player_2/loss=186.760, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 495.42it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=26.997, player_2/loss=242.795, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 497.10it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=19.845, player_2/loss=241.919, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 492.84it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=74.626, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 495.90it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=85.150, player_2/loss=212.718, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 487.83it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=44.723, player_2/loss=204.070, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 492.54it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=43.881, player_2/loss=114.469, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 496.63it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=32.660, player_2/loss=124.359, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 492.94it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=114.428, player_2/loss=157.149, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 497.08it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=179.858, player_2/loss=156.233, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 488.56it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=238.885, player_2/loss=155.587, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 489.58it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=243.821, player_2/loss=129.196, rew=12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 495.78it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=340.628, player_2/loss=124.067, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 496.52it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=347.142, player_2/loss=113.702, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 496.46it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=214.383, player_2/loss=92.959, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 494.04it/s, env_step=10240, len=14, n/ep=5, n/st=64, player_1/loss=206.510, player_2/loss=99.439, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 485.73it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=246.468, player_2/loss=65.441, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 494.21it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=215.865, player_2/loss=42.311, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 494.44it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=159.368, player_2/loss=47.071, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 494.33it/s, env_step=14336, len=19, n/ep=4, n/st=64, player_1/loss=176.156, player_2/loss=34.959, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 497.45it/s, env_step=15360, len=21, n/ep=3, n/st=64, player_1/loss=287.533, player_2/loss=31.525, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 492.30it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=219.415, player_2/loss=30.296, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 495.19it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=125.939, player_2/loss=38.934, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 496.59it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=246.565, player_2/loss=23.528, rew=15.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 495.61it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=224.537, player_2/loss=16.353, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 493.24it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=259.323, player_2/loss=76.492, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.64it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=174.749, player_2/loss=144.189, rew=16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 484.17it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=102.994, player_2/loss=197.018, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 496.86it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=89.127, player_2/loss=259.857, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 504.92it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=66.265, player_2/loss=285.079, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 495.77it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=25.060, player_2/loss=277.036, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 496.40it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=41.437, player_2/loss=321.380, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 487.25it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=28.538, player_2/loss=272.504, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 494.13it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=34.200, player_2/loss=319.463, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 492.41it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=65.637, player_2/loss=263.692, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 495.13it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=103.060, player_2/loss=263.104, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 491.32it/s, env_step=12288, len=12, n/ep=6, n/st=64, player_1/loss=61.742, player_2/loss=252.934, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 489.24it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=44.158, player_2/loss=245.181, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 497.06it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=22.771, player_2/loss=277.080, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 495.15it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=11.185, player_2/loss=271.672, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 495.93it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=20.767, player_2/loss=238.679, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 495.95it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=43.861, player_2/loss=240.173, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 489.73it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=57.672, player_2/loss=281.048, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 493.70it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=54.342, player_2/loss=281.850, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 494.45it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=98.881, player_2/loss=247.347, rew=-12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 496.15it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=108.738, player_2/loss=243.356, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 495.57it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=113.578, player_2/loss=186.621, rew=12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 485.55it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=123.096, player_2/loss=85.895, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 493.34it/s, env_step=5120, len=13, n/ep=4, n/st=64, player_1/loss=124.997, player_2/loss=37.397, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 496.99it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=108.366, player_2/loss=33.519, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 494.86it/s, env_step=7168, len=16, n/ep=5, n/st=64, player_1/loss=113.963, player_2/loss=30.274, rew=5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 492.87it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=147.551, player_2/loss=38.305, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 486.41it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=160.479, player_2/loss=44.362, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 495.72it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=170.880, player_2/loss=49.932, rew=5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 499.91it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=143.754, player_2/loss=31.738, rew=15.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 494.28it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=156.696, player_2/loss=14.362, rew=12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 492.83it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=179.647, player_2/loss=11.381, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 489.43it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=162.442, player_2/loss=20.531, rew=12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 495.62it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=117.672, player_2/loss=21.556, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 499.10it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=119.218, player_2/loss=39.016, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 495.04it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=151.309, player_2/loss=125.944, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 498.09it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=146.550, player_2/loss=116.886, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 490.20it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=131.855, player_2/loss=47.986, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 495.20it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=127.940, player_2/loss=63.995, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 495.42it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=83.006, player_2/loss=91.055, rew=12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 496.58it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=95.232, player_2/loss=164.370, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 493.48it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=133.442, player_2/loss=216.022, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 466.35it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=106.267, player_2/loss=220.569, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 491.99it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=88.073, player_2/loss=131.271, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 498.51it/s, env_step=7168, len=19, n/ep=4, n/st=64, player_1/loss=118.306, player_2/loss=320.308, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 493.36it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=100.219, player_2/loss=251.942, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 495.92it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=46.094, player_2/loss=260.807, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 493.58it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=103.698, player_2/loss=248.506, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 489.18it/s, env_step=11264, len=8, n/ep=9, n/st=64, player_1/loss=183.133, player_2/loss=264.521, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 490.52it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=144.397, player_2/loss=340.543, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 494.05it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=64.919, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 493.58it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=41.336, player_2/loss=369.094, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 491.26it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=28.741, player_2/loss=336.314, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 484.90it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=97.785, player_2/loss=325.689, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 493.53it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=107.039, player_2/loss=397.284, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 499.42it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=100.148, player_2/loss=410.678, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 492.37it/s, env_step=19456, len=10, n/ep=7, n/st=64, player_1/loss=101.361, player_2/loss=323.826, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 488.11it/s, env_step=1024, len=9, n/ep=6, n/st=64, player_1/loss=152.110, player_2/loss=217.424, rew=16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 484.02it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=174.774, player_2/loss=170.316, rew=12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 496.11it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=280.053, player_2/loss=148.453, rew=17.86]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 499.69it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=257.954, player_2/loss=139.270, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 502.42it/s, env_step=5120, len=23, n/ep=3, n/st=64, player_1/loss=184.337, player_2/loss=100.411, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 497.89it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=143.259, player_2/loss=93.412, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 492.80it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=229.882, player_2/loss=72.443, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 493.53it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=269.071, player_2/loss=72.514, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 494.49it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=288.217, player_2/loss=114.716, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 493.82it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=350.262, player_2/loss=100.651, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 498.76it/s, env_step=11264, len=22, n/ep=3, n/st=64, player_1/loss=391.735, player_2/loss=67.655, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 490.53it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=358.812, player_2/loss=76.698, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 494.36it/s, env_step=13312, len=10, n/ep=5, n/st=64, player_1/loss=252.129, player_2/loss=62.507, rew=5.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 492.16it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=326.903, player_2/loss=24.565, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 496.87it/s, env_step=15360, len=17, n/ep=3, n/st=64, player_1/loss=318.816, player_2/loss=58.577, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 495.67it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=258.406, player_2/loss=61.785, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 495.73it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=245.633, player_2/loss=61.807, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 485.61it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=225.404, player_2/loss=52.825, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 498.09it/s, env_step=19456, len=23, n/ep=3, n/st=64, player_1/loss=178.015, player_2/loss=91.760, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 497.80it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=197.414, player_2/loss=32.068, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 494.20it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=176.730, player_2/loss=47.369, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 470.83it/s, env_step=3072, len=30, n/ep=2, n/st=64, player_2/loss=68.825, rew=62.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 491.83it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=97.956, player_2/loss=137.952, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 497.74it/s, env_step=5120, len=23, n/ep=3, n/st=64, player_1/loss=94.214, player_2/loss=135.065, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 497.19it/s, env_step=6144, len=21, n/ep=2, n/st=64, player_1/loss=125.291, player_2/loss=101.611, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 496.77it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=95.374, player_2/loss=96.196, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 497.97it/s, env_step=8192, len=24, n/ep=3, n/st=64, player_1/loss=56.229, player_2/loss=108.240, rew=-8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 493.50it/s, env_step=9216, len=29, n/ep=2, n/st=64, player_1/loss=57.291, player_2/loss=91.484, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 496.14it/s, env_step=10240, len=23, n/ep=3, n/st=64, player_1/loss=91.312, player_2/loss=59.062, rew=-8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 495.39it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=121.797, player_2/loss=83.662, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 497.40it/s, env_step=12288, len=24, n/ep=3, n/st=64, player_1/loss=64.039, player_2/loss=64.841, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 498.31it/s, env_step=13312, len=24, n/ep=3, n/st=64, player_1/loss=82.327, player_2/loss=98.638, rew=-25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 484.79it/s, env_step=14336, len=24, n/ep=3, n/st=64, player_1/loss=99.874, player_2/loss=135.998, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 496.08it/s, env_step=15360, len=29, n/ep=3, n/st=64, player_1/loss=64.761, player_2/loss=105.198, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 496.58it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=74.422, player_2/loss=113.832, rew=-25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 496.59it/s, env_step=17408, len=27, n/ep=3, n/st=64, player_1/loss=84.718, player_2/loss=118.880, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 496.83it/s, env_step=18432, len=23, n/ep=3, n/st=64, player_1/loss=57.215, player_2/loss=62.415, rew=-8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 490.95it/s, env_step=19456, len=23, n/ep=3, n/st=64, player_1/loss=68.511, player_2/loss=51.222, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 496.14it/s, env_step=1024, len=24, n/ep=2, n/st=64, player_1/loss=135.550, player_2/loss=164.566, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 498.26it/s, env_step=2048, len=24, n/ep=3, n/st=64, player_1/loss=102.028, player_2/loss=128.110, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 493.55it/s, env_step=3072, len=23, n/ep=3, n/st=64, player_1/loss=83.800, player_2/loss=109.950, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 496.58it/s, env_step=4096, len=27, n/ep=3, n/st=64, player_1/loss=68.034, player_2/loss=146.315, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 490.42it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=87.136, player_2/loss=145.139, rew=17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 494.44it/s, env_step=6144, len=26, n/ep=3, n/st=64, player_1/loss=97.744, player_2/loss=80.062, rew=-8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 492.76it/s, env_step=7168, len=35, n/ep=2, n/st=64, player_1/loss=75.820, player_2/loss=65.721, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 494.42it/s, env_step=8192, len=18, n/ep=3, n/st=64, player_1/loss=83.413, player_2/loss=92.007, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 494.71it/s, env_step=9216, len=24, n/ep=3, n/st=64, player_1/loss=125.534, player_2/loss=140.488, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 490.88it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=115.988, player_2/loss=125.402, rew=-8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 496.59it/s, env_step=11264, len=36, n/ep=1, n/st=64, player_1/loss=60.793, player_2/loss=108.866, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 498.61it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=44.349, player_2/loss=109.264, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 495.67it/s, env_step=13312, len=24, n/ep=2, n/st=64, player_1/loss=81.729, player_2/loss=117.001, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 497.50it/s, env_step=14336, len=24, n/ep=3, n/st=64, player_1/loss=67.680, player_2/loss=116.806, rew=-8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 495.28it/s, env_step=15360, len=25, n/ep=3, n/st=64, player_1/loss=71.086, player_2/loss=81.119, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 492.76it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=60.851, player_2/loss=72.899, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 496.64it/s, env_step=17408, len=8, n/ep=9, n/st=64, player_1/loss=95.768, player_2/loss=165.232, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 493.77it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=192.920, player_2/loss=181.162, rew=0.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 493.09it/s, env_step=19456, len=31, n/ep=3, n/st=64, player_1/loss=163.214, player_2/loss=108.138, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 494.12it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=113.341, player_2/loss=158.651, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 483.55it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=105.081, player_2/loss=165.665, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.03it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=144.386, player_2/loss=152.405, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 496.29it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=114.771, player_2/loss=124.803, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 503.00it/s, env_step=5120, len=27, n/ep=3, n/st=64, player_1/loss=88.510, player_2/loss=109.731, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 497.24it/s, env_step=6144, len=16, n/ep=5, n/st=64, player_1/loss=122.685, player_2/loss=130.120, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 490.92it/s, env_step=7168, len=22, n/ep=3, n/st=64, player_1/loss=112.239, player_2/loss=87.846, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 495.66it/s, env_step=8192, len=23, n/ep=3, n/st=64, player_1/loss=95.222, player_2/loss=90.798, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 497.63it/s, env_step=9216, len=21, n/ep=3, n/st=64, player_1/loss=81.162, player_2/loss=78.027, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 495.26it/s, env_step=10240, len=21, n/ep=3, n/st=64, player_1/loss=121.130, player_2/loss=117.743, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 496.10it/s, env_step=11264, len=23, n/ep=3, n/st=64, player_1/loss=147.820, player_2/loss=167.310, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 485.94it/s, env_step=12288, len=23, n/ep=3, n/st=64, player_1/loss=86.861, player_2/loss=163.987, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 493.66it/s, env_step=13312, len=19, n/ep=4, n/st=64, player_1/loss=39.834, player_2/loss=118.859, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 493.06it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=50.374, player_2/loss=135.768, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 492.44it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=73.233, player_2/loss=144.100, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 494.73it/s, env_step=16384, len=9, n/ep=5, n/st=64, player_1/loss=66.294, player_2/loss=186.256, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 494.91it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=69.849, player_2/loss=209.554, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 490.88it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=63.564, player_2/loss=215.980, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 493.06it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=36.877, player_2/loss=218.298, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 494.66it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=70.360, player_2/loss=255.196, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.48it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=68.417, player_2/loss=224.866, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 494.46it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=168.391, player_2/loss=288.393, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 484.54it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=309.448, player_2/loss=316.559, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 496.60it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=502.995, player_2/loss=217.242, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 497.80it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=519.420, player_2/loss=183.494, rew=6.25]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 496.64it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=504.039, player_2/loss=126.322, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 499.72it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=458.284, player_2/loss=79.255, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 486.91it/s, env_step=9216, len=8, n/ep=7, n/st=64, player_1/loss=472.026, player_2/loss=56.004, rew=3.57]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 493.78it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=498.707, player_2/loss=80.462, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 497.78it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=522.139, player_2/loss=90.138, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 495.37it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=534.966, player_2/loss=65.410, rew=18.75]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 496.13it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=581.179, player_2/loss=84.474, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 491.62it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=561.026, player_2/loss=80.452, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 492.15it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=433.419, player_2/loss=54.488, rew=6.25]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 496.37it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=424.272, player_2/loss=31.045, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 497.57it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=473.518, player_2/loss=26.694, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 494.46it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=506.286, player_2/loss=62.646, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 497.74it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=560.549, player_2/loss=56.983, rew=18.75]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 486.86it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=322.559, player_2/loss=75.690, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 498.55it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=331.053, player_2/loss=73.710, rew=6.25]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 493.52it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=291.708, player_2/loss=353.782, rew=18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 492.14it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=188.580, player_2/loss=510.258, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 493.53it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=121.929, player_2/loss=495.063, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 488.08it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=142.295, player_2/loss=467.669, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 491.16it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=177.151, player_2/loss=539.124, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 494.35it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=158.500, player_2/loss=567.829, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 494.95it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=88.652, player_2/loss=578.794, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 494.92it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=85.669, player_2/loss=514.588, rew=13.89]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 493.20it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=100.154, player_2/loss=605.185, rew=19.44]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 484.89it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=48.240, player_2/loss=662.338, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 490.34it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=67.623, player_2/loss=664.212, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 494.09it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=64.499, player_2/loss=511.155, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 490.70it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=49.966, player_2/loss=505.166, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 494.25it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=128.042, player_2/loss=568.663, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 485.41it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=98.663, player_2/loss=508.437, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 488.86it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=137.070, player_2/loss=454.997, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 493.17it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=75.766, player_2/loss=496.783, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 497.42it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=65.751, player_2/loss=408.850, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 488.11it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=81.960, player_2/loss=397.026, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 467.46it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=131.936, player_2/loss=324.699, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 496.71it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=78.318, player_2/loss=271.712, rew=-17.86]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 508.91it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=138.038, player_2/loss=192.633, rew=5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 499.04it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=237.187, player_2/loss=93.378, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 501.04it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=323.188, player_2/loss=32.836, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 491.83it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=383.180, player_2/loss=16.403, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 494.13it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=295.691, player_2/loss=38.629, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 498.75it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=175.146, player_2/loss=66.927, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 502.49it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=259.344, player_2/loss=45.130, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 495.91it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=313.133, player_2/loss=22.654, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 496.50it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=253.881, player_2/loss=13.698, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 492.82it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=293.168, player_2/loss=67.985, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 496.64it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=298.459, player_2/loss=111.812, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 494.70it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=340.510, player_2/loss=56.887, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 498.33it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=468.131, player_2/loss=15.224, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 498.06it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=319.525, player_2/loss=46.184, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 491.57it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=209.404, player_2/loss=57.089, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 492.81it/s, env_step=1024, len=18, n/ep=3, n/st=64, player_1/loss=197.899, player_2/loss=242.717, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 495.24it/s, env_step=2048, len=16, n/ep=3, n/st=64, player_1/loss=110.970, player_2/loss=248.060, rew=-8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 494.30it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=37.385, player_2/loss=248.469, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 491.42it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=32.955, player_2/loss=274.235, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 497.74it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=52.146, player_2/loss=300.678, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 488.28it/s, env_step=6144, len=15, n/ep=5, n/st=64, player_1/loss=61.349, player_2/loss=356.841, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 497.77it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=23.936, player_2/loss=329.983, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 494.53it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=16.433, player_2/loss=346.304, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 493.69it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=7.219, player_2/loss=330.792, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 494.48it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=8.401, player_2/loss=276.251, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 486.46it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=63.058, player_2/loss=291.692, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 497.10it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=65.444, player_2/loss=231.796, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 497.11it/s, env_step=13312, len=17, n/ep=3, n/st=64, player_1/loss=13.609, player_2/loss=270.496, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 494.87it/s, env_step=14336, len=21, n/ep=3, n/st=64, player_1/loss=12.344, player_2/loss=277.128, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 492.05it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=8.416, player_2/loss=307.800, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 488.93it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=9.090, player_2/loss=269.302, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 494.11it/s, env_step=17408, len=15, n/ep=5, n/st=64, player_1/loss=13.335, player_2/loss=268.607, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 495.54it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=10.746, player_2/loss=341.327, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 493.77it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=56.489, player_2/loss=321.686, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 498.27it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=61.961, player_2/loss=314.627, rew=-6.25]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 491.80it/s, env_step=2048, len=18, n/ep=4, n/st=64, player_1/loss=182.916, player_2/loss=270.758, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 499.85it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=245.134, player_2/loss=176.548, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 499.02it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=207.449, player_2/loss=88.270, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 494.17it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=188.131, player_2/loss=172.427, rew=5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 498.05it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=194.830, player_2/loss=160.815, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 497.61it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=315.691, player_2/loss=53.767, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 490.88it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=333.871, player_2/loss=64.206, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 500.15it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=270.889, player_2/loss=31.991, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 496.86it/s, env_step=10240, len=14, n/ep=5, n/st=64, player_1/loss=285.663, player_2/loss=30.334, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 493.08it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=315.883, player_2/loss=68.308, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 488.99it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=224.781, player_2/loss=82.547, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 489.73it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=216.112, player_2/loss=59.837, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 494.56it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=315.780, player_2/loss=56.175, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 500.28it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=325.950, player_2/loss=42.267, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 498.83it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=220.654, player_2/loss=52.722, rew=-25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 497.90it/s, env_step=17408, len=15, n/ep=3, n/st=64, player_1/loss=136.848, player_2/loss=85.635, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 501.55it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=91.471, player_2/loss=99.028, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 487.38it/s, env_step=19456, len=27, n/ep=2, n/st=64, player_1/loss=121.766, player_2/loss=98.359, rew=0.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 495.21it/s, env_step=1024, len=29, n/ep=2, n/st=64, player_1/loss=19.916, player_2/loss=106.665, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 498.71it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=58.416, player_2/loss=111.755, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 497.49it/s, env_step=3072, len=15, n/ep=3, n/st=64, player_1/loss=73.702, player_2/loss=111.277, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 499.30it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=99.408, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 498.46it/s, env_step=5120, len=19, n/ep=4, n/st=64, player_1/loss=104.912, player_2/loss=114.239, rew=25.00]


Epoch #5: test_reward: 100.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 498.74it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=59.048, player_2/loss=128.684, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 499.30it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=34.149, player_2/loss=135.964, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 498.28it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=28.962, player_2/loss=161.567, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 495.68it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=37.558, player_2/loss=188.432, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 485.31it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=60.217, player_2/loss=159.223, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 497.69it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=46.425, player_2/loss=159.056, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 490.24it/s, env_step=12288, len=17, n/ep=3, n/st=64, player_1/loss=51.503, player_2/loss=170.777, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 501.46it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=9.555, player_2/loss=170.109, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 495.88it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=59.495, player_2/loss=174.798, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 489.24it/s, env_step=15360, len=19, n/ep=4, n/st=64, player_1/loss=123.780, player_2/loss=123.681, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 502.46it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=80.820, player_2/loss=94.026, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 497.11it/s, env_step=17408, len=15, n/ep=5, n/st=64, player_1/loss=55.414, player_2/loss=135.871, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 500.74it/s, env_step=18432, len=21, n/ep=4, n/st=64, player_1/loss=86.173, player_2/loss=168.372, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 499.47it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=129.997, player_2/loss=152.560, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 489.65it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=61.147, player_2/loss=190.700, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 497.84it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=65.572, player_2/loss=129.903, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 502.19it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=97.598, player_2/loss=102.533, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 500.30it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=126.206, player_2/loss=130.296, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 499.12it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=136.403, player_2/loss=130.991, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 500.90it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=140.492, player_2/loss=172.118, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 488.94it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=116.894, player_2/loss=161.792, rew=-15.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 499.26it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=86.407, player_2/loss=102.954, rew=-15.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 499.78it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_2/loss=87.623, rew=-5.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 494.92it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=78.636, player_2/loss=97.302, rew=-12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 500.27it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=101.043, player_2/loss=90.753, rew=-15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 491.60it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=99.844, player_2/loss=102.277, rew=-15.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 500.24it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=99.803, player_2/loss=105.243, rew=0.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 498.05it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=112.857, player_2/loss=126.259, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 498.74it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=108.597, player_2/loss=166.566, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 493.53it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=98.289, player_2/loss=193.486, rew=-15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 489.29it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=89.346, player_2/loss=132.762, rew=-15.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 495.50it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=106.821, player_2/loss=74.235, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 501.68it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=116.442, player_2/loss=61.154, rew=-15.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 496.29it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=153.523, player_2/loss=136.719, rew=12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 497.58it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=145.087, player_2/loss=158.856, rew=-12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 495.68it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=104.293, player_2/loss=219.729, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 499.19it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=85.412, player_2/loss=204.820, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 497.47it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=69.185, player_2/loss=188.938, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 492.59it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=83.697, player_2/loss=197.248, rew=13.89]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 493.10it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=166.243, player_2/loss=218.718, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 497.49it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=201.112, player_2/loss=212.217, rew=19.44]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 482.76it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=154.237, player_2/loss=218.444, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 492.10it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_2/loss=202.698, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 495.63it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=91.354, player_2/loss=182.620, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 496.35it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=61.185, player_2/loss=162.418, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 494.70it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=70.971, player_2/loss=174.102, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 489.69it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=82.669, player_2/loss=171.671, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 494.51it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=69.387, player_2/loss=244.202, rew=2.78]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 494.26it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=67.522, player_2/loss=235.575, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 496.33it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=60.569, player_2/loss=176.355, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 496.04it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=59.163, player_2/loss=165.012, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 492.09it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=56.127, player_2/loss=156.901, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 488.35it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=34.216, player_2/loss=155.781, rew=15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 496.31it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=159.578, player_2/loss=195.282, rew=-10.71]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 495.88it/s, env_step=3072, len=13, n/ep=6, n/st=64, player_1/loss=287.699, player_2/loss=174.818, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 497.41it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=252.952, player_2/loss=102.333, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 502.54it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=211.450, player_2/loss=99.829, rew=12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 483.75it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=174.476, player_2/loss=99.966, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 493.23it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=221.944, player_2/loss=53.709, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 498.14it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=238.567, player_2/loss=37.637, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 499.19it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=227.629, player_2/loss=26.096, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 498.16it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=216.970, player_2/loss=29.174, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 489.84it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=197.263, player_2/loss=57.086, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 496.05it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=351.903, player_2/loss=69.678, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 498.48it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=355.969, player_2/loss=93.148, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 495.53it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=238.422, player_2/loss=49.302, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 495.53it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=203.386, player_2/loss=22.930, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 492.11it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=178.091, player_2/loss=38.705, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 491.91it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=210.031, player_2/loss=45.974, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 496.60it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=223.910, player_2/loss=35.047, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 496.92it/s, env_step=19456, len=15, n/ep=5, n/st=64, player_1/loss=221.485, player_2/loss=32.192, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 492.79it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=217.930, player_2/loss=65.798, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 494.67it/s, env_step=2048, len=20, n/ep=4, n/st=64, player_1/loss=173.872, player_2/loss=104.669, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 490.73it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=119.430, player_2/loss=144.372, rew=0.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 497.69it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=95.054, player_2/loss=243.514, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 493.01it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=154.839, player_2/loss=331.510, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 493.53it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=131.172, player_2/loss=396.183, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 495.20it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=29.660, player_2/loss=383.295, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 488.34it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=28.687, player_2/loss=357.758, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 494.23it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=40.570, player_2/loss=348.143, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 492.91it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=29.811, player_2/loss=367.338, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 492.35it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=16.214, player_2/loss=391.398, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 493.49it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=24.594, player_2/loss=358.591, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 484.88it/s, env_step=13312, len=11, n/ep=7, n/st=64, player_1/loss=22.714, player_2/loss=343.500, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 496.15it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=24.326, player_2/loss=324.041, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 494.95it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=11.199, player_2/loss=401.902, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 495.16it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=8.498, player_2/loss=421.506, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 492.70it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=4.538, player_2/loss=411.474, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 489.81it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=4.429, player_2/loss=406.118, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 490.07it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=12.996, player_2/loss=406.565, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 491.33it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=330.787, player_2/loss=230.323, rew=3.57]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 493.25it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=623.735, player_2/loss=191.454, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 493.74it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=719.268, player_2/loss=160.386, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 488.31it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=707.785, player_2/loss=109.645, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 497.74it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=768.687, player_2/loss=85.825, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 491.31it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=686.214, player_2/loss=116.976, rew=3.57]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 494.47it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=675.955, player_2/loss=69.517, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 497.12it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=711.633, player_2/loss=36.721, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 493.56it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=593.185, player_2/loss=29.324, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 486.52it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=618.601, player_2/loss=10.687, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 493.66it/s, env_step=11264, len=8, n/ep=6, n/st=64, player_1/loss=842.799, player_2/loss=8.613, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 496.51it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=746.594, player_2/loss=101.161, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 496.01it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=635.918, player_2/loss=127.794, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 494.31it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=671.868, player_2/loss=110.134, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 490.27it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=614.566, player_2/loss=129.132, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 493.74it/s, env_step=16384, len=10, n/ep=8, n/st=64, player_1/loss=606.724, player_2/loss=170.143, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 493.78it/s, env_step=17408, len=10, n/ep=7, n/st=64, player_1/loss=648.886, player_2/loss=61.017, rew=17.86]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 494.30it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=620.121, player_2/loss=51.041, rew=18.75]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 484.90it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=680.804, player_2/loss=62.742, rew=10.71]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 492.08it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=490.917, player_2/loss=43.869, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 494.37it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=410.178, player_2/loss=26.076, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 494.93it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=272.863, player_2/loss=23.310, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 498.02it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=157.487, player_2/loss=148.609, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 502.72it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=68.921, player_2/loss=289.336, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 483.52it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=34.443, player_2/loss=485.622, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 494.96it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=21.594, player_2/loss=469.610, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 497.78it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=17.625, player_2/loss=333.151, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 493.28it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=16.738, player_2/loss=381.447, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 494.34it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=11.349, player_2/loss=421.150, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 491.98it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=21.367, player_2/loss=339.999, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 496.82it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=16.359, player_2/loss=321.401, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 497.49it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=13.055, player_2/loss=381.141, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 499.73it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=16.820, player_2/loss=516.459, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 487.84it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=12.229, player_2/loss=544.300, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 495.41it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=9.885, player_2/loss=495.061, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 494.40it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=9.634, player_2/loss=467.646, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 498.12it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=6.830, player_2/loss=528.869, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 496.39it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=8.024, player_2/loss=554.476, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 494.80it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=3.839, player_2/loss=350.868, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.07it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=7.327, player_2/loss=324.829, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 486.22it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=8.492, player_2/loss=248.501, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 489.22it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=19.887, player_2/loss=190.482, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 496.64it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=18.398, player_2/loss=176.561, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 496.11it/s, env_step=6144, len=14, n/ep=5, n/st=64, player_1/loss=6.237, player_2/loss=169.919, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 489.01it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=23.060, player_2/loss=131.302, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 497.38it/s, env_step=8192, len=28, n/ep=3, n/st=64, player_1/loss=49.217, player_2/loss=61.171, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 501.70it/s, env_step=9216, len=25, n/ep=2, n/st=64, player_1/loss=45.528, player_2/loss=28.033, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 499.29it/s, env_step=10240, len=26, n/ep=2, n/st=64, player_2/loss=20.973, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #11: 1025it [00:02, 498.99it/s, env_step=11264, len=23, n/ep=3, n/st=64, player_1/loss=108.267, player_2/loss=26.490, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #12: 1025it [00:02, 498.57it/s, env_step=12288, len=26, n/ep=3, n/st=64, player_1/loss=73.324, player_2/loss=67.754, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #13: 1025it [00:02, 487.78it/s, env_step=13312, len=26, n/ep=3, n/st=64, player_1/loss=73.357, player_2/loss=69.322, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #14: 1025it [00:02, 502.46it/s, env_step=14336, len=23, n/ep=3, n/st=64, player_1/loss=40.033, player_2/loss=22.265, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #15: 1025it [00:02, 501.44it/s, env_step=15360, len=25, n/ep=2, n/st=64, player_1/loss=43.475, player_2/loss=25.194, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #16: 1025it [00:02, 501.48it/s, env_step=16384, len=23, n/ep=2, n/st=64, player_1/loss=132.668, player_2/loss=111.378, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #17: 1025it [00:02, 497.65it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=137.047, player_2/loss=130.487, rew=-8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #18: 1025it [00:02, 487.42it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=159.782, player_2/loss=96.769, rew=-8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #19: 1025it [00:02, 499.54it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=193.986, player_2/loss=134.356, rew=-8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #1: 1025it [00:02, 496.61it/s, env_step=1024, len=11, n/ep=7, n/st=64, player_1/loss=279.513, player_2/loss=293.214, rew=10.71]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 493.71it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=177.060, player_2/loss=250.182, rew=17.86]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 495.14it/s, env_step=3072, len=9, n/ep=8, n/st=64, player_1/loss=119.293, player_2/loss=210.134, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 487.36it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=210.680, player_2/loss=233.658, rew=-16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 492.44it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=169.855, player_2/loss=228.406, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 494.88it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=113.491, player_2/loss=212.576, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 494.28it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=120.446, player_2/loss=185.200, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 492.52it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=167.306, player_2/loss=212.903, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 494.66it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=163.143, player_2/loss=184.961, rew=17.86]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 489.78it/s, env_step=10240, len=9, n/ep=6, n/st=64, player_1/loss=132.823, player_2/loss=148.863, rew=8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 496.10it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=118.771, player_2/loss=161.032, rew=10.71]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 499.27it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=107.714, player_2/loss=187.615, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 494.06it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=65.610, player_2/loss=189.763, rew=10.71]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 493.21it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=85.361, player_2/loss=194.569, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 484.76it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=54.267, player_2/loss=215.056, rew=3.57]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 480.72it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=88.614, player_2/loss=217.619, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 495.67it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=143.300, player_2/loss=170.899, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 494.94it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=88.352, player_2/loss=162.548, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 421.75it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=55.525, player_2/loss=175.348, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 375.28it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=108.223, player_2/loss=99.202, rew=15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 467.52it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=102.120, player_2/loss=104.098, rew=-8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 501.90it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=110.993, player_2/loss=129.141, rew=-19.44]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 500.46it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=133.386, player_2/loss=161.411, rew=-18.75]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 497.01it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=250.178, player_2/loss=209.687, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 445.57it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=250.286, player_2/loss=185.392, rew=-15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:01, 521.94it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=159.746, player_2/loss=158.838, rew=-15.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:01, 524.70it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=113.323, player_2/loss=149.990, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:01, 519.96it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=68.490, player_2/loss=69.834, rew=-25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:01, 517.26it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=70.191, player_2/loss=59.462, rew=8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 455.37it/s, env_step=11264, len=23, n/ep=3, n/st=64, player_1/loss=82.601, player_2/loss=75.146, rew=8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 512.23it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=120.434, player_2/loss=133.027, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 446.85it/s, env_step=13312, len=16, n/ep=3, n/st=64, player_1/loss=167.358, player_2/loss=161.951, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 508.26it/s, env_step=14336, len=21, n/ep=3, n/st=64, player_1/loss=133.470, player_2/loss=129.836, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 511.05it/s, env_step=15360, len=19, n/ep=4, n/st=64, player_1/loss=112.057, rew=0.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 451.99it/s, env_step=16384, len=26, n/ep=2, n/st=64, player_1/loss=125.985, player_2/loss=106.446, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:01, 519.31it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=146.973, player_2/loss=61.635, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:01, 522.13it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=129.093, player_2/loss=67.986, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 509.26it/s, env_step=19456, len=18, n/ep=3, n/st=64, player_1/loss=233.327, player_2/loss=53.548, rew=-8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 485.22it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=245.140, player_2/loss=9.852, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 426.88it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=187.694, player_2/loss=24.672, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 504.05it/s, env_step=3072, len=26, n/ep=2, n/st=64, player_1/loss=79.814, player_2/loss=85.688, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 486.79it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=69.517, player_2/loss=111.674, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 437.31it/s, env_step=5120, len=23, n/ep=2, n/st=64, player_1/loss=93.414, player_2/loss=148.092, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 501.98it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=79.933, player_2/loss=141.442, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:01, 516.28it/s, env_step=7168, len=19, n/ep=4, n/st=64, player_1/loss=70.033, player_2/loss=130.719, rew=0.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 491.92it/s, env_step=8192, len=12, n/ep=4, n/st=64, player_1/loss=152.080, player_2/loss=131.699, rew=-12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 489.30it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=115.834, player_2/loss=122.749, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 433.30it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=117.409, player_2/loss=158.130, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 497.99it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=176.040, player_2/loss=178.947, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:01, 516.44it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=157.541, player_2/loss=210.433, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:01, 514.53it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=132.357, player_2/loss=182.613, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 501.15it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=111.390, player_2/loss=183.530, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 508.72it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=42.765, player_2/loss=192.142, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 424.99it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=47.404, player_2/loss=168.206, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 479.02it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=42.877, player_2/loss=193.665, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 511.53it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=22.336, player_2/loss=176.484, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 477.93it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=23.395, player_2/loss=156.506, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 424.08it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=15.139, player_2/loss=130.301, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 506.13it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=30.261, player_2/loss=117.896, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 422.30it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=26.541, player_2/loss=73.029, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 488.18it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=51.826, player_2/loss=50.805, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 430.95it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=31.970, player_2/loss=51.661, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 475.70it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=14.070, player_2/loss=40.731, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 430.19it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=83.204, player_2/loss=86.958, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 471.15it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=200.716, player_2/loss=196.771, rew=-10.71]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 444.30it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=225.727, player_2/loss=176.715, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 471.47it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=217.437, player_2/loss=53.952, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 508.95it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=221.728, player_2/loss=41.345, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 432.47it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=176.773, player_2/loss=150.749, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 480.24it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=139.300, player_2/loss=187.543, rew=-25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:01, 520.83it/s, env_step=14336, len=18, n/ep=4, n/st=64, player_1/loss=118.056, player_2/loss=147.423, rew=-12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 512.24it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=103.227, player_2/loss=146.665, rew=0.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 501.01it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=82.734, player_2/loss=117.673, rew=12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 443.75it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=143.274, player_2/loss=130.234, rew=-12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 468.96it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=221.755, player_2/loss=136.313, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 506.89it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=249.565, player_2/loss=83.978, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 511.93it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=197.403, player_2/loss=287.981, rew=-12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 425.98it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=157.318, player_2/loss=176.361, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.90it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=91.103, player_2/loss=119.306, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 484.62it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_2/loss=127.519, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 489.89it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=118.390, player_2/loss=131.214, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 488.27it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=69.886, player_2/loss=177.776, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 491.17it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=86.935, player_2/loss=199.216, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 438.27it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=179.203, player_2/loss=228.824, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 459.11it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=164.034, player_2/loss=229.660, rew=19.44]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 462.81it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=91.872, player_2/loss=199.420, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 451.73it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=65.253, player_2/loss=208.691, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 409.97it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=92.469, player_2/loss=237.347, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 421.47it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=64.139, player_2/loss=243.246, rew=19.44]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 471.19it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=62.757, player_2/loss=225.166, rew=3.57]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 464.68it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=65.284, player_2/loss=252.123, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 475.41it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=50.136, player_2/loss=244.548, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 474.50it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=57.842, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 478.16it/s, env_step=18432, len=8, n/ep=9, n/st=64, player_1/loss=38.962, player_2/loss=245.106, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 478.04it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=44.875, player_2/loss=240.612, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 390.90it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=37.015, player_2/loss=274.835, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 324.17it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=48.487, player_2/loss=192.805, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 347.78it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=38.348, player_2/loss=117.935, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 348.62it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=57.542, player_2/loss=164.539, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 361.75it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=76.271, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 330.16it/s, env_step=6144, len=25, n/ep=3, n/st=64, player_1/loss=103.302, player_2/loss=160.240, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 338.57it/s, env_step=7168, len=15, n/ep=5, n/st=64, player_1/loss=132.457, player_2/loss=178.448, rew=-15.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 343.53it/s, env_step=8192, len=29, n/ep=2, n/st=64, player_1/loss=112.838, player_2/loss=147.278, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 345.37it/s, env_step=9216, len=23, n/ep=3, n/st=64, player_1/loss=96.591, player_2/loss=95.945, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 332.21it/s, env_step=10240, len=23, n/ep=3, n/st=64, player_1/loss=102.941, player_2/loss=71.072, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 335.58it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=83.592, player_2/loss=71.370, rew=-15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #12: 1025it [00:02, 342.53it/s, env_step=12288, len=24, n/ep=3, n/st=64, player_1/loss=140.714, player_2/loss=119.300, rew=8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #13: 1025it [00:03, 336.90it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=168.184, player_2/loss=155.916, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #14: 1025it [00:02, 342.04it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_2/loss=140.117, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #15: 1025it [00:03, 304.27it/s, env_step=15360, len=23, n/ep=2, n/st=64, player_1/loss=161.123, player_2/loss=95.318, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #16: 1025it [00:03, 314.30it/s, env_step=16384, len=27, n/ep=2, n/st=64, player_1/loss=137.349, player_2/loss=123.685, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #17: 1025it [00:02, 355.82it/s, env_step=17408, len=25, n/ep=3, n/st=64, player_1/loss=146.945, player_2/loss=141.261, rew=-8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #18: 1025it [00:03, 334.10it/s, env_step=18432, len=21, n/ep=3, n/st=64, player_1/loss=168.864, player_2/loss=114.378, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #19: 1025it [00:02, 345.30it/s, env_step=19456, len=19, n/ep=4, n/st=64, player_1/loss=179.235, player_2/loss=88.250, rew=-12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #1: 1025it [00:03, 324.21it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=144.007, player_2/loss=145.060, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.45it/s, env_step=2048, len=21, n/ep=2, n/st=64, player_1/loss=98.554, player_2/loss=105.989, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 358.55it/s, env_step=3072, len=23, n/ep=3, n/st=64, player_1/loss=69.392, player_2/loss=112.687, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 343.16it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=85.391, player_2/loss=145.252, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 355.06it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=78.401, player_2/loss=209.342, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 353.27it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=62.851, player_2/loss=231.729, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 343.06it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=39.450, player_2/loss=218.896, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 339.94it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=8.894, player_2/loss=188.232, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 322.60it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=12.867, player_2/loss=192.741, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 338.56it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=36.022, player_2/loss=193.529, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 312.30it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=59.525, player_2/loss=190.115, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:04, 234.17it/s, env_step=12288, len=7, n/ep=10, n/st=64, player_1/loss=37.653, player_2/loss=180.083, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:04, 217.03it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=45.900, player_2/loss=155.272, rew=5.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 284.72it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=54.166, player_2/loss=176.396, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 276.95it/s, env_step=15360, len=10, n/ep=5, n/st=64, player_1/loss=26.753, player_2/loss=191.373, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 268.46it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=23.165, player_2/loss=185.151, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 290.86it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=41.096, player_2/loss=153.411, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 292.42it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=18.676, player_2/loss=185.980, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 287.44it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=15.192, player_2/loss=232.264, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 275.75it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=145.822, player_2/loss=175.946, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 260.80it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=300.567, player_2/loss=166.581, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 279.77it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=349.858, player_2/loss=155.544, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 283.14it/s, env_step=4096, len=9, n/ep=6, n/st=64, player_1/loss=282.476, player_2/loss=98.799, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 287.60it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=321.487, player_2/loss=111.750, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 273.34it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=354.574, player_2/loss=91.401, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 260.20it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=271.358, player_2/loss=47.996, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 276.72it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=200.394, player_2/loss=43.781, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 316.45it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=183.305, player_2/loss=90.479, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 274.34it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=108.110, player_2/loss=150.596, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 282.79it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=61.710, player_2/loss=157.926, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 275.08it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=69.007, player_2/loss=139.663, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 259.03it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=91.074, player_2/loss=147.334, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:04, 250.24it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=93.144, player_2/loss=102.160, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:04, 247.42it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=110.905, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:04, 247.61it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=123.773, player_2/loss=40.798, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:04, 248.75it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=82.034, player_2/loss=31.782, rew=-12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:04, 247.99it/s, env_step=18432, len=17, n/ep=3, n/st=64, player_1/loss=76.285, player_2/loss=30.582, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 267.14it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=71.386, player_2/loss=47.659, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 271.53it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=55.723, player_2/loss=31.302, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 326.93it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=80.024, player_2/loss=83.647, rew=16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 296.73it/s, env_step=3072, len=9, n/ep=6, n/st=64, player_1/loss=124.246, player_2/loss=146.965, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 300.17it/s, env_step=4096, len=9, n/ep=6, n/st=64, player_1/loss=90.630, player_2/loss=160.311, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 304.50it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=62.165, player_2/loss=128.827, rew=17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 296.49it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=31.544, player_2/loss=148.690, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 284.72it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=15.943, player_2/loss=160.338, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 290.17it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=14.205, player_2/loss=126.314, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 294.79it/s, env_step=9216, len=9, n/ep=6, n/st=64, player_1/loss=31.549, player_2/loss=129.578, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 291.47it/s, env_step=10240, len=9, n/ep=6, n/st=64, player_1/loss=51.797, player_2/loss=114.397, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 289.55it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=71.040, player_2/loss=136.638, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 287.02it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=69.925, player_2/loss=134.584, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 301.20it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=69.756, player_2/loss=82.248, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 281.67it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=16.467, player_2/loss=57.430, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 285.20it/s, env_step=15360, len=13, n/ep=4, n/st=64, player_1/loss=28.245, player_2/loss=72.684, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:04, 256.18it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=66.786, player_2/loss=85.391, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:04, 240.00it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_2/loss=108.021, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 277.61it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=31.022, player_2/loss=76.706, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 313.38it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=10.892, player_2/loss=52.700, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 334.20it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=35.048, player_2/loss=18.446, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 298.29it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=36.634, player_2/loss=13.533, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 308.82it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=83.395, player_2/loss=16.689, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 311.47it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=114.274, player_2/loss=72.263, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 305.52it/s, env_step=5120, len=21, n/ep=2, n/st=64, player_1/loss=75.917, player_2/loss=69.300, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 312.22it/s, env_step=6144, len=23, n/ep=3, n/st=64, player_1/loss=52.339, player_2/loss=11.077, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 315.94it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=85.552, player_2/loss=84.710, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:03, 315.65it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=199.002, player_2/loss=176.064, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:03, 296.07it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=336.248, player_2/loss=167.408, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:03, 299.65it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=369.647, player_2/loss=86.894, rew=5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:03, 299.18it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=373.424, player_2/loss=86.639, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:03, 314.02it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=499.293, player_2/loss=80.728, rew=17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:03, 308.71it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=499.307, player_2/loss=58.486, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:03, 311.25it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=414.124, player_2/loss=66.560, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 365.09it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=306.548, player_2/loss=71.246, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:03, 281.57it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=247.399, player_2/loss=70.839, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:03, 304.63it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=210.502, player_2/loss=109.055, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:03, 323.95it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=215.790, player_2/loss=107.500, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:03, 302.60it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=228.024, player_2/loss=62.717, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:03, 287.36it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=171.837, player_2/loss=88.968, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 315.03it/s, env_step=2048, len=10, n/ep=7, n/st=64, player_2/loss=198.351, rew=10.71]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 303.06it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=125.302, player_2/loss=351.888, rew=13.89]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 301.82it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=93.909, player_2/loss=367.735, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 318.41it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=85.303, player_2/loss=306.272, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 329.32it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=48.987, player_2/loss=294.609, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 333.35it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=41.536, player_2/loss=343.861, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 317.76it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=107.360, player_2/loss=317.973, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 328.21it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=106.725, player_2/loss=308.968, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 317.05it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=26.824, player_2/loss=314.943, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 318.51it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=39.159, player_2/loss=335.202, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:04, 250.94it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=42.005, player_2/loss=336.692, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:04, 232.71it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=14.902, player_2/loss=356.207, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:04, 212.07it/s, env_step=14336, len=8, n/ep=9, n/st=64, player_1/loss=8.904, player_2/loss=321.098, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:04, 239.72it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=38.835, player_2/loss=308.385, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 309.75it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=93.592, player_2/loss=343.961, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 290.61it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=74.201, player_2/loss=321.539, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 309.94it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=53.757, player_2/loss=320.906, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 317.56it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=25.840, player_2/loss=287.583, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 309.39it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=40.878, player_2/loss=230.059, rew=-17.86]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 317.66it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=26.760, player_2/loss=197.960, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 322.78it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=71.594, player_2/loss=153.687, rew=-10.71]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 304.60it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=102.468, player_2/loss=143.582, rew=-10.71]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 314.34it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=79.965, player_2/loss=126.950, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 335.18it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=56.466, player_2/loss=94.943, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 324.96it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=64.211, player_2/loss=83.197, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 297.98it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=98.198, player_2/loss=85.526, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 304.23it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=69.456, player_2/loss=72.867, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 307.38it/s, env_step=10240, len=7, n/ep=6, n/st=64, player_1/loss=20.147, player_2/loss=57.000, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 306.48it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=49.288, player_2/loss=56.435, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 304.60it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=96.148, player_2/loss=60.960, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 297.60it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=90.916, player_2/loss=58.624, rew=0.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 303.47it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=83.830, player_2/loss=72.226, rew=-15.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 287.73it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=85.422, player_2/loss=90.393, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 287.85it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=64.320, player_2/loss=87.132, rew=-15.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 278.99it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=85.258, player_2/loss=120.091, rew=0.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 272.24it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=98.767, player_2/loss=95.217, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 268.64it/s, env_step=19456, len=17, n/ep=3, n/st=64, player_1/loss=93.630, player_2/loss=82.551, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 263.91it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=63.040, player_2/loss=45.815, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 262.23it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=40.904, player_2/loss=45.640, rew=16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 291.24it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=29.331, player_2/loss=40.185, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 301.99it/s, env_step=4096, len=12, n/ep=6, n/st=64, player_1/loss=24.544, player_2/loss=34.425, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 289.24it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_2/loss=30.212, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 282.12it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=26.196, player_2/loss=33.208, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 277.24it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=15.191, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 271.27it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=23.264, player_2/loss=74.346, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 277.82it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=23.248, player_2/loss=77.179, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 271.47it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=11.840, player_2/loss=42.892, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 258.31it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=9.223, player_2/loss=40.421, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 267.61it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=9.567, player_2/loss=38.659, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 270.15it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=10.320, player_2/loss=24.592, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 275.05it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=9.537, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 266.91it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=58.800, player_2/loss=47.424, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 288.64it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=55.222, player_2/loss=41.647, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 309.37it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=5.487, player_2/loss=25.280, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 312.18it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=44.766, player_2/loss=30.826, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 320.75it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=53.711, player_2/loss=36.267, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 327.73it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=20.374, player_2/loss=73.317, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 317.23it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=23.968, player_2/loss=43.698, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 282.91it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=37.632, player_2/loss=46.323, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 279.90it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=40.491, player_2/loss=53.407, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 261.34it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=50.752, player_2/loss=62.518, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 284.39it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=60.597, player_2/loss=75.667, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 293.10it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=42.140, player_2/loss=62.531, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 279.55it/s, env_step=8192, len=15, n/ep=5, n/st=64, player_1/loss=40.564, player_2/loss=48.875, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 292.82it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=36.979, player_2/loss=26.941, rew=-15.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 302.53it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=27.639, player_2/loss=17.123, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 291.57it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=37.250, player_2/loss=17.889, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 300.78it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=43.212, player_2/loss=14.551, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 306.38it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=39.029, player_2/loss=10.200, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 301.64it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=33.238, player_2/loss=10.963, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 300.39it/s, env_step=15360, len=16, n/ep=3, n/st=64, player_1/loss=29.934, player_2/loss=8.135, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 272.39it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=31.641, player_2/loss=6.737, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 259.03it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=28.100, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 297.89it/s, env_step=18432, len=21, n/ep=3, n/st=64, player_1/loss=64.965, player_2/loss=15.265, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:04, 234.76it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=65.790, player_2/loss=27.273, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 288.25it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=50.355, player_2/loss=102.650, rew=19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 291.51it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=34.715, player_2/loss=85.779, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 281.89it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=24.906, player_2/loss=116.650, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 294.58it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=75.894, player_2/loss=207.188, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 300.11it/s, env_step=5120, len=7, n/ep=6, n/st=64, player_1/loss=83.390, player_2/loss=222.347, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 284.48it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=44.808, player_2/loss=222.660, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 299.57it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=54.520, player_2/loss=126.541, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 314.37it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=60.874, player_2/loss=172.755, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 335.13it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=53.572, player_2/loss=185.652, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 263.88it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=36.127, player_2/loss=161.140, rew=19.44]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:04, 238.29it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=83.170, player_2/loss=135.506, rew=6.25]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:04, 246.28it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=122.145, player_2/loss=139.493, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:06, 166.37it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=86.034, player_2/loss=131.664, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:10, 93.43it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=92.854, player_2/loss=117.550, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:04, 209.13it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=65.116, player_2/loss=115.609, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:04, 205.97it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=32.548, player_2/loss=156.883, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:04, 230.92it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=31.282, player_2/loss=201.874, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:04, 233.26it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=18.329, player_2/loss=172.604, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 361.97it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=31.418, player_2/loss=137.942, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 322.28it/s, env_step=1024, len=8, n/ep=7, n/st=64, player_1/loss=56.867, player_2/loss=106.826, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 313.22it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=75.528, player_2/loss=72.571, rew=-12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 324.29it/s, env_step=3072, len=16, n/ep=3, n/st=64, player_1/loss=101.674, player_2/loss=145.344, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 324.82it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=116.062, player_2/loss=284.893, rew=-6.25]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 309.98it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=138.196, player_2/loss=298.584, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 325.64it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=162.355, player_2/loss=320.463, rew=-13.89]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 321.91it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=146.807, player_2/loss=229.201, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 315.45it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=142.981, player_2/loss=193.206, rew=-19.44]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 328.98it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=186.232, player_2/loss=169.348, rew=-13.89]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 330.25it/s, env_step=10240, len=12, n/ep=4, n/st=64, player_1/loss=153.322, player_2/loss=148.125, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 328.18it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=153.913, player_2/loss=102.323, rew=15.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 331.72it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=210.222, player_2/loss=81.899, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 328.75it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=199.935, player_2/loss=65.660, rew=16.67]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 332.33it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=195.874, player_2/loss=53.061, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 333.92it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=229.093, player_2/loss=96.071, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 332.94it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=194.411, player_2/loss=104.316, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 348.00it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=170.719, player_2/loss=52.818, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 335.41it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=213.524, player_2/loss=56.359, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 338.55it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=200.291, player_2/loss=56.811, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 339.02it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=173.393, player_2/loss=24.121, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 328.10it/s, env_step=2048, len=12, n/ep=6, n/st=64, player_1/loss=169.466, player_2/loss=105.198, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.40it/s, env_step=3072, len=8, n/ep=9, n/st=64, player_1/loss=166.486, player_2/loss=184.153, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 330.21it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=117.707, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 310.39it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=104.433, player_2/loss=289.194, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 461.66it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=83.358, player_2/loss=305.563, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 472.74it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=106.930, player_2/loss=279.189, rew=13.89]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:01, 524.15it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=79.376, player_2/loss=253.356, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:01, 517.03it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=32.201, player_2/loss=289.895, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 508.33it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=88.251, player_2/loss=321.560, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 501.24it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=93.845, player_2/loss=340.040, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:01, 513.87it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=42.305, player_2/loss=274.135, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 510.04it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=39.719, player_2/loss=242.872, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:01, 525.79it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=54.586, player_2/loss=232.156, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:01, 520.87it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=38.776, player_2/loss=260.338, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:01, 520.40it/s, env_step=16384, len=7, n/ep=7, n/st=64, player_1/loss=36.598, player_2/loss=244.619, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:01, 520.58it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=40.249, player_2/loss=303.763, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:01, 522.48it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=40.838, player_2/loss=321.115, rew=2.78]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:01, 520.82it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=87.917, player_2/loss=324.010, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:01, 515.65it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=19.725, player_2/loss=193.749, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 521.72it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=25.752, player_2/loss=214.068, rew=-13.89]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:01, 519.76it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=51.516, player_2/loss=212.085, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:01, 519.40it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=37.285, player_2/loss=172.567, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:01, 518.33it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=39.011, player_2/loss=150.177, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:01, 519.73it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=25.852, player_2/loss=118.142, rew=-18.75]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:01, 522.15it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=40.723, player_2/loss=108.683, rew=-18.75]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:01, 516.96it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=28.955, player_2/loss=130.330, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:01, 520.51it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=39.121, player_2/loss=141.686, rew=-19.44]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:01, 518.76it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=49.932, player_2/loss=157.530, rew=-19.44]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:01, 522.53it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=31.976, player_2/loss=93.910, rew=10.71]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:01, 522.76it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=91.080, player_2/loss=120.285, rew=-2.78]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:01, 524.47it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=129.163, player_2/loss=190.491, rew=-12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:01, 518.36it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=118.432, player_2/loss=187.979, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:01, 519.63it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=100.549, player_2/loss=138.151, rew=-19.44]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:01, 521.56it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=79.520, player_2/loss=77.009, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:01, 518.99it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=187.244, player_2/loss=153.328, rew=6.25]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:01, 519.64it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=421.238, player_2/loss=203.609, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:01, 521.93it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=440.623, player_2/loss=120.678, rew=10.71]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:01, 518.74it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=311.302, player_2/loss=136.179, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 526.79it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=276.983, player_2/loss=132.660, rew=-17.86]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:01, 526.90it/s, env_step=3072, len=10, n/ep=7, n/st=64, player_1/loss=239.128, player_2/loss=118.395, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:01, 522.05it/s, env_step=4096, len=10, n/ep=7, n/st=64, player_1/loss=176.274, player_2/loss=132.173, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 520.68it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=134.373, player_2/loss=156.781, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:01, 527.93it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=126.041, player_2/loss=188.213, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:01, 520.27it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=68.714, player_2/loss=277.119, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:01, 521.63it/s, env_step=8192, len=15, n/ep=5, n/st=64, player_1/loss=48.402, player_2/loss=337.489, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:01, 525.33it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=57.501, player_2/loss=281.521, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:01, 524.62it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=36.338, player_2/loss=196.458, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:01, 525.92it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=70.625, player_2/loss=189.336, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:01, 519.01it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=68.401, player_2/loss=190.889, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:01, 519.82it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_1/loss=60.829, player_2/loss=254.947, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:01, 523.19it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=65.734, player_2/loss=272.969, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:01, 522.26it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=47.559, player_2/loss=233.862, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:01, 526.32it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=40.368, player_2/loss=246.026, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:01, 524.80it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=12.922, player_2/loss=236.098, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:01, 520.25it/s, env_step=18432, len=10, n/ep=5, n/st=64, player_1/loss=17.473, player_2/loss=242.079, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:01, 525.47it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=20.012, player_2/loss=258.575, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:01, 523.01it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=47.322, player_2/loss=171.154, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 525.77it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=79.424, player_2/loss=168.569, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:01, 523.77it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=120.939, player_2/loss=137.734, rew=-15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:01, 530.87it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=96.998, player_2/loss=124.330, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 524.65it/s, env_step=5120, len=12, n/ep=6, n/st=64, player_1/loss=84.521, player_2/loss=132.189, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:01, 527.34it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=96.247, player_2/loss=141.053, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:01, 518.22it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=86.590, player_2/loss=113.524, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:01, 527.96it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=60.099, player_2/loss=94.689, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:01, 527.41it/s, env_step=9216, len=22, n/ep=2, n/st=64, player_1/loss=51.042, player_2/loss=69.169, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:01, 526.81it/s, env_step=10240, len=21, n/ep=3, n/st=64, player_1/loss=72.672, player_2/loss=34.007, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:01, 524.69it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=161.782, player_2/loss=51.024, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:01, 527.27it/s, env_step=12288, len=19, n/ep=4, n/st=64, player_1/loss=187.106, player_2/loss=85.712, rew=-12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:01, 525.56it/s, env_step=13312, len=19, n/ep=3, n/st=64, player_1/loss=135.079, player_2/loss=71.869, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:01, 522.56it/s, env_step=14336, len=24, n/ep=3, n/st=64, player_1/loss=123.791, player_2/loss=53.691, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:01, 526.65it/s, env_step=15360, len=38, n/ep=1, n/st=64, player_1/loss=107.886, player_2/loss=80.275, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:01, 526.80it/s, env_step=16384, len=24, n/ep=2, n/st=64, player_1/loss=62.660, player_2/loss=81.665, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:01, 522.39it/s, env_step=17408, len=23, n/ep=3, n/st=64, player_1/loss=46.697, player_2/loss=66.121, rew=-8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:01, 525.32it/s, env_step=18432, len=25, n/ep=2, n/st=64, player_1/loss=46.978, player_2/loss=54.228, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:01, 526.34it/s, env_step=19456, len=27, n/ep=3, n/st=64, player_1/loss=75.154, player_2/loss=49.574, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:01, 513.80it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=149.671, player_2/loss=183.726, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 518.12it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=86.398, player_2/loss=162.624, rew=13.89]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:01, 521.72it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=35.861, player_2/loss=134.728, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 506.88it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=65.360, player_2/loss=141.853, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 517.14it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=98.421, player_2/loss=158.150, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:01, 517.72it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=102.079, player_2/loss=158.519, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:01, 521.04it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=116.972, player_2/loss=164.715, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:01, 514.94it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=99.852, player_2/loss=213.187, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:01, 519.41it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=109.304, player_2/loss=206.568, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 476.17it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=125.634, player_2/loss=185.287, rew=8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 419.62it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=62.340, player_2/loss=179.852, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 486.34it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=64.544, player_2/loss=158.595, rew=16.67]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 364.32it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=85.836, player_2/loss=180.390, rew=13.89]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 412.73it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=49.333, player_2/loss=193.391, rew=18.75]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 332.01it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=18.337, player_2/loss=197.079, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 357.75it/s, env_step=16384, len=10, n/ep=7, n/st=64, player_1/loss=53.630, player_2/loss=146.067, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 489.27it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=89.446, player_2/loss=168.229, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 512.21it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=47.964, player_2/loss=185.384, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:01, 517.82it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=33.869, player_2/loss=171.882, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:01, 512.89it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=87.506, player_2/loss=134.673, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 515.00it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=130.234, player_2/loss=148.586, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 435.80it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=159.944, player_2/loss=138.766, rew=5.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 427.93it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=160.173, player_2/loss=177.374, rew=-6.25]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 508.83it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=187.934, player_2/loss=187.325, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:01, 524.34it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=181.335, player_2/loss=163.837, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:01, 520.72it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=197.324, player_2/loss=161.279, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 509.20it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=220.782, player_2/loss=111.924, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 511.93it/s, env_step=9216, len=10, n/ep=7, n/st=64, player_1/loss=228.894, player_2/loss=74.074, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:01, 521.35it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=214.773, player_2/loss=41.121, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:01, 523.73it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=204.824, player_2/loss=42.572, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:01, 523.54it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=236.369, player_2/loss=44.287, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:01, 520.67it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=202.375, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:01, 521.54it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=189.984, player_2/loss=74.056, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 470.86it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=198.248, player_2/loss=49.964, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 432.52it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=149.731, player_2/loss=60.683, rew=16.67]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 406.30it/s, env_step=17408, len=10, n/ep=7, n/st=64, player_1/loss=139.756, player_2/loss=26.588, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 500.27it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=181.220, player_2/loss=5.802, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 502.42it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=210.526, player_2/loss=17.415, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 503.05it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=187.806, player_2/loss=184.694, rew=10.71]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 506.73it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=154.361, player_2/loss=340.572, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 510.14it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=163.484, player_2/loss=440.108, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 486.08it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=179.862, player_2/loss=422.135, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 454.66it/s, env_step=5120, len=9, n/ep=6, n/st=64, player_1/loss=155.771, player_2/loss=473.976, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 430.07it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=143.781, player_2/loss=468.136, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 397.37it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=127.229, player_2/loss=478.322, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 404.74it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=99.121, player_2/loss=483.612, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 482.13it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=58.702, player_2/loss=512.137, rew=16.67]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 482.20it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=53.757, player_2/loss=442.459, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 462.47it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=152.247, player_2/loss=615.935, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 460.19it/s, env_step=12288, len=9, n/ep=6, n/st=64, player_1/loss=171.979, player_2/loss=596.714, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 419.06it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=131.639, player_2/loss=507.500, rew=5.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 410.29it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=48.783, player_2/loss=601.776, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 490.68it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=63.869, player_2/loss=615.247, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 496.29it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=76.324, player_2/loss=529.977, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 483.61it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=60.864, player_2/loss=485.378, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:01, 516.69it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=40.841, player_2/loss=540.638, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 509.16it/s, env_step=19456, len=10, n/ep=7, n/st=64, player_1/loss=53.606, player_2/loss=535.277, rew=3.57]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 454.65it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=125.042, player_2/loss=196.946, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 527.92it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=134.104, player_2/loss=229.011, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:01, 523.78it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=166.204, player_2/loss=207.788, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 455.30it/s, env_step=4096, len=12, n/ep=6, n/st=64, player_1/loss=197.627, player_2/loss=161.784, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:01, 524.69it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=185.083, player_2/loss=145.414, rew=5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:01, 520.23it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=205.629, player_2/loss=142.601, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:01, 525.79it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=275.734, player_2/loss=119.289, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 489.41it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=233.821, player_2/loss=51.549, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 447.71it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=187.466, player_2/loss=32.608, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:01, 525.65it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=218.235, player_2/loss=52.999, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 479.58it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=223.153, player_2/loss=56.584, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 481.17it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=212.930, player_2/loss=54.825, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 487.30it/s, env_step=13312, len=22, n/ep=3, n/st=64, player_1/loss=200.711, player_2/loss=112.164, rew=-8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 446.09it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=162.503, player_2/loss=105.140, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:01, 521.55it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=238.020, player_2/loss=55.072, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:01, 522.22it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=267.905, player_2/loss=67.963, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:01, 524.68it/s, env_step=17408, len=23, n/ep=3, n/st=64, player_1/loss=195.744, player_2/loss=98.977, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 495.86it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=165.732, player_2/loss=108.859, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:01, 522.99it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=200.886, player_2/loss=107.590, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:01, 521.74it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=163.985, player_2/loss=183.327, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 498.47it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=95.522, player_2/loss=253.798, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:01, 526.58it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=71.856, player_2/loss=203.596, rew=-15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:01, 519.83it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=98.244, player_2/loss=183.106, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:01, 514.73it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=138.871, player_2/loss=282.813, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 466.27it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=109.877, player_2/loss=454.633, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 493.83it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=63.581, player_2/loss=447.245, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 481.47it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=79.256, player_2/loss=341.161, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:01, 521.67it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=68.117, player_2/loss=321.410, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:01, 526.81it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=40.739, player_2/loss=337.006, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 503.56it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=37.379, player_2/loss=334.381, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:01, 516.59it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=48.134, player_2/loss=272.292, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:01, 524.34it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_1/loss=85.518, player_2/loss=360.058, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 484.62it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=89.316, player_2/loss=402.237, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:01, 521.29it/s, env_step=15360, len=16, n/ep=3, n/st=64, player_1/loss=59.517, player_2/loss=418.748, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:01, 524.51it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=46.811, player_2/loss=417.760, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:01, 518.76it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=26.731, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:01, 524.14it/s, env_step=18432, len=20, n/ep=4, n/st=64, player_1/loss=57.252, player_2/loss=381.915, rew=0.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:01, 517.47it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=116.611, player_2/loss=321.138, rew=5.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 500.76it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=119.962, player_2/loss=296.944, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 462.41it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=102.983, player_2/loss=207.072, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:01, 524.24it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=92.594, player_2/loss=116.548, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 512.10it/s, env_step=4096, len=25, n/ep=3, n/st=64, player_1/loss=102.664, player_2/loss=117.003, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 491.47it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=94.199, player_2/loss=104.643, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 432.86it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=81.781, player_2/loss=78.548, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 492.37it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=139.804, player_2/loss=47.689, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 485.95it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=163.661, player_2/loss=94.109, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 468.62it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=162.639, player_2/loss=88.513, rew=5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 458.19it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=204.102, player_2/loss=58.680, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:01, 515.39it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=220.485, player_2/loss=57.749, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:01, 523.16it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=184.462, player_2/loss=30.209, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:01, 520.93it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=188.817, player_2/loss=18.889, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:01, 522.33it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=155.426, player_2/loss=11.835, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:01, 522.78it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=118.297, player_2/loss=13.379, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 508.93it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=124.770, player_2/loss=14.483, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 502.21it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=100.037, player_2/loss=15.006, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:01, 523.00it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=154.000, player_2/loss=38.897, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:01, 519.00it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=180.124, player_2/loss=44.484, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:01, 521.87it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=83.475, player_2/loss=5.996, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:01, 522.78it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=78.218, player_2/loss=6.031, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 491.98it/s, env_step=3072, len=18, n/ep=3, n/st=64, player_1/loss=86.576, player_2/loss=34.891, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:01, 524.52it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=111.921, player_2/loss=115.976, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:01, 514.68it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=79.912, player_2/loss=97.864, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 435.64it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=52.955, player_2/loss=16.218, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 495.49it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=77.653, player_2/loss=34.167, rew=-12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 504.49it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=139.180, player_2/loss=96.525, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:01, 519.35it/s, env_step=9216, len=21, n/ep=3, n/st=64, player_1/loss=158.286, player_2/loss=192.349, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:01, 518.25it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=143.090, player_2/loss=236.520, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:01, 524.48it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=189.826, player_2/loss=171.116, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 481.95it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=155.573, player_2/loss=119.012, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 491.75it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=169.979, player_2/loss=113.254, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 485.32it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=305.110, player_2/loss=210.647, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 489.47it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=175.192, player_2/loss=315.745, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 484.91it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=105.136, player_2/loss=290.972, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 458.18it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=76.638, player_2/loss=268.331, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 420.50it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=56.842, player_2/loss=349.450, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 407.20it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=79.723, player_2/loss=309.832, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 482.23it/s, env_step=1024, len=8, n/ep=7, n/st=64, player_1/loss=61.943, player_2/loss=227.307, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 506.90it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=85.059, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 454.66it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=98.824, player_2/loss=213.799, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 477.88it/s, env_step=4096, len=9, n/ep=8, n/st=64, player_1/loss=73.709, player_2/loss=186.686, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 462.98it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=78.879, player_2/loss=207.479, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 502.88it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=179.569, player_2/loss=183.680, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 483.72it/s, env_step=7168, len=9, n/ep=6, n/st=64, player_1/loss=266.020, player_2/loss=126.267, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 483.09it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=263.160, player_2/loss=32.653, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 500.11it/s, env_step=9216, len=10, n/ep=5, n/st=64, player_1/loss=245.278, player_2/loss=17.807, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:01, 519.79it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=247.298, player_2/loss=26.912, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 456.38it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=196.148, player_2/loss=21.861, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 486.08it/s, env_step=12288, len=10, n/ep=7, n/st=64, player_1/loss=170.918, player_2/loss=32.983, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 459.24it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=185.397, player_2/loss=35.380, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 468.52it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=192.867, player_2/loss=52.336, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 426.09it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=191.261, player_2/loss=54.788, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 424.42it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=211.885, player_2/loss=22.226, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 492.67it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=227.433, player_2/loss=8.244, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:01, 520.27it/s, env_step=18432, len=10, n/ep=7, n/st=64, player_1/loss=202.022, player_2/loss=11.118, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:01, 522.58it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=203.973, player_2/loss=67.246, rew=10.71]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:01, 519.20it/s, env_step=1024, len=23, n/ep=3, n/st=64, player_1/loss=188.071, player_2/loss=67.415, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:01, 521.48it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=150.907, player_2/loss=92.100, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:01, 518.02it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=159.607, player_2/loss=199.458, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 493.44it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=193.458, player_2/loss=218.886, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 389.75it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=147.072, player_2/loss=243.154, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 427.41it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=154.956, player_2/loss=190.397, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 463.17it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=106.557, player_2/loss=212.552, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 445.03it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=36.841, player_2/loss=413.102, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 369.04it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=96.002, player_2/loss=524.768, rew=10.71]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 393.45it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=105.542, player_2/loss=572.576, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 496.57it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=34.280, player_2/loss=590.927, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:01, 520.68it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=60.598, player_2/loss=487.743, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:01, 518.26it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=101.264, player_2/loss=567.645, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 498.49it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=67.557, player_2/loss=481.145, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 501.10it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=28.152, player_2/loss=531.944, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 482.85it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=17.206, player_2/loss=554.544, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 482.80it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=44.815, player_2/loss=601.386, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:01, 520.96it/s, env_step=18432, len=9, n/ep=6, n/st=64, player_1/loss=81.972, player_2/loss=512.210, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 505.70it/s, env_step=19456, len=9, n/ep=6, n/st=64, player_1/loss=54.808, player_2/loss=507.776, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:01, 517.69it/s, env_step=1024, len=28, n/ep=2, n/st=64, player_1/loss=100.770, player_2/loss=340.432, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:01, 523.36it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=185.609, player_2/loss=196.390, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 491.03it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=239.047, player_2/loss=44.292, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 464.57it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=222.392, player_2/loss=34.714, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 499.39it/s, env_step=5120, len=25, n/ep=2, n/st=64, player_1/loss=207.993, player_2/loss=53.893, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 494.63it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=237.227, player_2/loss=46.031, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 506.94it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=224.116, player_2/loss=34.035, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 477.73it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=188.665, player_2/loss=37.046, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 500.74it/s, env_step=9216, len=21, n/ep=3, n/st=64, player_1/loss=185.741, player_2/loss=86.608, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 500.99it/s, env_step=10240, len=14, n/ep=5, n/st=64, player_1/loss=182.932, player_2/loss=79.182, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 383.40it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=221.983, player_2/loss=47.657, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 379.15it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=187.694, player_2/loss=80.703, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 384.42it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=212.337, player_2/loss=82.504, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 395.59it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=240.098, player_2/loss=33.220, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 379.86it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=182.756, player_2/loss=8.252, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 429.37it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=219.535, player_2/loss=52.645, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 427.75it/s, env_step=17408, len=19, n/ep=3, n/st=64, player_1/loss=246.503, player_2/loss=126.911, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 329.63it/s, env_step=18432, len=16, n/ep=5, n/st=64, player_1/loss=197.777, player_2/loss=106.125, rew=5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 320.21it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=148.913, player_2/loss=80.470, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 447.38it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=216.552, player_2/loss=98.548, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 469.54it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=175.187, player_2/loss=70.880, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 474.75it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=127.969, player_2/loss=61.377, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 480.00it/s, env_step=4096, len=15, n/ep=5, n/st=64, player_1/loss=118.305, player_2/loss=179.271, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 477.00it/s, env_step=5120, len=10, n/ep=7, n/st=64, player_1/loss=142.791, player_2/loss=314.914, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 456.33it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=152.748, player_2/loss=307.606, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 470.85it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=156.815, player_2/loss=200.543, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 476.74it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=125.390, player_2/loss=188.858, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 407.77it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=49.883, player_2/loss=169.114, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 463.55it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=95.330, player_2/loss=190.138, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 457.61it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=102.351, player_2/loss=235.401, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 467.83it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=15.202, player_2/loss=283.770, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 466.84it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=10.569, player_2/loss=253.048, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 484.90it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=12.520, player_2/loss=268.186, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 382.33it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=12.350, player_2/loss=265.824, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:03, 325.90it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=12.726, player_2/loss=247.510, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 354.71it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=10.225, player_2/loss=252.902, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 459.02it/s, env_step=18432, len=10, n/ep=5, n/st=64, player_1/loss=13.137, player_2/loss=256.374, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 488.81it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=29.174, player_2/loss=268.477, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 487.46it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=45.584, player_2/loss=264.713, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 492.99it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=36.524, player_2/loss=223.406, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 487.40it/s, env_step=3072, len=24, n/ep=2, n/st=64, player_1/loss=81.908, player_2/loss=138.046, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 480.37it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=196.463, player_2/loss=100.736, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 413.63it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=178.552, player_2/loss=124.079, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 465.68it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=118.692, player_2/loss=141.231, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 352.78it/s, env_step=7168, len=21, n/ep=2, n/st=64, player_1/loss=180.256, player_2/loss=145.547, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 398.53it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=195.758, player_2/loss=124.742, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 406.61it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=173.169, player_2/loss=165.749, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 431.40it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=87.525, player_2/loss=128.721, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 477.21it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=207.233, player_2/loss=125.276, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 454.40it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=507.207, player_2/loss=156.850, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 478.46it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=567.196, player_2/loss=132.864, rew=10.71]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 487.45it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=629.287, player_2/loss=95.748, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 459.16it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=664.941, player_2/loss=108.207, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 414.21it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=584.580, player_2/loss=65.111, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 442.97it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=556.820, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 407.10it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=629.191, player_2/loss=53.866, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 490.59it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=604.102, player_2/loss=14.620, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 479.80it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=461.746, player_2/loss=135.598, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 492.71it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=251.534, player_2/loss=390.377, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 489.17it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=35.541, player_2/loss=624.163, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 490.38it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=41.860, player_2/loss=517.988, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 490.35it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=25.086, player_2/loss=484.907, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 490.10it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=16.295, player_2/loss=540.991, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 473.85it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=50.560, player_2/loss=555.390, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 490.81it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=45.239, player_2/loss=548.019, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 489.51it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=46.911, player_2/loss=497.174, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 488.99it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=30.937, player_2/loss=512.900, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 492.64it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=47.744, player_2/loss=550.104, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 490.38it/s, env_step=12288, len=10, n/ep=5, n/st=64, player_1/loss=55.374, player_2/loss=517.106, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 483.16it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=22.049, player_2/loss=435.272, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 493.69it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=7.424, player_2/loss=418.728, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 491.30it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=40.786, player_2/loss=456.235, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 486.68it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=44.359, player_2/loss=524.847, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 489.63it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=6.238, player_2/loss=471.397, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 490.21it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=5.265, player_2/loss=470.646, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 480.95it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=5.648, player_2/loss=498.772, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 486.70it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=3.054, player_2/loss=483.521, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 491.37it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=39.944, player_2/loss=266.852, rew=-5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 491.38it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=155.832, player_2/loss=120.735, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 492.34it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=297.705, player_2/loss=101.037, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 495.71it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=250.144, player_2/loss=72.024, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 482.70it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=228.237, player_2/loss=23.551, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 491.41it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=490.878, player_2/loss=30.320, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 475.19it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=509.328, player_2/loss=30.160, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 415.10it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=289.037, player_2/loss=73.305, rew=0.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 432.45it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=221.661, player_2/loss=115.067, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 495.86it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=73.828, player_2/loss=150.365, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 484.33it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=79.525, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 488.11it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=141.412, player_2/loss=154.398, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 490.65it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=162.166, player_2/loss=109.183, rew=-15.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 489.76it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=147.533, player_2/loss=93.465, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 488.14it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=143.813, player_2/loss=124.637, rew=-12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 492.37it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=90.025, player_2/loss=155.742, rew=-12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 489.42it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=108.076, player_2/loss=164.842, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 487.79it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=160.972, player_2/loss=134.999, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 490.38it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=249.324, player_2/loss=153.043, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 488.57it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=158.948, player_2/loss=151.084, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 490.57it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=64.872, player_2/loss=183.930, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 490.22it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=80.169, player_2/loss=206.502, rew=-5.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 478.30it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=72.814, player_2/loss=207.026, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 486.73it/s, env_step=6144, len=14, n/ep=5, n/st=64, player_1/loss=33.014, player_2/loss=178.400, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 489.49it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=15.785, player_2/loss=151.222, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 487.86it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=10.466, player_2/loss=190.512, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 487.21it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=8.058, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 489.34it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=17.272, player_2/loss=215.730, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 489.95it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=13.501, player_2/loss=259.055, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 474.99it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=3.657, player_2/loss=286.835, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 485.56it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=8.312, player_2/loss=279.788, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 488.12it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_1/loss=11.359, player_2/loss=265.898, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 487.62it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=33.926, player_2/loss=253.677, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 485.88it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=34.221, player_2/loss=254.074, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 489.03it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=17.240, player_2/loss=280.150, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 480.93it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=79.945, player_2/loss=262.662, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 494.58it/s, env_step=19456, len=9, n/ep=6, n/st=64, player_1/loss=94.694, player_2/loss=279.388, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 487.76it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=83.989, player_2/loss=250.686, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 487.79it/s, env_step=2048, len=7, n/ep=10, n/st=64, player_1/loss=89.716, player_2/loss=216.198, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 489.52it/s, env_step=3072, len=9, n/ep=8, n/st=64, player_1/loss=47.803, player_2/loss=169.198, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 482.96it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=12.570, player_2/loss=133.686, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 487.89it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=62.209, player_2/loss=100.389, rew=-17.86]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 487.16it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=118.675, player_2/loss=131.165, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 483.34it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=246.440, player_2/loss=140.660, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 491.74it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=372.566, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 488.07it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=440.866, player_2/loss=85.721, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 490.87it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=588.595, player_2/loss=77.194, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 483.80it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=580.210, player_2/loss=111.086, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 488.99it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=401.419, player_2/loss=110.278, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 490.42it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=343.292, player_2/loss=154.433, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 491.37it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=498.904, player_2/loss=122.770, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 487.42it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=558.984, player_2/loss=67.235, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 494.00it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=574.074, player_2/loss=60.093, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 465.67it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=504.283, player_2/loss=33.710, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 481.23it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=433.941, player_2/loss=57.402, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 491.08it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=540.760, player_2/loss=74.249, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 487.02it/s, env_step=1024, len=8, n/ep=7, n/st=64, player_1/loss=277.059, player_2/loss=27.844, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 483.60it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=273.599, player_2/loss=76.837, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 492.59it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=188.100, player_2/loss=105.619, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 484.19it/s, env_step=4096, len=17, n/ep=3, n/st=64, player_1/loss=183.930, player_2/loss=152.241, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 494.11it/s, env_step=5120, len=16, n/ep=3, n/st=64, player_1/loss=175.710, player_2/loss=239.917, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 491.75it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=119.009, player_2/loss=235.118, rew=-16.67]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 490.83it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=63.424, player_2/loss=298.833, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 492.68it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=21.674, player_2/loss=264.528, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 494.50it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=52.906, player_2/loss=226.837, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 483.94it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=49.747, player_2/loss=243.464, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 495.91it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=11.864, player_2/loss=267.244, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 490.63it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=13.773, player_2/loss=241.579, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 490.64it/s, env_step=13312, len=9, n/ep=6, n/st=64, player_1/loss=21.271, player_2/loss=220.309, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 490.24it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=83.667, player_2/loss=347.979, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 474.70it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=75.402, player_2/loss=428.725, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 476.32it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=59.483, player_2/loss=409.095, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 485.56it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=74.956, player_2/loss=400.384, rew=13.89]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 489.76it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=35.469, player_2/loss=399.635, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 486.13it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=24.599, player_2/loss=376.190, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 480.63it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=44.575, player_2/loss=337.090, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 491.25it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=43.413, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 486.82it/s, env_step=3072, len=9, n/ep=6, n/st=64, player_1/loss=55.978, player_2/loss=211.367, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 495.16it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=60.730, player_2/loss=203.269, rew=-12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 496.38it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=87.921, player_2/loss=208.290, rew=12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 495.43it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=182.057, player_2/loss=135.882, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 491.40it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=165.447, player_2/loss=67.384, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 484.43it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=167.063, player_2/loss=70.985, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 496.40it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=287.212, player_2/loss=87.801, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 491.74it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=336.626, player_2/loss=49.920, rew=5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 494.90it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=243.366, player_2/loss=8.599, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 493.01it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=203.301, player_2/loss=46.076, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 492.10it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=215.981, player_2/loss=76.003, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 479.47it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=209.975, player_2/loss=44.910, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 493.52it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=228.626, player_2/loss=33.772, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 492.66it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=317.660, player_2/loss=47.947, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 492.12it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=275.419, player_2/loss=42.090, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 486.69it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=230.841, player_2/loss=30.493, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 499.29it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=346.460, player_2/loss=22.450, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 480.14it/s, env_step=1024, len=18, n/ep=5, n/st=64, player_1/loss=151.349, player_2/loss=169.302, rew=40.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 489.99it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=122.423, player_2/loss=196.283, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 489.83it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=71.947, player_2/loss=253.745, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 490.63it/s, env_step=4096, len=9, n/ep=8, n/st=64, player_1/loss=50.511, player_2/loss=233.838, rew=-18.75]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 486.05it/s, env_step=5120, len=13, n/ep=4, n/st=64, player_1/loss=191.791, player_2/loss=188.672, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 485.59it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=232.945, player_2/loss=188.373, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 478.78it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=112.044, player_2/loss=206.372, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 489.95it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=82.858, player_2/loss=197.050, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 488.57it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=142.090, player_2/loss=223.754, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 487.65it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=172.779, player_2/loss=265.081, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 486.76it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=157.865, player_2/loss=421.921, rew=19.44]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 486.79it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=189.221, player_2/loss=552.787, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 481.24it/s, env_step=13312, len=10, n/ep=5, n/st=64, player_1/loss=173.611, player_2/loss=523.350, rew=5.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 469.65it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=128.403, player_2/loss=512.937, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 471.27it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=79.951, player_2/loss=555.943, rew=12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 484.89it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=68.057, player_2/loss=550.441, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 484.63it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=24.110, player_2/loss=514.533, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 489.54it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=56.457, rew=13.89]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 475.40it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=73.895, player_2/loss=475.085, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 487.43it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=80.171, player_2/loss=327.141, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 491.87it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=139.769, player_2/loss=299.447, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 489.73it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=147.291, player_2/loss=217.821, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 494.95it/s, env_step=4096, len=26, n/ep=3, n/st=64, player_1/loss=81.684, player_2/loss=152.933, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 480.79it/s, env_step=5120, len=22, n/ep=3, n/st=64, player_1/loss=157.182, player_2/loss=128.982, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 491.87it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=236.106, player_2/loss=91.698, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 489.47it/s, env_step=7168, len=16, n/ep=3, n/st=64, player_1/loss=298.060, player_2/loss=78.044, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 493.63it/s, env_step=8192, len=26, n/ep=2, n/st=64, player_1/loss=248.826, player_2/loss=99.750, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 492.60it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=173.350, player_2/loss=39.458, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 497.07it/s, env_step=10240, len=24, n/ep=2, n/st=64, player_1/loss=221.761, player_2/loss=30.254, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 482.05it/s, env_step=11264, len=24, n/ep=3, n/st=64, player_1/loss=286.498, player_2/loss=44.798, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 490.16it/s, env_step=12288, len=30, n/ep=3, n/st=64, player_1/loss=268.358, player_2/loss=38.030, rew=50.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 495.90it/s, env_step=13312, len=25, n/ep=3, n/st=64, player_1/loss=293.843, player_2/loss=28.913, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 494.65it/s, env_step=14336, len=21, n/ep=3, n/st=64, player_1/loss=250.785, player_2/loss=51.372, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 482.87it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=192.363, player_2/loss=73.612, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 488.60it/s, env_step=16384, len=25, n/ep=2, n/st=64, player_1/loss=144.912, player_2/loss=100.250, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 478.30it/s, env_step=17408, len=21, n/ep=4, n/st=64, player_1/loss=218.678, player_2/loss=102.653, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 493.32it/s, env_step=18432, len=24, n/ep=3, n/st=64, player_1/loss=280.515, player_2/loss=53.167, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 492.97it/s, env_step=19456, len=17, n/ep=3, n/st=64, player_1/loss=307.179, player_2/loss=34.580, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 489.73it/s, env_step=1024, len=26, n/ep=2, n/st=64, player_1/loss=289.965, player_2/loss=34.391, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.14it/s, env_step=2048, len=25, n/ep=3, n/st=64, player_1/loss=200.717, player_2/loss=25.535, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 489.65it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=115.781, player_2/loss=17.564, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 477.19it/s, env_step=4096, len=23, n/ep=2, n/st=64, player_1/loss=85.042, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 491.15it/s, env_step=5120, len=31, n/ep=3, n/st=64, player_1/loss=107.827, player_2/loss=66.468, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 491.31it/s, env_step=6144, len=24, n/ep=3, n/st=64, player_1/loss=108.248, player_2/loss=108.720, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 491.40it/s, env_step=7168, len=31, n/ep=2, n/st=64, player_1/loss=113.221, player_2/loss=130.735, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 491.67it/s, env_step=8192, len=27, n/ep=3, n/st=64, player_1/loss=84.089, player_2/loss=112.111, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 490.83it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=82.430, player_2/loss=139.583, rew=-12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 484.16it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=105.850, player_2/loss=150.670, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 492.43it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=123.321, player_2/loss=131.184, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 490.69it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=109.854, player_2/loss=158.307, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 492.08it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=90.853, player_2/loss=121.296, rew=0.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 489.50it/s, env_step=14336, len=15, n/ep=5, n/st=64, player_1/loss=120.908, player_2/loss=150.310, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 483.06it/s, env_step=15360, len=23, n/ep=3, n/st=64, player_1/loss=153.255, player_2/loss=190.693, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 480.79it/s, env_step=16384, len=17, n/ep=3, n/st=64, player_1/loss=100.872, player_2/loss=168.435, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 490.23it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=44.573, player_2/loss=121.059, rew=-25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 498.70it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=59.775, player_2/loss=117.414, rew=-12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 494.83it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=62.564, player_2/loss=143.734, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 483.65it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=202.120, player_2/loss=185.085, rew=10.71]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 488.40it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=249.641, player_2/loss=167.417, rew=17.86]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 480.80it/s, env_step=3072, len=8, n/ep=7, n/st=64, player_1/loss=229.447, player_2/loss=142.388, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 487.48it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=199.100, player_2/loss=91.547, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 491.11it/s, env_step=5120, len=9, n/ep=8, n/st=64, player_1/loss=233.250, player_2/loss=43.239, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 486.32it/s, env_step=6144, len=9, n/ep=6, n/st=64, player_2/loss=37.866, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 489.89it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=327.679, player_2/loss=34.033, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 466.45it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=358.505, player_2/loss=66.592, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 463.61it/s, env_step=9216, len=9, n/ep=8, n/st=64, player_1/loss=328.053, player_2/loss=72.636, rew=6.25]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 454.95it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_2/loss=51.830, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 461.52it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=273.771, player_2/loss=98.845, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 461.51it/s, env_step=12288, len=9, n/ep=8, n/st=64, player_1/loss=229.792, player_2/loss=97.411, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 473.23it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=256.846, player_2/loss=57.131, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 463.87it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=314.203, player_2/loss=49.237, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 445.95it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=265.626, player_2/loss=59.771, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 471.65it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=284.019, player_2/loss=58.748, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 466.14it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=305.759, player_2/loss=64.918, rew=18.75]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 485.84it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=353.950, player_2/loss=54.139, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 419.31it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=370.300, player_2/loss=30.852, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 399.13it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=165.359, player_2/loss=96.593, rew=10.71]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 368.18it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=110.280, player_2/loss=182.252, rew=17.86]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 372.08it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=68.804, player_2/loss=268.070, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 484.79it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=93.036, player_2/loss=273.965, rew=17.86]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 470.34it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=67.810, player_2/loss=245.470, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 432.66it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=52.467, player_2/loss=246.771, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 433.06it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=44.170, player_2/loss=294.035, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 400.60it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=22.380, player_2/loss=287.297, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 478.68it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=18.825, player_2/loss=295.340, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 485.25it/s, env_step=10240, len=8, n/ep=5, n/st=64, player_1/loss=17.370, player_2/loss=265.405, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 493.66it/s, env_step=11264, len=9, n/ep=6, n/st=64, player_1/loss=71.137, player_2/loss=272.302, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 489.18it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=76.306, player_2/loss=284.482, rew=17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 456.05it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=20.705, player_2/loss=315.088, rew=10.71]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 490.29it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_1/loss=6.273, player_2/loss=313.239, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 492.00it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=4.839, player_2/loss=299.406, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 491.63it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=5.292, player_2/loss=283.590, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 492.17it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=63.191, player_2/loss=274.010, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 491.67it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=71.406, player_2/loss=281.580, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 483.24it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=56.374, player_2/loss=308.806, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 484.61it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=53.889, player_2/loss=202.606, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 491.22it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=43.804, player_2/loss=175.754, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 491.50it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=31.460, player_2/loss=119.062, rew=-16.67]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 481.43it/s, env_step=4096, len=9, n/ep=8, n/st=64, player_1/loss=29.038, player_2/loss=83.194, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 489.75it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=138.534, player_2/loss=134.253, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 491.64it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=274.412, player_2/loss=175.538, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 483.25it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=353.609, player_2/loss=94.428, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 491.00it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=428.842, player_2/loss=48.870, rew=18.75]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 487.56it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=373.677, player_2/loss=83.140, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 488.11it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=341.018, player_2/loss=72.708, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 491.44it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=305.656, player_2/loss=95.226, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 483.52it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=350.780, player_2/loss=140.342, rew=12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 486.21it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=347.236, player_2/loss=92.532, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 493.19it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=254.547, player_2/loss=62.086, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 495.02it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=245.004, player_2/loss=61.048, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 489.96it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=329.652, player_2/loss=35.989, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 487.71it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=404.066, player_2/loss=20.689, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 481.70it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=417.335, player_2/loss=42.933, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 489.82it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=361.125, player_2/loss=67.022, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 486.00it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=325.944, player_2/loss=58.154, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 489.18it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=308.660, player_2/loss=66.452, rew=8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 487.60it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=195.796, player_2/loss=227.985, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 489.62it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=89.230, player_2/loss=568.221, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 479.22it/s, env_step=5120, len=16, n/ep=3, n/st=64, player_1/loss=64.785, player_2/loss=645.521, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 489.04it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=34.431, player_2/loss=767.664, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 487.18it/s, env_step=7168, len=9, n/ep=8, n/st=64, player_1/loss=13.440, player_2/loss=844.136, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 485.81it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=29.181, player_2/loss=666.887, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 491.08it/s, env_step=9216, len=8, n/ep=7, n/st=64, player_1/loss=73.580, player_2/loss=518.800, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 488.29it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=66.599, player_2/loss=498.392, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 489.71it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=86.891, player_2/loss=446.941, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 480.76it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=51.018, player_2/loss=562.216, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 487.42it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=37.240, player_2/loss=627.520, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 487.67it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=59.174, player_2/loss=668.552, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 488.56it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=61.917, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 488.06it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=46.565, player_2/loss=737.876, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 489.99it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=18.784, player_2/loss=681.891, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 484.16it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=31.468, player_2/loss=726.539, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 491.37it/s, env_step=19456, len=9, n/ep=5, n/st=64, player_1/loss=23.056, player_2/loss=838.368, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 486.47it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=5.608, player_2/loss=512.842, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 494.64it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=7.000, player_2/loss=409.510, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 491.54it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=8.615, player_2/loss=318.963, rew=-16.67]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 493.89it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=14.459, player_2/loss=278.983, rew=-16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 482.94it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=16.604, player_2/loss=250.603, rew=-17.86]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 490.95it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=20.987, player_2/loss=245.530, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 489.12it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=18.536, player_2/loss=232.312, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 494.30it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=5.880, player_2/loss=247.935, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 493.99it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=106.340, player_2/loss=167.896, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 492.67it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=205.543, player_2/loss=76.474, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 490.05it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=286.632, player_2/loss=57.290, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 494.49it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=180.925, player_2/loss=110.325, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 494.09it/s, env_step=13312, len=19, n/ep=3, n/st=64, player_1/loss=97.018, player_2/loss=146.267, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 495.44it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=80.024, player_2/loss=92.385, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 490.00it/s, env_step=15360, len=26, n/ep=2, n/st=64, player_1/loss=106.834, player_2/loss=78.135, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 492.39it/s, env_step=16384, len=26, n/ep=3, n/st=64, player_1/loss=205.786, player_2/loss=59.784, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 487.40it/s, env_step=17408, len=24, n/ep=3, n/st=64, player_1/loss=237.395, player_2/loss=34.385, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 492.75it/s, env_step=18432, len=23, n/ep=3, n/st=64, player_1/loss=188.596, player_2/loss=34.374, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 496.13it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=157.169, player_2/loss=34.095, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 491.82it/s, env_step=1024, len=24, n/ep=2, n/st=64, player_1/loss=109.009, player_2/loss=279.917, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 495.20it/s, env_step=2048, len=25, n/ep=2, n/st=64, player_1/loss=93.957, player_2/loss=175.179, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 493.48it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=67.177, player_2/loss=126.703, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 490.68it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_1/loss=59.082, player_2/loss=282.258, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 484.70it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=83.837, player_2/loss=267.184, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 485.70it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=192.975, player_2/loss=230.313, rew=-17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 491.10it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=248.633, player_2/loss=328.212, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 489.09it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=196.327, player_2/loss=361.217, rew=-17.86]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 490.07it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=186.741, player_2/loss=276.055, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 489.27it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=186.670, player_2/loss=219.174, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 480.29it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=101.433, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 490.87it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=50.129, player_2/loss=294.142, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 490.21it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=64.782, player_2/loss=299.379, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 490.05it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=48.166, player_2/loss=271.482, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 490.46it/s, env_step=15360, len=8, n/ep=9, n/st=64, player_1/loss=15.327, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 491.59it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=19.112, player_2/loss=304.562, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 481.91it/s, env_step=17408, len=7, n/ep=7, n/st=64, player_1/loss=58.891, player_2/loss=330.218, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 489.12it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=56.634, player_2/loss=319.043, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 493.01it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=22.742, player_2/loss=309.659, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 488.77it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=96.816, player_2/loss=196.414, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 495.04it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=113.841, player_2/loss=188.858, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 494.02it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=128.139, player_2/loss=197.474, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 488.03it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=144.349, player_2/loss=211.975, rew=-13.89]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 490.21it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=113.869, player_2/loss=189.306, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 492.83it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=215.011, player_2/loss=124.405, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 493.07it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=336.705, player_2/loss=105.362, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 490.93it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=423.972, player_2/loss=99.509, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 496.39it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=213.677, player_2/loss=106.722, rew=-19.44]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 493.05it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=33.065, player_2/loss=142.277, rew=-12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 486.21it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=56.058, player_2/loss=118.136, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 492.96it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=80.350, rew=-19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 495.57it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=154.389, player_2/loss=121.275, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 495.39it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=244.320, player_2/loss=147.962, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 490.33it/s, env_step=15360, len=10, n/ep=7, n/st=64, player_1/loss=236.058, player_2/loss=113.401, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 484.63it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=317.723, player_2/loss=47.926, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 495.15it/s, env_step=17408, len=13, n/ep=4, n/st=64, player_1/loss=280.849, player_2/loss=28.316, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 495.62it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=228.612, player_2/loss=32.318, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 493.45it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=202.302, player_2/loss=38.097, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 488.31it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=202.193, player_2/loss=90.733, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 494.46it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=192.128, player_2/loss=43.311, rew=-5.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 485.34it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=150.346, player_2/loss=44.457, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 483.93it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=118.404, player_2/loss=89.100, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 491.83it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=147.854, player_2/loss=95.762, rew=-5.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 492.29it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=148.839, player_2/loss=113.953, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 490.92it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=174.431, player_2/loss=108.560, rew=-5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 489.95it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_1/loss=157.490, player_2/loss=223.840, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 481.66it/s, env_step=9216, len=10, n/ep=7, n/st=64, player_1/loss=103.933, player_2/loss=356.867, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 483.07it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=102.563, player_2/loss=433.347, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 483.04it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=50.955, player_2/loss=337.576, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 491.35it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=79.187, player_2/loss=317.648, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 487.95it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=83.037, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 487.09it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=36.786, player_2/loss=326.612, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 488.95it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=22.848, player_2/loss=298.090, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 483.02it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=5.475, player_2/loss=272.702, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 485.81it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=4.506, player_2/loss=257.989, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 487.46it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=6.467, player_2/loss=304.952, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 487.60it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=7.621, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 487.61it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=11.253, player_2/loss=215.995, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 481.93it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=48.153, player_2/loss=209.900, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 488.18it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=82.670, player_2/loss=178.552, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 492.31it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=64.329, player_2/loss=139.402, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 495.55it/s, env_step=5120, len=32, n/ep=2, n/st=64, player_1/loss=125.199, player_2/loss=100.823, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 491.77it/s, env_step=6144, len=24, n/ep=3, n/st=64, player_1/loss=139.352, player_2/loss=110.204, rew=-8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 493.75it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=120.723, player_2/loss=107.834, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 494.81it/s, env_step=8192, len=26, n/ep=2, n/st=64, player_1/loss=61.054, player_2/loss=83.144, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 489.06it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=131.208, player_2/loss=69.895, rew=-25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 496.24it/s, env_step=10240, len=20, n/ep=4, n/st=64, player_1/loss=323.916, player_2/loss=56.554, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 491.64it/s, env_step=11264, len=23, n/ep=2, n/st=64, player_1/loss=334.492, player_2/loss=59.486, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 493.02it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=148.637, player_2/loss=76.631, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 494.61it/s, env_step=13312, len=23, n/ep=3, n/st=64, player_1/loss=129.923, player_2/loss=70.928, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 483.93it/s, env_step=14336, len=30, n/ep=2, n/st=64, player_1/loss=97.010, player_2/loss=72.954, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 496.45it/s, env_step=15360, len=32, n/ep=2, n/st=64, player_1/loss=160.194, player_2/loss=148.855, rew=37.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 493.40it/s, env_step=16384, len=27, n/ep=3, n/st=64, player_1/loss=212.061, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 494.21it/s, env_step=17408, len=23, n/ep=3, n/st=64, player_1/loss=514.111, player_2/loss=193.292, rew=8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 492.00it/s, env_step=18432, len=19, n/ep=2, n/st=64, player_1/loss=493.263, player_2/loss=93.963, rew=0.00]


Epoch #18: test_reward: 100.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #18


Epoch #19: 1025it [00:02, 491.97it/s, env_step=19456, len=28, n/ep=2, n/st=64, player_1/loss=409.162, player_2/loss=133.860, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #18


Epoch #1: 1025it [00:02, 488.55it/s, env_step=1024, len=18, n/ep=2, n/st=64, player_1/loss=283.755, player_2/loss=107.027, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 481.74it/s, env_step=2048, len=15, n/ep=5, n/st=64, player_1/loss=193.181, player_2/loss=102.390, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 494.12it/s, env_step=3072, len=16, n/ep=5, n/st=64, player_1/loss=85.806, player_2/loss=107.394, rew=15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 491.33it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=67.730, player_2/loss=128.029, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 495.62it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=69.380, player_2/loss=100.564, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 491.34it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=40.696, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 486.26it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=39.054, player_2/loss=85.668, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 484.14it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=38.839, player_2/loss=111.331, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 492.33it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=57.183, player_2/loss=105.196, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 489.97it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=39.793, player_2/loss=104.095, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 492.05it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=23.173, player_2/loss=88.392, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 470.21it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=19.396, player_2/loss=96.916, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 490.41it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=15.941, player_2/loss=77.469, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 482.14it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=22.515, player_2/loss=84.376, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 493.73it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=26.444, player_2/loss=85.343, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 489.50it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=47.389, player_2/loss=94.555, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 487.04it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=80.149, player_2/loss=111.126, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 494.20it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=53.043, player_2/loss=110.083, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 489.37it/s, env_step=19456, len=24, n/ep=3, n/st=64, player_1/loss=41.508, player_2/loss=114.835, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 475.63it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=20.150, player_2/loss=56.265, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.17it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=63.064, player_2/loss=100.834, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 491.58it/s, env_step=3072, len=24, n/ep=3, n/st=64, player_1/loss=123.696, player_2/loss=119.135, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 491.17it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=137.547, player_2/loss=112.603, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 491.15it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=179.791, player_2/loss=142.748, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 493.14it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=196.507, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 485.33it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=189.554, player_2/loss=190.087, rew=-18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 492.11it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=175.700, player_2/loss=173.312, rew=-19.44]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 492.34it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=160.215, player_2/loss=160.034, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 492.67it/s, env_step=10240, len=13, n/ep=4, n/st=64, player_1/loss=34.131, player_2/loss=143.475, rew=-12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 491.88it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=113.370, player_2/loss=111.529, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 491.16it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=175.372, player_2/loss=161.331, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 481.26it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=225.468, player_2/loss=169.412, rew=-17.86]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 493.78it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=261.973, player_2/loss=143.439, rew=-17.86]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 493.70it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=200.304, player_2/loss=134.317, rew=-15.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 492.65it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=195.910, player_2/loss=137.609, rew=5.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 492.06it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=255.764, player_2/loss=128.528, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 489.42it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=376.481, player_2/loss=93.487, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 486.56it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=318.404, player_2/loss=98.641, rew=-15.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 488.35it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=403.946, player_2/loss=77.973, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 491.07it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=239.829, player_2/loss=118.858, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 490.87it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=65.958, player_2/loss=153.044, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 496.77it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=22.635, player_2/loss=149.039, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 491.36it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=59.159, player_2/loss=149.098, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 484.17it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=118.631, player_2/loss=139.194, rew=5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 490.63it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=90.657, player_2/loss=136.408, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 494.97it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=120.596, player_2/loss=191.394, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 489.57it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=78.345, player_2/loss=198.697, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 490.59it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=35.216, player_2/loss=200.539, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 486.14it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=33.854, player_2/loss=211.397, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 480.13it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=18.102, player_2/loss=242.157, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 488.12it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=28.921, player_2/loss=216.105, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 492.81it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=36.539, player_2/loss=181.813, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 490.54it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=19.856, player_2/loss=230.075, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 491.64it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=10.714, player_2/loss=250.242, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 483.29it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=16.117, player_2/loss=209.751, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 483.73it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=16.802, player_2/loss=183.828, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 489.27it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=28.811, player_2/loss=156.032, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 489.99it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=9.938, player_2/loss=157.329, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 489.35it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=66.036, player_2/loss=192.084, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 490.92it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=213.788, player_2/loss=172.510, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 487.90it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=351.585, player_2/loss=102.068, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 489.04it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=344.212, player_2/loss=76.244, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 491.86it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=379.991, player_2/loss=82.328, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 493.95it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=381.115, player_2/loss=67.186, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 489.64it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=349.143, player_2/loss=48.991, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 490.17it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=317.192, player_2/loss=55.441, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 489.74it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=330.616, player_2/loss=35.719, rew=17.86]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 480.85it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=404.990, player_2/loss=4.365, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 490.64it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=371.217, player_2/loss=25.215, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 491.39it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=322.290, player_2/loss=54.464, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 490.09it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=395.571, player_2/loss=42.534, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 490.96it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=384.311, player_2/loss=39.919, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 483.26it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=331.590, player_2/loss=47.279, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 479.41it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=307.977, player_2/loss=47.987, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 487.68it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=366.701, player_2/loss=79.039, rew=10.71]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 488.84it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=383.366, player_2/loss=99.716, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 485.09it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=265.280, player_2/loss=60.282, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 488.71it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=237.346, player_2/loss=50.802, rew=-17.86]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 487.24it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=213.635, player_2/loss=71.323, rew=-18.75]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 488.82it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=257.704, player_2/loss=257.637, rew=10.71]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 490.23it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=258.466, player_2/loss=270.482, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 491.91it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_2/loss=123.923, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 490.72it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=227.386, player_2/loss=72.582, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 489.42it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=284.048, player_2/loss=312.213, rew=18.75]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 483.00it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=365.774, player_2/loss=584.052, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 490.22it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=313.750, player_2/loss=351.915, rew=-18.75]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 489.67it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=240.578, player_2/loss=83.219, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 491.35it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=145.162, player_2/loss=41.709, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 490.69it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=108.863, player_2/loss=47.113, rew=-17.86]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 492.56it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=128.931, player_2/loss=194.945, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 477.39it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=204.605, player_2/loss=419.476, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 486.50it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=137.597, player_2/loss=559.361, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 488.31it/s, env_step=17408, len=7, n/ep=10, n/st=64, player_1/loss=131.246, player_2/loss=588.771, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 492.43it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=158.454, player_2/loss=684.972, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 491.60it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=133.910, player_2/loss=696.937, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 487.42it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=125.659, player_2/loss=447.637, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 481.32it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=114.776, player_2/loss=383.529, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 489.15it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=101.305, player_2/loss=300.759, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 491.57it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=97.529, player_2/loss=221.303, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 490.82it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=118.603, player_2/loss=222.309, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 492.28it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=134.934, player_2/loss=248.595, rew=-3.57]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 490.55it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=204.723, player_2/loss=187.696, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 486.46it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=312.560, player_2/loss=72.208, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 490.23it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=234.321, player_2/loss=32.333, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 492.59it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=194.752, player_2/loss=34.133, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 490.69it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=211.045, player_2/loss=28.784, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 491.69it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=200.555, player_2/loss=25.155, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 491.53it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=218.701, player_2/loss=11.916, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 484.00it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=276.794, player_2/loss=8.861, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 492.73it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=276.919, player_2/loss=27.100, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 492.95it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=208.579, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 493.23it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=188.492, player_2/loss=63.745, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 494.43it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=220.619, player_2/loss=78.315, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 484.61it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=249.722, player_2/loss=61.566, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 489.99it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=182.310, player_2/loss=284.990, rew=-5.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 488.43it/s, env_step=2048, len=15, n/ep=5, n/st=64, player_1/loss=157.317, player_2/loss=397.838, rew=5.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 490.87it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=133.635, player_2/loss=363.594, rew=5.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 491.30it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=122.269, player_2/loss=268.639, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 492.06it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=121.002, player_2/loss=326.760, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 481.00it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=119.833, player_2/loss=428.900, rew=15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 487.02it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=115.969, player_2/loss=385.309, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 491.29it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=86.494, player_2/loss=361.687, rew=5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 492.35it/s, env_step=9216, len=13, n/ep=4, n/st=64, player_2/loss=392.122, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 494.05it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_2/loss=448.565, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 488.69it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=87.415, player_2/loss=510.773, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 491.41it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=81.194, player_2/loss=305.695, rew=12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 479.36it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=102.940, player_2/loss=301.296, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 494.44it/s, env_step=14336, len=13, n/ep=4, n/st=64, player_1/loss=79.358, player_2/loss=371.584, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 490.66it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=70.870, player_2/loss=253.340, rew=5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 488.87it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=59.457, player_2/loss=288.238, rew=5.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 491.34it/s, env_step=17408, len=13, n/ep=4, n/st=64, player_1/loss=68.554, player_2/loss=486.866, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 490.70it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=47.615, player_2/loss=560.291, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 479.98it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=18.507, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 487.62it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=101.790, player_2/loss=511.255, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 494.46it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=50.095, player_2/loss=378.059, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.81it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=34.474, player_2/loss=275.660, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 493.51it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=28.361, player_2/loss=205.686, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 484.78it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=45.119, player_2/loss=172.125, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 491.09it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=126.666, player_2/loss=156.041, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 488.44it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=194.321, player_2/loss=174.932, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 492.76it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=228.194, player_2/loss=91.406, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 493.43it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=233.323, player_2/loss=88.978, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 493.53it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=245.378, player_2/loss=127.483, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 478.49it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=249.241, player_2/loss=141.109, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 484.39it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=256.066, player_2/loss=93.479, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 493.75it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=277.588, player_2/loss=77.433, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 493.29it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=272.758, player_2/loss=76.852, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 491.88it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=294.381, player_2/loss=51.217, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 485.58it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=251.368, player_2/loss=108.283, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 485.42it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=208.335, player_2/loss=112.764, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 492.84it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=225.300, player_2/loss=113.978, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 488.01it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=261.788, player_2/loss=38.374, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 485.35it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=179.331, player_2/loss=460.731, rew=19.44]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 467.97it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=145.023, player_2/loss=501.295, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 481.68it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=83.981, player_2/loss=550.256, rew=13.89]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 490.69it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=91.186, player_2/loss=557.808, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 489.19it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=123.751, player_2/loss=593.740, rew=13.89]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 484.26it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=42.385, player_2/loss=605.302, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 486.02it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=101.486, player_2/loss=549.702, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 484.66it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=62.403, player_2/loss=555.911, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 484.21it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=94.331, player_2/loss=647.967, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 491.77it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=121.237, player_2/loss=646.389, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 486.67it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=70.841, player_2/loss=541.790, rew=13.89]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 490.68it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=30.140, player_2/loss=545.831, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 492.48it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=49.268, player_2/loss=562.369, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 479.44it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=71.836, player_2/loss=547.080, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 488.72it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=20.045, player_2/loss=552.983, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 487.38it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=45.963, player_2/loss=664.742, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 487.53it/s, env_step=17408, len=7, n/ep=10, n/st=64, player_1/loss=83.785, player_2/loss=510.326, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 490.45it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=80.144, player_2/loss=460.506, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 490.89it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=71.135, player_2/loss=385.942, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 474.94it/s, env_step=1024, len=9, n/ep=8, n/st=64, player_1/loss=31.624, player_2/loss=399.964, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.18it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=49.426, player_2/loss=335.333, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.52it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=88.827, player_2/loss=234.153, rew=-10.71]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 492.06it/s, env_step=4096, len=25, n/ep=2, n/st=64, player_1/loss=81.446, player_2/loss=158.067, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 491.15it/s, env_step=5120, len=24, n/ep=3, n/st=64, player_1/loss=85.911, player_2/loss=138.546, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 489.27it/s, env_step=6144, len=26, n/ep=2, n/st=64, player_1/loss=102.880, player_2/loss=177.355, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 483.03it/s, env_step=7168, len=23, n/ep=3, n/st=64, player_1/loss=133.444, player_2/loss=155.626, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 490.69it/s, env_step=8192, len=23, n/ep=2, n/st=64, player_1/loss=131.954, player_2/loss=110.225, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 492.89it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=152.055, player_2/loss=88.770, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 488.06it/s, env_step=10240, len=25, n/ep=3, n/st=64, player_1/loss=183.267, player_2/loss=63.522, rew=-8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 492.00it/s, env_step=11264, len=27, n/ep=2, n/st=64, player_1/loss=149.593, player_2/loss=90.854, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 489.01it/s, env_step=12288, len=24, n/ep=3, n/st=64, player_1/loss=121.811, player_2/loss=88.196, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 493.33it/s, env_step=13312, len=24, n/ep=3, n/st=64, player_1/loss=131.962, player_2/loss=99.436, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 481.18it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=141.981, player_2/loss=80.104, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 492.86it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=250.091, player_2/loss=48.298, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 487.24it/s, env_step=16384, len=10, n/ep=7, n/st=64, player_1/loss=334.083, player_2/loss=37.591, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 490.23it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=387.693, player_2/loss=26.472, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 490.90it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=352.995, player_2/loss=61.369, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 493.52it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=241.687, player_2/loss=75.364, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 481.19it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=113.227, player_2/loss=84.493, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.67it/s, env_step=2048, len=21, n/ep=4, n/st=64, player_1/loss=68.152, player_2/loss=274.540, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 492.66it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=43.575, player_2/loss=495.458, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 493.88it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=48.950, player_2/loss=636.896, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 492.76it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=23.686, player_2/loss=540.224, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 495.70it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=71.055, player_2/loss=301.958, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 480.98it/s, env_step=7168, len=16, n/ep=5, n/st=64, player_1/loss=84.303, player_2/loss=427.129, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 486.37it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=26.239, player_2/loss=533.693, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 492.51it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=31.013, player_2/loss=542.040, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 492.99it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=48.568, player_2/loss=385.194, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 489.63it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=38.798, player_2/loss=287.737, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 493.27it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=54.930, player_2/loss=239.960, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 484.20it/s, env_step=13312, len=19, n/ep=3, n/st=64, player_1/loss=63.793, player_2/loss=192.794, rew=-8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 494.29it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_1/loss=57.213, player_2/loss=169.416, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 492.35it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=45.864, player_2/loss=228.629, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 489.71it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=31.388, player_2/loss=299.342, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 492.34it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=43.180, player_2/loss=300.243, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 493.77it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=34.650, player_2/loss=233.670, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 455.34it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=23.844, player_2/loss=275.392, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 485.23it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=41.555, player_2/loss=268.114, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 481.10it/s, env_step=2048, len=22, n/ep=2, n/st=64, player_1/loss=74.586, player_2/loss=220.730, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 493.75it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=111.824, player_2/loss=192.306, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 491.74it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=105.407, player_2/loss=100.756, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 493.53it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=88.799, player_2/loss=81.407, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 483.50it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=92.958, player_2/loss=111.246, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 483.91it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=79.772, player_2/loss=162.894, rew=12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 491.84it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=117.619, player_2/loss=166.753, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 492.12it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=164.512, player_2/loss=134.796, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 493.18it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=96.961, player_2/loss=83.746, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 493.00it/s, env_step=11264, len=12, n/ep=4, n/st=64, player_1/loss=117.817, player_2/loss=82.358, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 493.26it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=262.337, player_2/loss=108.971, rew=-5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 490.02it/s, env_step=13312, len=11, n/ep=4, n/st=64, player_1/loss=279.615, player_2/loss=82.160, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 494.02it/s, env_step=14336, len=10, n/ep=5, n/st=64, player_1/loss=324.302, player_2/loss=50.545, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 496.77it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=297.059, player_2/loss=55.732, rew=8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 494.47it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=185.512, player_2/loss=66.441, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 494.22it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=191.769, player_2/loss=60.925, rew=15.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 493.51it/s, env_step=18432, len=12, n/ep=4, n/st=64, player_1/loss=226.004, player_2/loss=62.910, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 479.45it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=228.214, player_2/loss=78.703, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 491.15it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=231.599, player_2/loss=109.013, rew=-12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 491.64it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=158.458, player_2/loss=224.552, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 494.85it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=104.795, player_2/loss=293.513, rew=5.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 492.06it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=65.842, player_2/loss=230.103, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 489.48it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=31.796, player_2/loss=268.358, rew=17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 480.06it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=75.184, player_2/loss=266.733, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 486.26it/s, env_step=7168, len=13, n/ep=4, n/st=64, player_1/loss=106.063, player_2/loss=187.602, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 490.88it/s, env_step=8192, len=13, n/ep=6, n/st=64, player_1/loss=104.792, player_2/loss=140.945, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 488.01it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=93.002, player_2/loss=268.056, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 489.70it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=138.566, player_2/loss=350.583, rew=-12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 488.34it/s, env_step=11264, len=9, n/ep=8, n/st=64, player_1/loss=87.844, player_2/loss=337.095, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 483.46it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=58.819, player_2/loss=295.771, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 492.00it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=69.108, player_2/loss=296.245, rew=10.71]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 493.83it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=43.850, player_2/loss=313.054, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 488.95it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=24.913, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 488.75it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=45.651, player_2/loss=342.523, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 489.71it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=74.257, player_2/loss=268.089, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 480.97it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=75.713, player_2/loss=252.333, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 489.29it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=16.322, player_2/loss=318.975, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 491.04it/s, env_step=1024, len=9, n/ep=6, n/st=64, player_1/loss=103.032, player_2/loss=205.549, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.46it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=152.515, player_2/loss=212.386, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 492.62it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=179.668, player_2/loss=149.121, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 484.43it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=179.920, player_2/loss=78.314, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 488.58it/s, env_step=5120, len=12, n/ep=4, n/st=64, player_1/loss=224.046, player_2/loss=59.812, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 491.74it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=198.269, player_2/loss=49.910, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 491.36it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=206.313, player_2/loss=12.507, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 493.82it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=241.349, player_2/loss=30.520, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 493.07it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=251.411, player_2/loss=29.834, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 495.08it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=206.287, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 475.86it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=215.633, player_2/loss=9.414, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 491.15it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=231.919, player_2/loss=48.511, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 492.06it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=206.691, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 491.15it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=188.613, player_2/loss=5.185, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 493.25it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=209.193, player_2/loss=6.322, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 488.61it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=238.857, player_2/loss=7.705, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 483.88it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=210.782, player_2/loss=24.898, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 494.19it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=208.444, player_2/loss=37.804, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 496.89it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=300.607, player_2/loss=57.522, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 490.67it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=159.335, player_2/loss=4.895, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 474.52it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=127.192, player_2/loss=55.147, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 488.72it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=148.355, player_2/loss=173.863, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 473.26it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=103.685, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 483.42it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=66.039, player_2/loss=500.638, rew=10.71]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 489.51it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=87.091, player_2/loss=508.607, rew=2.78]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 475.54it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=99.134, player_2/loss=466.034, rew=17.86]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 488.07it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=59.403, player_2/loss=508.419, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 488.23it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=94.258, player_2/loss=597.865, rew=6.25]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 472.00it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=144.516, player_2/loss=607.069, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 484.83it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=148.063, player_2/loss=519.125, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 492.32it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=89.860, player_2/loss=378.893, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 487.42it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=45.361, player_2/loss=432.262, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 484.64it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=76.856, player_2/loss=462.738, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 486.74it/s, env_step=15360, len=7, n/ep=7, n/st=64, player_1/loss=46.772, player_2/loss=471.936, rew=-3.57]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 477.30it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=54.539, player_2/loss=422.956, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 489.05it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=94.047, player_2/loss=464.961, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 488.13it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=77.164, player_2/loss=453.010, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 485.78it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=53.275, player_2/loss=464.530, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 485.10it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=264.849, player_2/loss=95.212, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 490.52it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=237.306, player_2/loss=123.051, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 477.79it/s, env_step=3072, len=12, n/ep=4, n/st=64, player_1/loss=226.366, player_2/loss=109.507, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 486.48it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=210.276, player_2/loss=36.940, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 493.04it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=236.110, player_2/loss=58.456, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 493.61it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=216.745, player_2/loss=58.510, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 491.46it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=182.661, player_2/loss=14.143, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 488.97it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=227.472, player_2/loss=7.479, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 487.19it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=308.806, player_2/loss=7.907, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 457.54it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=267.585, player_2/loss=11.740, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 485.52it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=217.241, player_2/loss=11.293, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 489.05it/s, env_step=12288, len=18, n/ep=4, n/st=64, player_1/loss=244.014, player_2/loss=67.602, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 493.01it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=213.706, player_2/loss=79.571, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 488.41it/s, env_step=14336, len=11, n/ep=7, n/st=64, player_1/loss=264.140, player_2/loss=22.421, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 462.11it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=268.918, player_2/loss=24.269, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 489.64it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=232.903, player_2/loss=26.226, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 493.25it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=306.960, player_2/loss=91.091, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 491.29it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=299.110, player_2/loss=226.718, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 488.67it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=332.918, player_2/loss=154.838, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 486.45it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=212.010, player_2/loss=66.386, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 475.79it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=182.356, player_2/loss=62.607, rew=-10.71]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 485.06it/s, env_step=3072, len=8, n/ep=7, n/st=64, player_1/loss=115.980, player_2/loss=45.854, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 489.19it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=253.722, player_2/loss=307.073, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 487.40it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=339.420, player_2/loss=489.955, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 485.89it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=171.600, player_2/loss=698.129, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 483.95it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=116.440, player_2/loss=648.380, rew=13.89]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 478.73it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=95.568, player_2/loss=705.447, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 484.21it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=76.849, player_2/loss=625.172, rew=13.89]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 490.05it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=88.186, player_2/loss=589.156, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 478.08it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=74.387, player_2/loss=560.809, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 485.84it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=83.142, player_2/loss=510.482, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 487.72it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=108.087, player_2/loss=423.168, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 479.35it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=114.296, player_2/loss=513.259, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 490.53it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=99.128, player_2/loss=624.524, rew=8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 485.02it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=53.923, player_2/loss=562.098, rew=13.89]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 488.39it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=31.549, player_2/loss=551.621, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 486.94it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=64.430, player_2/loss=584.913, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 483.10it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=45.096, player_2/loss=678.212, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 471.70it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=27.028, player_2/loss=409.729, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.36it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=16.450, player_2/loss=393.143, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 492.78it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=33.477, player_2/loss=350.792, rew=-19.44]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 492.77it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=31.639, player_2/loss=258.622, rew=-13.89]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 495.20it/s, env_step=5120, len=9, n/ep=8, n/st=64, player_1/loss=59.864, player_2/loss=221.338, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 494.71it/s, env_step=6144, len=26, n/ep=3, n/st=64, player_1/loss=67.744, player_2/loss=190.451, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 478.51it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=79.540, player_2/loss=132.653, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 490.88it/s, env_step=8192, len=18, n/ep=3, n/st=64, player_1/loss=109.618, player_2/loss=107.023, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 493.78it/s, env_step=9216, len=16, n/ep=3, n/st=64, player_1/loss=99.860, player_2/loss=51.878, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 494.09it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=67.913, player_2/loss=45.608, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 491.58it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=109.115, player_2/loss=49.270, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 485.25it/s, env_step=12288, len=18, n/ep=4, n/st=64, player_1/loss=104.788, player_2/loss=35.300, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 488.90it/s, env_step=13312, len=19, n/ep=4, n/st=64, player_1/loss=91.513, player_2/loss=76.247, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 493.75it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_2/loss=103.195, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 484.32it/s, env_step=15360, len=18, n/ep=4, n/st=64, player_1/loss=101.385, player_2/loss=62.954, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 492.00it/s, env_step=16384, len=16, n/ep=3, n/st=64, player_1/loss=117.040, player_2/loss=61.033, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 491.57it/s, env_step=17408, len=23, n/ep=3, n/st=64, player_1/loss=107.978, player_2/loss=68.609, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 492.76it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=102.595, player_2/loss=86.934, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 479.00it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=105.690, player_2/loss=83.444, rew=12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 487.81it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=156.967, player_2/loss=151.108, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.66it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=123.641, player_2/loss=131.798, rew=-8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 491.40it/s, env_step=3072, len=23, n/ep=3, n/st=64, player_1/loss=108.576, player_2/loss=130.045, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 492.65it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=115.677, player_2/loss=161.261, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 488.95it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=94.647, player_2/loss=215.407, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 479.60it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=77.455, player_2/loss=230.842, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 489.50it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=74.619, player_2/loss=250.715, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 484.61it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=59.507, player_2/loss=246.859, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 489.85it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=50.255, player_2/loss=274.312, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 494.43it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=30.646, player_2/loss=260.794, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 492.24it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=20.664, player_2/loss=239.949, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 481.80it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=25.733, player_2/loss=226.083, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 492.32it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=33.288, player_2/loss=211.069, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 489.60it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=27.952, player_2/loss=191.158, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 490.23it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=16.526, player_2/loss=231.370, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 493.70it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=26.997, player_2/loss=270.763, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 490.92it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=90.072, player_2/loss=250.429, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 478.36it/s, env_step=18432, len=11, n/ep=4, n/st=64, player_1/loss=90.672, player_2/loss=219.754, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 488.02it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=24.648, player_2/loss=224.361, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 485.76it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=7.850, player_2/loss=249.313, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.28it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=12.248, player_2/loss=221.931, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 492.34it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=11.093, player_2/loss=174.851, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 490.44it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=41.830, player_2/loss=163.759, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 484.47it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=103.500, player_2/loss=139.554, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 493.19it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=97.997, player_2/loss=139.246, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 492.10it/s, env_step=7168, len=27, n/ep=2, n/st=64, player_1/loss=84.397, player_2/loss=106.978, rew=0.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 491.62it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=157.599, player_2/loss=165.123, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 491.87it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=135.466, player_2/loss=105.935, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 489.95it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=61.508, player_2/loss=31.713, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 489.94it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=41.620, player_2/loss=78.289, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 476.93it/s, env_step=12288, len=26, n/ep=2, n/st=64, player_1/loss=47.001, player_2/loss=73.681, rew=0.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 488.76it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=79.860, player_2/loss=56.501, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 485.12it/s, env_step=14336, len=25, n/ep=1, n/st=64, player_1/loss=111.524, player_2/loss=94.583, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 488.85it/s, env_step=15360, len=28, n/ep=2, n/st=64, player_2/loss=92.072, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 485.32it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=166.877, player_2/loss=99.260, rew=-8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 479.40it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=216.718, player_2/loss=97.215, rew=-16.67]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 487.17it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=204.251, player_2/loss=149.857, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 489.68it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=239.236, player_2/loss=165.800, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 485.58it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=44.229, player_2/loss=149.471, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 489.88it/s, env_step=2048, len=27, n/ep=2, n/st=64, player_1/loss=77.035, player_2/loss=140.589, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 492.86it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=51.732, player_2/loss=164.137, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 492.76it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=50.671, player_2/loss=151.059, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 483.01it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=64.237, player_2/loss=129.288, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 492.82it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=37.848, player_2/loss=122.543, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 489.61it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=59.593, player_2/loss=140.938, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 490.23it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=92.126, player_2/loss=134.780, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 491.76it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=71.589, player_2/loss=180.821, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 488.49it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=43.347, player_2/loss=215.834, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 488.55it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=27.259, player_2/loss=168.549, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 490.16it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=86.653, player_2/loss=151.291, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 491.06it/s, env_step=13312, len=19, n/ep=4, n/st=64, player_1/loss=124.744, player_2/loss=120.909, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 491.12it/s, env_step=14336, len=8, n/ep=5, n/st=64, player_1/loss=79.971, player_2/loss=86.282, rew=5.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 488.54it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=81.338, player_2/loss=136.311, rew=-12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 491.59it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=79.942, player_2/loss=136.716, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 489.88it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=74.668, player_2/loss=121.201, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 487.24it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=59.859, player_2/loss=113.600, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 493.19it/s, env_step=19456, len=12, n/ep=3, n/st=64, player_1/loss=46.667, player_2/loss=109.361, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 488.01it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=156.029, player_2/loss=187.848, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 492.29it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=194.376, player_2/loss=185.129, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 489.17it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=215.528, player_2/loss=151.910, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 487.88it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=227.810, player_2/loss=113.950, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 481.69it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=238.924, player_2/loss=109.931, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 482.55it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=223.843, player_2/loss=85.499, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 489.60it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=233.763, player_2/loss=90.785, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 492.72it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=236.817, player_2/loss=37.044, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 492.63it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=233.122, player_2/loss=38.128, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 478.46it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=218.582, player_2/loss=56.609, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 491.52it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=238.348, player_2/loss=80.111, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 489.16it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=216.795, player_2/loss=67.546, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 490.50it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=207.987, player_2/loss=74.159, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 487.62it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=204.616, player_2/loss=16.024, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 493.95it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=246.264, player_2/loss=46.120, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 481.98it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=280.896, player_2/loss=59.195, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 483.62it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=213.632, player_2/loss=43.300, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 492.21it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=231.347, player_2/loss=40.025, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 488.06it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=240.436, player_2/loss=41.211, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 489.13it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=156.324, player_2/loss=193.986, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.34it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=140.126, player_2/loss=314.235, rew=-5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 480.56it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=122.906, player_2/loss=376.828, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 490.82it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=117.031, player_2/loss=346.311, rew=16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 491.10it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=100.694, player_2/loss=353.447, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 489.76it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=82.049, player_2/loss=338.292, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 487.23it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=93.096, player_2/loss=310.522, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 491.02it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=115.612, player_2/loss=284.308, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 489.98it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=82.229, player_2/loss=334.877, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 482.38it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=56.866, rew=16.67]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 489.01it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=63.739, player_2/loss=384.697, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 489.02it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=69.094, player_2/loss=490.671, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 489.96it/s, env_step=13312, len=10, n/ep=7, n/st=64, player_1/loss=64.469, player_2/loss=476.707, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 492.27it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=15.019, player_2/loss=447.003, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 490.64it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=50.020, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 479.38it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=87.714, player_2/loss=396.145, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 488.87it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=48.467, player_2/loss=333.602, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 465.94it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=63.942, player_2/loss=357.788, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 486.08it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=52.275, player_2/loss=514.574, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 490.54it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=28.934, player_2/loss=334.855, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 487.16it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=46.683, player_2/loss=263.446, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 471.51it/s, env_step=3072, len=18, n/ep=3, n/st=64, player_1/loss=168.127, player_2/loss=139.453, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 490.88it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=223.623, player_2/loss=75.303, rew=12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 487.16it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=289.051, player_2/loss=58.114, rew=10.71]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 483.03it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=317.900, player_2/loss=51.914, rew=16.67]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 488.34it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=259.820, player_2/loss=97.309, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 483.83it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=269.684, player_2/loss=121.527, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 480.32it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=355.058, player_2/loss=111.380, rew=-16.67]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 494.00it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=352.586, player_2/loss=105.446, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 491.58it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=305.214, player_2/loss=69.577, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 489.47it/s, env_step=12288, len=19, n/ep=4, n/st=64, player_1/loss=238.449, player_2/loss=84.604, rew=-12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 493.83it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=170.252, player_2/loss=114.445, rew=-12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 497.00it/s, env_step=14336, len=26, n/ep=3, n/st=64, player_1/loss=210.828, player_2/loss=103.899, rew=8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 488.05it/s, env_step=15360, len=13, n/ep=4, n/st=64, player_1/loss=169.097, player_2/loss=92.088, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 483.57it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=176.429, player_2/loss=92.612, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 483.26it/s, env_step=17408, len=26, n/ep=2, n/st=64, player_1/loss=215.647, player_2/loss=91.302, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 482.45it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=214.918, player_2/loss=83.106, rew=8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 492.32it/s, env_step=19456, len=24, n/ep=3, n/st=64, player_1/loss=194.603, player_2/loss=75.634, rew=-8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 489.02it/s, env_step=1024, len=24, n/ep=3, n/st=64, player_1/loss=104.830, player_2/loss=113.094, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 486.01it/s, env_step=2048, len=18, n/ep=4, n/st=64, player_1/loss=87.550, player_2/loss=95.711, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 485.58it/s, env_step=3072, len=25, n/ep=3, n/st=64, player_1/loss=96.844, player_2/loss=56.107, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 493.48it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=98.460, player_2/loss=47.264, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 493.50it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=110.451, player_2/loss=47.818, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 494.41it/s, env_step=6144, len=25, n/ep=2, n/st=64, player_1/loss=114.400, player_2/loss=80.266, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 493.26it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=86.739, player_2/loss=148.436, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 489.58it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=43.350, player_2/loss=171.965, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 484.42it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=22.421, player_2/loss=206.283, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 492.26it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=21.884, player_2/loss=172.239, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 486.67it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=74.706, player_2/loss=143.511, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 493.07it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=34.104, player_2/loss=161.788, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 492.30it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=27.838, player_2/loss=195.485, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 492.08it/s, env_step=14336, len=11, n/ep=4, n/st=64, player_1/loss=23.525, player_2/loss=209.750, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 476.39it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=7.898, player_2/loss=184.795, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 490.83it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=28.611, player_2/loss=152.812, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 493.13it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=42.893, player_2/loss=180.661, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 492.82it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=25.224, player_2/loss=182.171, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 489.76it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=10.468, player_2/loss=185.153, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 490.91it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=54.203, player_2/loss=207.327, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 383.72it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=42.526, player_2/loss=165.754, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 490.23it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=19.246, player_2/loss=127.106, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 491.58it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=10.759, player_2/loss=118.366, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 493.51it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=13.826, player_2/loss=114.445, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 473.47it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=12.740, player_2/loss=115.177, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 485.75it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=15.701, player_2/loss=99.061, rew=-15.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 480.54it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=75.801, player_2/loss=104.761, rew=-16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 489.98it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=202.426, player_2/loss=101.392, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:02, 492.68it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=230.398, player_2/loss=113.225, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 488.60it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=236.466, player_2/loss=120.888, rew=-3.57]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 492.57it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=297.138, player_2/loss=92.904, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 496.02it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=268.736, player_2/loss=94.411, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 477.89it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=217.199, player_2/loss=101.804, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 490.93it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=253.245, player_2/loss=118.970, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:02, 490.46it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=244.001, player_2/loss=89.005, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 493.24it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=243.923, player_2/loss=102.229, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:02, 489.94it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=234.496, player_2/loss=98.789, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:02, 488.82it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=217.852, player_2/loss=122.500, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 488.54it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=245.376, player_2/loss=63.743, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 484.31it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=215.003, player_2/loss=99.655, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 494.49it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=212.485, player_2/loss=175.995, rew=19.44]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 489.99it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=232.801, player_2/loss=386.892, rew=-18.75]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 492.11it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=198.734, player_2/loss=359.118, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 488.45it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=134.054, player_2/loss=52.049, rew=-18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 489.55it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=135.129, player_2/loss=229.137, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 476.13it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=102.998, player_2/loss=391.226, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 494.42it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=47.716, player_2/loss=475.701, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 491.00it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=103.749, player_2/loss=466.912, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 488.41it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=88.670, player_2/loss=429.102, rew=6.25]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 489.44it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=38.144, player_2/loss=413.118, rew=19.44]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 491.65it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=59.313, player_2/loss=395.763, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 478.78it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=79.002, player_2/loss=410.195, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 492.43it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=92.797, player_2/loss=404.094, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 487.37it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=80.981, player_2/loss=469.432, rew=6.25]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 486.04it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=53.235, player_2/loss=491.581, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 492.53it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=54.132, player_2/loss=404.500, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 491.88it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=64.003, player_2/loss=386.401, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 477.03it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=135.484, player_2/loss=302.733, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 491.62it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=235.968, player_2/loss=189.891, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 494.32it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=302.973, player_2/loss=93.407, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 494.18it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=306.366, player_2/loss=72.082, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 494.14it/s, env_step=5120, len=10, n/ep=5, n/st=64, player_1/loss=343.466, player_2/loss=61.840, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 495.73it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=296.233, player_2/loss=106.542, rew=16.67]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 488.49it/s, env_step=7168, len=10, n/ep=7, n/st=64, player_1/loss=218.826, player_2/loss=110.562, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 485.31it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=207.283, player_2/loss=83.993, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 491.64it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=289.130, player_2/loss=37.961, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 488.34it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=343.872, player_2/loss=47.105, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 491.36it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=394.705, player_2/loss=42.956, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 492.83it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=378.806, player_2/loss=30.749, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 496.39it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=295.878, player_2/loss=21.186, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 484.31it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=264.917, player_2/loss=13.249, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 458.98it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=339.731, player_2/loss=19.702, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 456.17it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=347.397, player_2/loss=18.679, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 491.61it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=287.625, player_2/loss=16.705, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 490.44it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=265.097, player_2/loss=22.265, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 493.45it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=268.472, player_2/loss=21.688, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 480.42it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=169.257, player_2/loss=67.850, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 484.67it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=189.724, player_2/loss=108.830, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 492.71it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=213.649, player_2/loss=130.501, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 491.40it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=194.977, player_2/loss=173.451, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 488.09it/s, env_step=5120, len=12, n/ep=6, n/st=64, player_1/loss=99.021, player_2/loss=238.746, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 488.31it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=54.741, player_2/loss=281.827, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 494.43it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=39.425, player_2/loss=299.670, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 479.43it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=41.396, player_2/loss=275.093, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 490.42it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=37.127, player_2/loss=286.769, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 492.58it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=34.143, player_2/loss=286.580, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 491.84it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=25.896, player_2/loss=275.650, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 491.78it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=20.936, player_2/loss=250.712, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 491.87it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=18.314, player_2/loss=249.685, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 487.14it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=16.620, player_2/loss=259.246, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 484.41it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=63.914, player_2/loss=278.027, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 405.61it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=69.693, player_2/loss=273.396, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 458.85it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=28.992, player_2/loss=240.910, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 442.76it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=18.399, player_2/loss=286.036, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 423.29it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=11.395, player_2/loss=314.800, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 424.58it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=57.247, player_2/loss=294.598, rew=-10.71]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 453.70it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=59.570, player_2/loss=226.819, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 476.35it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=34.291, player_2/loss=156.068, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 475.67it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=26.676, player_2/loss=97.894, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 453.17it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=69.634, player_2/loss=81.290, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 481.44it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=183.905, player_2/loss=114.555, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 432.58it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=387.108, player_2/loss=116.741, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 436.83it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=573.245, player_2/loss=84.728, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 419.54it/s, env_step=9216, len=8, n/ep=7, n/st=64, player_1/loss=686.652, player_2/loss=50.319, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 406.19it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=719.922, player_2/loss=27.090, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 409.11it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=891.029, player_2/loss=15.146, rew=10.71]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 437.90it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=826.756, player_2/loss=37.140, rew=10.71]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 450.90it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=688.262, player_2/loss=33.501, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 454.44it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=625.194, player_2/loss=102.605, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 448.01it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=632.222, player_2/loss=143.319, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 439.79it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=617.786, player_2/loss=128.364, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 455.26it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=565.548, player_2/loss=129.157, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 469.75it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=691.323, player_2/loss=113.925, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 453.24it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=738.090, player_2/loss=63.605, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 455.68it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=376.187, player_2/loss=36.102, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 480.06it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=302.443, player_2/loss=26.911, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 417.03it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=252.383, player_2/loss=98.107, rew=19.44]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 426.22it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=266.276, player_2/loss=351.440, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 434.32it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=243.364, player_2/loss=385.765, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 472.27it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=215.436, player_2/loss=520.819, rew=10.71]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 469.08it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=144.021, player_2/loss=641.659, rew=6.25]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 484.80it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=203.603, player_2/loss=626.389, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 483.90it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=210.530, player_2/loss=544.837, rew=6.25]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 467.98it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=93.174, player_2/loss=577.347, rew=6.25]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 473.90it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=58.468, player_2/loss=679.360, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 418.02it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=66.423, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 320.38it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=96.755, player_2/loss=711.756, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 433.51it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=150.391, player_2/loss=739.256, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 422.05it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=155.848, player_2/loss=751.290, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 358.58it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=176.598, rew=13.89]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 432.63it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=107.016, player_2/loss=761.311, rew=6.25]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 456.42it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=71.151, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 455.34it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=96.325, player_2/loss=705.250, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 430.29it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=31.726, player_2/loss=451.039, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 485.32it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=55.626, player_2/loss=408.199, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 450.19it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=58.775, player_2/loss=339.870, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 489.56it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=127.359, player_2/loss=246.359, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 491.61it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=167.377, player_2/loss=174.488, rew=-16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 472.50it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=122.962, player_2/loss=124.309, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 390.30it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=153.694, player_2/loss=104.676, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 451.36it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=220.537, player_2/loss=121.426, rew=-15.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 383.01it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=332.266, player_2/loss=155.029, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 407.11it/s, env_step=10240, len=9, n/ep=6, n/st=64, player_1/loss=448.979, player_2/loss=196.062, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 399.79it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=569.079, player_2/loss=200.284, rew=10.71]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 433.66it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=587.952, player_2/loss=122.358, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 457.29it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=753.789, player_2/loss=78.378, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 443.39it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_1/loss=760.267, player_2/loss=86.428, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 435.00it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=669.284, player_2/loss=66.771, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 437.67it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=616.074, player_2/loss=48.055, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 465.90it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=651.199, player_2/loss=41.849, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 423.23it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=616.340, player_2/loss=44.217, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 436.82it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=668.058, player_2/loss=19.761, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 448.58it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=243.228, player_2/loss=180.568, rew=5.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 455.95it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=109.949, player_2/loss=239.349, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 463.91it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=66.294, player_2/loss=296.688, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 454.85it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=67.150, player_2/loss=316.071, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 471.23it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=35.404, player_2/loss=349.504, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 397.77it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=51.886, player_2/loss=279.447, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 411.32it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=23.295, player_2/loss=313.876, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 435.76it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=9.201, player_2/loss=366.306, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 441.82it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=84.180, player_2/loss=378.668, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 424.75it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=98.204, player_2/loss=355.032, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 419.56it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=46.854, player_2/loss=353.783, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 383.72it/s, env_step=12288, len=12, n/ep=6, n/st=64, player_1/loss=30.347, player_2/loss=332.439, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 454.11it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=6.631, player_2/loss=384.169, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 412.21it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=19.260, player_2/loss=483.892, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 401.57it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=19.971, player_2/loss=474.577, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 428.26it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=17.865, player_2/loss=424.535, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 422.19it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=26.172, player_2/loss=476.801, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 442.90it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=18.720, player_2/loss=481.322, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 457.92it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=11.209, player_2/loss=585.895, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 391.33it/s, env_step=1024, len=10, n/ep=7, n/st=64, player_1/loss=84.509, player_2/loss=387.691, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 382.15it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=55.770, player_2/loss=317.365, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 358.82it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=11.325, player_2/loss=262.992, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 471.73it/s, env_step=4096, len=13, n/ep=6, n/st=64, player_1/loss=89.106, player_2/loss=204.682, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 412.67it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=272.188, player_2/loss=183.403, rew=18.75]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 404.35it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=353.302, player_2/loss=153.808, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 377.37it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=410.902, player_2/loss=117.315, rew=10.71]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 406.31it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=416.961, player_2/loss=88.450, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 464.78it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=341.209, player_2/loss=36.069, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 416.86it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=328.789, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 393.53it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=336.636, player_2/loss=60.824, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 397.30it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=349.237, player_2/loss=104.439, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 403.23it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=403.677, player_2/loss=92.265, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 434.33it/s, env_step=14336, len=10, n/ep=5, n/st=64, player_1/loss=290.512, player_2/loss=56.272, rew=5.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 454.18it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=315.880, player_2/loss=44.876, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 452.84it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=378.073, player_2/loss=43.974, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 469.00it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=337.467, player_2/loss=46.873, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 476.72it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=289.255, player_2/loss=66.282, rew=18.75]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 475.95it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=354.687, player_2/loss=51.963, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 435.26it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=305.336, player_2/loss=375.727, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 434.11it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=170.929, player_2/loss=452.139, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 453.98it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=96.872, player_2/loss=593.082, rew=18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 428.93it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=68.997, player_2/loss=673.434, rew=3.57]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 449.99it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=155.330, player_2/loss=613.908, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 474.97it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=184.963, player_2/loss=569.090, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 459.48it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=86.964, player_2/loss=683.970, rew=17.86]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 436.86it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=42.865, player_2/loss=654.824, rew=13.89]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 443.71it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=87.277, player_2/loss=493.197, rew=13.89]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 424.58it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=134.755, player_2/loss=543.793, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 481.38it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_2/loss=508.984, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 450.45it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=166.497, player_2/loss=391.723, rew=13.89]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 428.57it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=171.930, player_2/loss=433.750, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 444.10it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=98.726, player_2/loss=631.637, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 444.51it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=90.354, player_2/loss=649.579, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 465.51it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=68.185, player_2/loss=719.408, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 453.44it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=70.128, player_2/loss=673.483, rew=18.75]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 450.99it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=60.851, player_2/loss=754.566, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 458.10it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=77.679, player_2/loss=755.754, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 480.81it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=72.605, player_2/loss=401.354, rew=-6.25]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 479.92it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=104.947, player_2/loss=337.604, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 439.56it/s, env_step=3072, len=11, n/ep=7, n/st=64, player_1/loss=119.363, player_2/loss=228.835, rew=17.86]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 423.16it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=218.202, player_2/loss=94.076, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 461.47it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=389.089, player_2/loss=17.548, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 450.59it/s, env_step=6144, len=10, n/ep=7, n/st=64, player_1/loss=378.731, player_2/loss=25.637, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 413.40it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=392.530, player_2/loss=39.051, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 474.79it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=352.895, player_2/loss=46.714, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 419.72it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=272.683, player_2/loss=69.070, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 467.23it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=237.338, player_2/loss=70.838, rew=15.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 468.95it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=285.579, player_2/loss=70.589, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 479.71it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=338.623, player_2/loss=51.590, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 491.16it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=273.824, player_2/loss=46.860, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 449.94it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=206.817, player_2/loss=48.317, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 382.33it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=245.912, player_2/loss=32.713, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 349.70it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=262.086, player_2/loss=48.155, rew=16.67]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 424.57it/s, env_step=17408, len=9, n/ep=6, n/st=64, player_1/loss=254.691, player_2/loss=41.626, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 439.48it/s, env_step=18432, len=10, n/ep=5, n/st=64, player_1/loss=308.218, player_2/loss=81.864, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 471.46it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=338.865, player_2/loss=82.443, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 382.59it/s, env_step=1024, len=13, n/ep=4, n/st=64, player_1/loss=153.457, player_2/loss=20.625, rew=-12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 434.83it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=112.972, player_2/loss=134.394, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 377.69it/s, env_step=3072, len=10, n/ep=7, n/st=64, player_1/loss=69.167, player_2/loss=255.523, rew=17.86]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 416.97it/s, env_step=4096, len=9, n/ep=6, n/st=64, player_1/loss=61.491, player_2/loss=307.856, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 448.04it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=69.163, player_2/loss=267.129, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 441.23it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=116.176, player_2/loss=250.083, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 473.13it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=97.880, player_2/loss=250.426, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 432.67it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=88.262, player_2/loss=267.278, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 415.50it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=92.265, player_2/loss=234.312, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 411.34it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=26.720, player_2/loss=225.511, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 375.40it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=47.439, player_2/loss=229.766, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 431.42it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=38.039, player_2/loss=225.126, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 418.39it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=12.253, player_2/loss=254.717, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 454.55it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=26.342, player_2/loss=234.134, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 358.49it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=30.195, player_2/loss=248.066, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 443.35it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=8.807, player_2/loss=228.964, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 444.71it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=44.396, player_2/loss=245.713, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 393.41it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=80.398, player_2/loss=198.508, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 425.12it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=74.051, player_2/loss=241.700, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 413.91it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=52.740, player_2/loss=203.758, rew=-16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 397.20it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=88.081, player_2/loss=207.197, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 423.62it/s, env_step=3072, len=13, n/ep=4, n/st=64, player_1/loss=102.913, player_2/loss=169.670, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 435.29it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=56.003, player_2/loss=104.430, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 438.75it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=87.915, player_2/loss=70.129, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 442.55it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=81.119, player_2/loss=76.093, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 424.22it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=56.007, player_2/loss=72.749, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 437.90it/s, env_step=8192, len=19, n/ep=4, n/st=64, player_1/loss=56.911, player_2/loss=49.825, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 458.22it/s, env_step=9216, len=24, n/ep=3, n/st=64, player_1/loss=83.136, player_2/loss=50.052, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 451.55it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=134.370, player_2/loss=86.761, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 424.60it/s, env_step=11264, len=27, n/ep=2, n/st=64, player_1/loss=213.285, player_2/loss=132.357, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 388.71it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=198.465, player_2/loss=193.910, rew=-15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 401.13it/s, env_step=13312, len=19, n/ep=3, n/st=64, player_1/loss=131.359, player_2/loss=159.828, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 392.98it/s, env_step=14336, len=25, n/ep=3, n/st=64, player_1/loss=68.151, player_2/loss=104.320, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 462.89it/s, env_step=15360, len=25, n/ep=3, n/st=64, player_1/loss=59.060, player_2/loss=42.296, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 405.57it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=53.209, player_2/loss=30.306, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 442.20it/s, env_step=17408, len=30, n/ep=2, n/st=64, player_1/loss=68.794, player_2/loss=89.526, rew=37.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 438.53it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=88.733, player_2/loss=120.226, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 468.63it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=100.721, player_2/loss=101.643, rew=-12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 414.85it/s, env_step=1024, len=18, n/ep=3, n/st=64, player_1/loss=48.200, player_2/loss=72.189, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 451.20it/s, env_step=2048, len=18, n/ep=4, n/st=64, player_1/loss=81.417, player_2/loss=78.093, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 431.90it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=83.709, player_2/loss=78.272, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 430.20it/s, env_step=4096, len=16, n/ep=3, n/st=64, player_1/loss=63.293, player_2/loss=81.366, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 452.09it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=31.155, player_2/loss=78.003, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 415.93it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=35.270, player_2/loss=76.641, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 436.01it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=24.611, player_2/loss=77.605, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 420.43it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=20.157, player_2/loss=103.967, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 424.55it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=17.876, player_2/loss=100.662, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 469.56it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=6.112, player_2/loss=117.695, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 480.31it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=25.955, player_2/loss=110.721, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 473.60it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=37.178, player_2/loss=111.948, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 480.17it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=35.404, player_2/loss=102.510, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 477.57it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=25.807, player_2/loss=60.908, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 469.21it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=11.472, player_2/loss=39.783, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 473.65it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=13.167, player_2/loss=52.325, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 454.34it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=21.423, player_2/loss=54.288, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 438.33it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=20.681, player_2/loss=64.098, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 471.65it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=9.447, player_2/loss=64.800, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 459.03it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=233.759, player_2/loss=119.785, rew=12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 433.99it/s, env_step=2048, len=18, n/ep=4, n/st=64, player_1/loss=300.853, player_2/loss=97.776, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 401.46it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=291.875, player_2/loss=90.109, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 394.92it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=121.803, player_2/loss=77.058, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 438.14it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=66.273, player_2/loss=75.102, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 460.12it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=36.201, player_2/loss=71.663, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 413.68it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=70.555, player_2/loss=46.020, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 427.27it/s, env_step=8192, len=18, n/ep=3, n/st=64, player_1/loss=74.695, player_2/loss=42.728, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 431.86it/s, env_step=9216, len=24, n/ep=3, n/st=64, player_1/loss=60.429, player_2/loss=36.549, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 475.44it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=97.609, player_2/loss=45.281, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 431.95it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=220.738, player_2/loss=139.019, rew=-12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 441.25it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=294.882, player_2/loss=217.513, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 429.90it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=349.503, player_2/loss=212.623, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 491.69it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=408.683, player_2/loss=136.041, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 421.97it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=378.647, player_2/loss=111.749, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 422.78it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=315.293, player_2/loss=124.936, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 399.92it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=343.430, player_2/loss=115.801, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 454.77it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=550.276, player_2/loss=124.292, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 421.63it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=603.400, player_2/loss=80.398, rew=15.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 409.86it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=539.657, player_2/loss=79.539, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 345.43it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=324.543, player_2/loss=61.898, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 384.23it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=208.443, player_2/loss=152.680, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 449.58it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=289.598, player_2/loss=148.566, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 417.83it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=250.102, player_2/loss=146.145, rew=10.71]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 434.96it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=113.813, player_2/loss=348.295, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 410.96it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=46.011, player_2/loss=454.382, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 425.77it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=68.464, player_2/loss=416.945, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 416.02it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=109.227, player_2/loss=361.026, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 389.96it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=62.886, player_2/loss=367.487, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 418.36it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=31.971, player_2/loss=381.144, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 448.80it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=25.135, player_2/loss=418.086, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 377.13it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=13.166, player_2/loss=449.871, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 364.22it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=31.997, player_2/loss=453.600, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 433.97it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=14.710, player_2/loss=442.344, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 374.31it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=3.819, player_2/loss=492.798, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 384.36it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=9.615, player_2/loss=451.620, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 438.86it/s, env_step=18432, len=7, n/ep=10, n/st=64, player_1/loss=10.469, player_2/loss=427.608, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 484.86it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=55.795, player_2/loss=464.111, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 449.26it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=110.196, player_2/loss=353.619, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 471.94it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=111.623, player_2/loss=280.863, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 478.55it/s, env_step=3072, len=10, n/ep=7, n/st=64, player_1/loss=109.513, player_2/loss=174.065, rew=-17.86]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 449.77it/s, env_step=4096, len=9, n/ep=8, n/st=64, player_1/loss=108.952, player_2/loss=129.624, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 459.98it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=144.121, player_2/loss=141.726, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 454.34it/s, env_step=6144, len=9, n/ep=6, n/st=64, player_1/loss=157.915, player_2/loss=172.984, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 470.65it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=169.925, player_2/loss=169.171, rew=-10.71]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 470.87it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=181.628, player_2/loss=183.316, rew=-10.71]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 451.87it/s, env_step=9216, len=8, n/ep=7, n/st=64, player_1/loss=168.120, player_2/loss=189.174, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 453.77it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=169.667, rew=-19.44]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 368.44it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=111.692, player_2/loss=167.020, rew=-19.44]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 358.38it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=53.677, player_2/loss=178.126, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 389.28it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=47.795, player_2/loss=118.129, rew=-3.57]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 424.39it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=100.023, player_2/loss=93.493, rew=-10.71]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 405.15it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=151.182, player_2/loss=98.621, rew=-12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 450.78it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=120.697, player_2/loss=66.201, rew=-25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 381.27it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_2/loss=127.507, rew=-18.75]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 411.16it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=104.504, player_2/loss=149.175, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 406.77it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=95.513, player_2/loss=93.795, rew=-17.86]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 446.08it/s, env_step=1024, len=8, n/ep=7, n/st=64, player_1/loss=204.481, player_2/loss=179.409, rew=17.86]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 464.39it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=157.207, player_2/loss=205.700, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 490.91it/s, env_step=3072, len=9, n/ep=8, n/st=64, player_1/loss=92.940, player_2/loss=194.485, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 465.04it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=129.503, player_2/loss=147.929, rew=17.86]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 449.96it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=145.485, player_2/loss=109.891, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 410.51it/s, env_step=6144, len=9, n/ep=6, n/st=64, player_1/loss=84.816, player_2/loss=67.445, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 463.89it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=63.281, player_2/loss=88.725, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 425.53it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=111.392, player_2/loss=109.245, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 451.72it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=115.601, player_2/loss=88.734, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 466.76it/s, env_step=10240, len=10, n/ep=7, n/st=64, player_1/loss=81.887, player_2/loss=50.800, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 476.33it/s, env_step=11264, len=9, n/ep=6, n/st=64, player_1/loss=117.408, player_2/loss=92.146, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 456.69it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=74.222, player_2/loss=119.018, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 479.24it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=132.187, player_2/loss=184.880, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 483.41it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=155.518, player_2/loss=165.670, rew=10.71]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 425.98it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=124.684, player_2/loss=115.544, rew=18.75]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 438.29it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=94.924, player_2/loss=110.151, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 426.60it/s, env_step=17408, len=9, n/ep=8, n/st=64, player_1/loss=45.978, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 408.84it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=91.328, player_2/loss=80.242, rew=3.57]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 468.52it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=138.101, player_2/loss=180.182, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 476.00it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=189.622, player_2/loss=207.432, rew=-10.71]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 420.56it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=152.055, player_2/loss=157.520, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 458.76it/s, env_step=3072, len=24, n/ep=3, n/st=64, player_1/loss=73.634, player_2/loss=98.160, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 449.39it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_2/loss=90.893, rew=-17.86]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 447.72it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=85.915, player_2/loss=115.783, rew=-18.75]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 444.10it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=119.760, player_2/loss=114.580, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 455.30it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=130.482, player_2/loss=88.010, rew=-18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 441.65it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=169.217, player_2/loss=148.245, rew=15.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 448.73it/s, env_step=9216, len=16, n/ep=3, n/st=64, player_1/loss=202.842, player_2/loss=182.217, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 453.54it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=144.929, player_2/loss=148.356, rew=-12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 435.89it/s, env_step=11264, len=16, n/ep=3, n/st=64, player_1/loss=135.120, player_2/loss=152.405, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 438.25it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=94.248, player_2/loss=131.702, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 493.01it/s, env_step=13312, len=15, n/ep=5, n/st=64, player_1/loss=92.867, player_2/loss=82.449, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 478.12it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=133.626, player_2/loss=35.035, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 457.40it/s, env_step=15360, len=12, n/ep=4, n/st=64, player_1/loss=156.396, player_2/loss=32.125, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 460.28it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=172.410, player_2/loss=29.199, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 445.40it/s, env_step=17408, len=13, n/ep=4, n/st=64, player_1/loss=192.212, player_2/loss=13.884, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 487.39it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=159.450, player_2/loss=15.039, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 493.97it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=130.001, player_2/loss=87.056, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 450.70it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=137.276, player_2/loss=128.584, rew=-12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 488.80it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=122.188, player_2/loss=135.657, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 470.61it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=112.679, player_2/loss=119.329, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 428.02it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=118.798, player_2/loss=146.669, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 482.64it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=83.879, player_2/loss=147.298, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 472.94it/s, env_step=6144, len=14, n/ep=5, n/st=64, player_2/loss=103.604, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 470.26it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=87.897, player_2/loss=90.751, rew=-12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 468.94it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=121.760, player_2/loss=122.305, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 458.67it/s, env_step=9216, len=13, n/ep=4, n/st=64, player_1/loss=193.859, player_2/loss=169.273, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 414.27it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=149.410, player_2/loss=346.307, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 423.20it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=79.881, player_2/loss=429.532, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 398.53it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=61.714, player_2/loss=386.414, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 447.82it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=65.891, player_2/loss=404.702, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 453.74it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=89.192, player_2/loss=430.028, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 474.66it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=68.099, rew=15.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 472.17it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=72.266, player_2/loss=269.582, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 452.75it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=95.017, player_2/loss=184.887, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 430.90it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=115.060, player_2/loss=174.318, rew=5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 392.47it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=100.857, player_2/loss=168.903, rew=12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 436.18it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=153.438, player_2/loss=153.327, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 364.04it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=81.565, player_2/loss=129.861, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 482.24it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=65.558, player_2/loss=114.284, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 490.56it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_2/loss=89.247, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 477.18it/s, env_step=5120, len=24, n/ep=2, n/st=64, player_1/loss=113.315, player_2/loss=97.249, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 479.09it/s, env_step=6144, len=26, n/ep=2, n/st=64, player_1/loss=122.694, player_2/loss=91.644, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 479.66it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=208.186, player_2/loss=96.348, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 408.06it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=295.772, player_2/loss=107.991, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 468.88it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=296.806, player_2/loss=89.732, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 422.00it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=234.604, player_2/loss=52.860, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 460.17it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=237.290, player_2/loss=49.988, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 396.54it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=292.550, player_2/loss=35.919, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 380.91it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=287.916, player_2/loss=44.044, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 380.37it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=251.013, player_2/loss=79.774, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 401.46it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=272.743, player_2/loss=73.240, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 410.50it/s, env_step=16384, len=11, n/ep=8, n/st=64, player_1/loss=345.291, player_2/loss=23.195, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 399.27it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=325.336, player_2/loss=10.214, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 463.99it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=263.469, player_2/loss=27.677, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 465.50it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=311.374, player_2/loss=43.514, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 428.10it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=188.579, player_2/loss=94.330, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 418.15it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=204.636, player_2/loss=73.142, rew=-16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 433.00it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=207.508, player_2/loss=122.239, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 448.67it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=148.931, player_2/loss=261.628, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 456.40it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=65.561, player_2/loss=382.896, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 422.29it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=39.892, player_2/loss=372.731, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 457.54it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=30.975, player_2/loss=331.417, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 458.97it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=34.946, player_2/loss=292.212, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 464.32it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=35.150, player_2/loss=267.178, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 393.53it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=12.025, player_2/loss=318.609, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 487.15it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=22.590, player_2/loss=333.549, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 478.34it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=20.723, player_2/loss=331.916, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 440.81it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=12.631, player_2/loss=313.897, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 446.44it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=14.792, player_2/loss=292.101, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 468.65it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=31.121, player_2/loss=271.749, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 447.13it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=26.491, player_2/loss=269.354, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 485.22it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=24.273, player_2/loss=294.618, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 490.51it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=62.451, player_2/loss=310.240, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 436.96it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=93.624, player_2/loss=357.496, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 434.37it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=48.057, player_2/loss=259.084, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 425.42it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=42.103, player_2/loss=207.013, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 470.16it/s, env_step=3072, len=23, n/ep=2, n/st=64, player_1/loss=40.103, player_2/loss=131.589, rew=0.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 461.88it/s, env_step=4096, len=25, n/ep=2, n/st=64, player_1/loss=72.947, player_2/loss=124.941, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 429.38it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=126.529, player_2/loss=128.326, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 419.27it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=136.643, player_2/loss=146.271, rew=8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 418.86it/s, env_step=7168, len=10, n/ep=7, n/st=64, player_1/loss=228.730, player_2/loss=200.236, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 478.09it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=305.696, player_2/loss=145.529, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 399.76it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=283.372, player_2/loss=131.781, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 414.18it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=280.722, player_2/loss=138.520, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 483.41it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=299.904, player_2/loss=165.604, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 453.34it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=419.218, player_2/loss=166.821, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 458.86it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=455.129, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 468.30it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=403.377, player_2/loss=47.100, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 453.32it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=335.161, player_2/loss=65.493, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 477.32it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=366.357, player_2/loss=65.683, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 445.01it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=391.475, player_2/loss=18.082, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 426.77it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=386.011, player_2/loss=68.510, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 438.04it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=401.774, player_2/loss=67.803, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 473.67it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=353.674, player_2/loss=29.912, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 476.27it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=298.999, player_2/loss=130.172, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 457.43it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=225.949, player_2/loss=353.027, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 457.20it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=121.351, player_2/loss=623.625, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 424.28it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=111.521, player_2/loss=672.241, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 473.29it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=163.669, player_2/loss=602.754, rew=18.75]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 433.57it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=101.473, player_2/loss=679.951, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 401.94it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=61.873, player_2/loss=652.687, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 416.75it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=41.449, player_2/loss=692.945, rew=13.89]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 432.31it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=73.958, player_2/loss=670.673, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 483.49it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_2/loss=579.285, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 432.00it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=106.767, player_2/loss=479.617, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 484.78it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=117.974, player_2/loss=424.171, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 413.95it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=86.965, player_2/loss=421.019, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 373.00it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=65.437, player_2/loss=488.285, rew=13.89]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 449.71it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=83.446, player_2/loss=606.516, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 434.36it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=53.743, player_2/loss=658.702, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 477.23it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=34.348, player_2/loss=632.450, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 473.38it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=63.547, player_2/loss=721.659, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 485.97it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=18.439, player_2/loss=545.492, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 443.22it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=49.485, player_2/loss=429.667, rew=-18.75]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 429.19it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=85.363, player_2/loss=309.114, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 356.55it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=94.124, player_2/loss=293.825, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 489.74it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=147.170, player_2/loss=242.189, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 449.60it/s, env_step=6144, len=23, n/ep=3, n/st=64, player_1/loss=149.482, player_2/loss=148.070, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 363.73it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=169.569, player_2/loss=79.517, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 463.40it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=173.537, player_2/loss=40.337, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 398.95it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=149.672, player_2/loss=58.202, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 474.76it/s, env_step=10240, len=19, n/ep=4, n/st=64, player_1/loss=135.202, player_2/loss=41.178, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 452.97it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=109.642, player_2/loss=63.364, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 410.45it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=112.023, player_2/loss=54.503, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 418.92it/s, env_step=13312, len=19, n/ep=3, n/st=64, player_1/loss=131.556, player_2/loss=32.223, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 379.96it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=117.655, player_2/loss=13.603, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 397.06it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=121.724, player_2/loss=28.442, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 413.41it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=146.087, player_2/loss=50.808, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 447.25it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=165.562, player_2/loss=80.705, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 435.82it/s, env_step=18432, len=20, n/ep=3, n/st=64, player_1/loss=149.752, player_2/loss=40.006, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 472.14it/s, env_step=19456, len=20, n/ep=3, n/st=64, player_1/loss=106.591, player_2/loss=16.150, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 435.99it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=102.281, player_2/loss=96.746, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 431.49it/s, env_step=2048, len=15, n/ep=5, n/st=64, player_1/loss=74.725, player_2/loss=131.364, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 426.10it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=101.100, player_2/loss=192.345, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 426.21it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=99.340, player_2/loss=283.929, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 462.30it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=84.190, player_2/loss=253.961, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 465.43it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=73.744, player_2/loss=168.309, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 401.46it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=81.702, player_2/loss=237.734, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 463.10it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=95.806, player_2/loss=285.966, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 459.21it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=89.295, player_2/loss=276.042, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 430.39it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=94.872, player_2/loss=160.214, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 444.25it/s, env_step=11264, len=21, n/ep=3, n/st=64, player_1/loss=83.204, player_2/loss=89.216, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 429.88it/s, env_step=12288, len=21, n/ep=4, n/st=64, player_1/loss=81.906, player_2/loss=223.959, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 429.60it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=46.697, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 436.50it/s, env_step=14336, len=19, n/ep=4, n/st=64, player_1/loss=55.072, player_2/loss=399.474, rew=0.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 446.54it/s, env_step=15360, len=21, n/ep=3, n/st=64, player_1/loss=58.648, player_2/loss=304.014, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 440.80it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=28.533, player_2/loss=349.394, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 452.09it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=67.690, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 448.49it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=92.966, player_2/loss=159.297, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 464.01it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=99.994, player_2/loss=184.022, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 439.93it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=68.211, player_2/loss=126.192, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 470.73it/s, env_step=2048, len=29, n/ep=2, n/st=64, player_1/loss=77.427, player_2/loss=103.447, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 433.73it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=90.974, player_2/loss=113.785, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 422.94it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=149.166, player_2/loss=153.342, rew=-12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 456.38it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=151.157, player_2/loss=102.618, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 446.93it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=140.305, player_2/loss=105.782, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 455.74it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=203.096, player_2/loss=155.460, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 453.98it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=259.553, player_2/loss=130.361, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 429.78it/s, env_step=9216, len=9, n/ep=8, n/st=64, player_1/loss=245.748, player_2/loss=60.276, rew=6.25]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 435.39it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=204.463, player_2/loss=48.926, rew=12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 460.66it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=207.744, player_2/loss=55.430, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 449.00it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=259.798, player_2/loss=53.338, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 443.74it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=277.098, player_2/loss=88.195, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 435.55it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=270.041, player_2/loss=95.251, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 441.72it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=279.942, player_2/loss=74.476, rew=2.78]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 447.27it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=236.541, player_2/loss=94.536, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 481.38it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=177.233, player_2/loss=57.071, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 426.43it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=193.490, player_2/loss=29.744, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 430.07it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=214.496, player_2/loss=56.369, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 413.54it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=245.939, player_2/loss=582.418, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 457.23it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=193.667, player_2/loss=807.770, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 451.93it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=90.920, player_2/loss=992.968, rew=13.89]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 448.14it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=89.584, player_2/loss=959.136, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 473.91it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=131.704, player_2/loss=916.839, rew=19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 443.70it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=86.056, player_2/loss=877.301, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 450.75it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=82.396, player_2/loss=947.710, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 441.75it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=90.479, player_2/loss=862.508, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 433.79it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=127.773, player_2/loss=1124.361, rew=2.78]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 457.39it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=111.661, player_2/loss=1155.965, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 475.43it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=80.532, player_2/loss=1014.415, rew=13.89]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 467.57it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=55.734, player_2/loss=981.820, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 456.96it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=67.870, player_2/loss=970.645, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 454.80it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=42.570, player_2/loss=1006.448, rew=6.25]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 442.60it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=26.280, player_2/loss=917.899, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 446.96it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=94.469, player_2/loss=969.372, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 464.86it/s, env_step=17408, len=7, n/ep=10, n/st=64, player_1/loss=73.024, player_2/loss=978.059, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 439.37it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=49.808, player_2/loss=1031.125, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 447.81it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=69.961, player_2/loss=766.340, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 473.50it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=13.183, player_2/loss=488.190, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 477.44it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=96.866, player_2/loss=329.639, rew=-8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 450.60it/s, env_step=3072, len=28, n/ep=2, n/st=64, player_1/loss=192.226, player_2/loss=133.531, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 394.11it/s, env_step=4096, len=27, n/ep=3, n/st=64, player_1/loss=309.454, player_2/loss=165.686, rew=-25.00]


Epoch #4: test_reward: 100.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 447.38it/s, env_step=5120, len=24, n/ep=3, n/st=64, player_2/loss=179.077, rew=33.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 482.75it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=108.888, player_2/loss=111.174, rew=-8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 439.72it/s, env_step=7168, len=23, n/ep=3, n/st=64, player_1/loss=209.049, player_2/loss=135.085, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 451.73it/s, env_step=8192, len=23, n/ep=3, n/st=64, player_1/loss=206.590, player_2/loss=142.768, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 447.80it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=88.426, player_2/loss=149.884, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 447.60it/s, env_step=10240, len=23, n/ep=3, n/st=64, player_1/loss=66.151, player_2/loss=106.512, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 450.88it/s, env_step=11264, len=33, n/ep=2, n/st=64, player_1/loss=203.755, player_2/loss=94.928, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 463.60it/s, env_step=12288, len=29, n/ep=3, n/st=64, player_1/loss=216.740, player_2/loss=113.480, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 450.16it/s, env_step=13312, len=23, n/ep=3, n/st=64, player_1/loss=217.005, player_2/loss=120.196, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 436.39it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=193.378, player_2/loss=93.320, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 455.26it/s, env_step=15360, len=27, n/ep=2, n/st=64, player_1/loss=127.441, player_2/loss=46.879, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 464.90it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=186.302, player_2/loss=79.447, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 422.49it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=205.948, player_2/loss=92.417, rew=-8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 461.89it/s, env_step=18432, len=32, n/ep=2, n/st=64, player_1/loss=198.541, player_2/loss=93.300, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 433.22it/s, env_step=19456, len=24, n/ep=3, n/st=64, player_1/loss=196.284, player_2/loss=80.802, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 445.69it/s, env_step=1024, len=29, n/ep=2, n/st=64, player_1/loss=233.455, player_2/loss=45.063, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 437.61it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=196.109, player_2/loss=78.181, rew=-8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 438.81it/s, env_step=3072, len=29, n/ep=2, n/st=64, player_1/loss=137.274, player_2/loss=97.673, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 458.20it/s, env_step=4096, len=22, n/ep=2, n/st=64, player_1/loss=132.919, player_2/loss=98.502, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 459.47it/s, env_step=5120, len=23, n/ep=3, n/st=64, player_1/loss=94.477, player_2/loss=95.381, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 460.26it/s, env_step=6144, len=19, n/ep=4, n/st=64, player_1/loss=76.313, player_2/loss=96.291, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 419.87it/s, env_step=7168, len=26, n/ep=3, n/st=64, player_1/loss=96.393, player_2/loss=63.266, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 434.38it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=93.750, player_2/loss=79.581, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 476.24it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=46.299, player_2/loss=100.968, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 473.01it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=82.298, player_2/loss=82.886, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 460.57it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=76.791, player_2/loss=108.143, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 456.58it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=57.095, player_2/loss=124.292, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 477.25it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=76.421, player_2/loss=116.604, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 453.93it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=53.724, player_2/loss=113.562, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 447.40it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=26.909, player_2/loss=130.698, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 471.05it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=32.814, player_2/loss=135.678, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 480.10it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=46.542, player_2/loss=119.759, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 440.32it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=43.088, player_2/loss=127.515, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 387.22it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=29.433, player_2/loss=126.079, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 433.86it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=15.983, player_2/loss=80.904, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 456.21it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=47.313, player_2/loss=68.365, rew=12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 483.60it/s, env_step=3072, len=13, n/ep=4, n/st=64, player_1/loss=57.419, player_2/loss=87.555, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 427.94it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=68.777, player_2/loss=93.925, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 464.01it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=184.646, player_2/loss=127.969, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 484.74it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=186.065, player_2/loss=167.344, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 449.41it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=175.399, player_2/loss=170.681, rew=12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 421.47it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=317.176, player_2/loss=131.148, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 486.53it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=383.774, player_2/loss=72.498, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 460.77it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=401.995, player_2/loss=21.244, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 405.37it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=321.849, player_2/loss=37.660, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 434.46it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=307.754, player_2/loss=35.120, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 486.36it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=376.501, player_2/loss=37.031, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 448.96it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=433.824, player_2/loss=81.854, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 463.43it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=454.539, player_2/loss=103.247, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 485.36it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=398.714, player_2/loss=114.073, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 458.75it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=389.080, player_2/loss=123.784, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 433.30it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=514.383, player_2/loss=49.308, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 435.88it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=474.671, player_2/loss=23.704, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 443.74it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=323.397, player_2/loss=14.768, rew=-15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 445.03it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=255.810, player_2/loss=59.238, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 466.62it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=141.279, player_2/loss=112.963, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 434.87it/s, env_step=4096, len=15, n/ep=5, n/st=64, player_1/loss=90.457, player_2/loss=182.217, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 405.42it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=80.172, player_2/loss=185.679, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 459.44it/s, env_step=6144, len=15, n/ep=5, n/st=64, player_1/loss=169.140, player_2/loss=167.747, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 457.93it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=180.765, player_2/loss=123.515, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 489.74it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=133.049, player_2/loss=141.824, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 442.19it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_2/loss=162.237, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 383.66it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=100.292, player_2/loss=264.577, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 457.81it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=84.801, player_2/loss=237.447, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 466.29it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=98.422, player_2/loss=167.939, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 449.54it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=86.958, player_2/loss=152.463, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 466.19it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=75.277, player_2/loss=178.415, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 489.04it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=53.795, player_2/loss=170.590, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 445.01it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=45.800, player_2/loss=138.181, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 383.89it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=90.496, player_2/loss=172.220, rew=-12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 398.64it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=80.229, player_2/loss=173.623, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 421.49it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=64.894, player_2/loss=158.183, rew=-5.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 352.55it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=182.169, player_2/loss=63.130, rew=5.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 368.81it/s, env_step=2048, len=12, n/ep=6, n/st=64, player_1/loss=149.833, player_2/loss=33.788, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 483.65it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=138.992, player_2/loss=16.240, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 412.03it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=159.593, player_2/loss=49.575, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 358.13it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=145.580, player_2/loss=109.486, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 340.60it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=94.779, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 378.57it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=92.905, player_2/loss=56.210, rew=5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 457.31it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=112.055, player_2/loss=28.984, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 450.03it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=127.845, player_2/loss=25.202, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 447.74it/s, env_step=10240, len=13, n/ep=4, n/st=64, player_1/loss=104.672, player_2/loss=16.237, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 325.83it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=105.922, player_2/loss=84.023, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 496.77it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=101.913, player_2/loss=113.407, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 370.96it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=101.092, player_2/loss=67.985, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 347.57it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=119.877, player_2/loss=29.509, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 424.61it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=112.014, player_2/loss=43.287, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 460.71it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=121.440, player_2/loss=65.921, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 454.38it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=131.026, player_2/loss=64.360, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 459.09it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=120.783, player_2/loss=17.115, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 323.58it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=102.919, player_2/loss=14.435, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 452.16it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=110.560, player_2/loss=177.450, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.12it/s, env_step=2048, len=21, n/ep=4, n/st=64, player_1/loss=139.823, player_2/loss=212.547, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 470.35it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=106.241, player_2/loss=245.248, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 451.46it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=105.985, player_2/loss=297.687, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 410.81it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=139.755, player_2/loss=405.635, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 377.84it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=117.242, player_2/loss=444.298, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 434.17it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=116.701, player_2/loss=385.570, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 485.43it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=59.143, player_2/loss=393.814, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 444.68it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=18.124, player_2/loss=431.637, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 414.52it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=17.733, player_2/loss=476.690, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 351.53it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=12.893, player_2/loss=487.873, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 483.01it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=9.403, player_2/loss=551.726, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 445.17it/s, env_step=13312, len=9, n/ep=6, n/st=64, player_1/loss=14.803, player_2/loss=544.722, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 472.34it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=11.285, player_2/loss=510.790, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 464.16it/s, env_step=15360, len=10, n/ep=7, n/st=64, player_1/loss=25.314, player_2/loss=574.832, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 412.85it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=25.251, player_2/loss=531.237, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 432.36it/s, env_step=17408, len=9, n/ep=5, n/st=64, player_1/loss=13.218, player_2/loss=518.896, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 358.41it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=43.956, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 462.73it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=43.584, player_2/loss=507.844, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 436.10it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=14.449, player_2/loss=352.957, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 356.29it/s, env_step=2048, len=8, n/ep=6, n/st=64, player_1/loss=38.105, player_2/loss=278.106, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 360.84it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=134.388, player_2/loss=120.043, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 482.94it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=150.477, player_2/loss=100.088, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 431.31it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=122.381, player_2/loss=138.487, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 410.50it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=137.768, player_2/loss=121.756, rew=8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 441.88it/s, env_step=7168, len=22, n/ep=3, n/st=64, player_1/loss=177.699, player_2/loss=135.019, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 426.86it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=226.267, player_2/loss=89.595, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 436.61it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=180.005, player_2/loss=62.040, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 439.65it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=175.801, player_2/loss=118.246, rew=5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 435.58it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=550.917, player_2/loss=143.823, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 423.38it/s, env_step=12288, len=9, n/ep=6, n/st=64, player_1/loss=772.612, player_2/loss=107.348, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 385.14it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=594.711, player_2/loss=100.335, rew=-25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 444.19it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=402.522, player_2/loss=76.520, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 436.94it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=252.760, player_2/loss=25.587, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 438.18it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=352.507, player_2/loss=9.358, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 443.29it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=303.202, player_2/loss=5.475, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 377.86it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=237.103, player_2/loss=27.882, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 388.38it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=322.179, player_2/loss=34.213, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 380.21it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=118.393, player_2/loss=66.060, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 329.80it/s, env_step=2048, len=27, n/ep=2, n/st=64, player_1/loss=119.748, player_2/loss=123.339, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 413.44it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=118.827, player_2/loss=202.372, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 357.51it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=89.758, player_2/loss=285.562, rew=-5.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 435.02it/s, env_step=5120, len=20, n/ep=4, n/st=64, player_1/loss=109.752, player_2/loss=285.514, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 442.60it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=141.228, player_2/loss=324.253, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 443.71it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=128.295, player_2/loss=339.588, rew=13.89]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 484.53it/s, env_step=8192, len=7, n/ep=7, n/st=64, player_1/loss=45.890, player_2/loss=274.291, rew=10.71]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 476.42it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=44.933, player_2/loss=292.130, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 489.89it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_2/loss=354.019, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 479.89it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=123.061, player_2/loss=395.178, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 490.84it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=84.660, player_2/loss=422.431, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 487.19it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=67.343, player_2/loss=400.912, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 487.83it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=96.659, player_2/loss=404.892, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 488.40it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=92.392, player_2/loss=369.947, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 477.78it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=89.545, player_2/loss=385.649, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 489.71it/s, env_step=17408, len=8, n/ep=6, n/st=64, player_1/loss=62.475, player_2/loss=334.164, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 481.71it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=52.500, player_2/loss=399.535, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 488.76it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=34.888, player_2/loss=455.758, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 488.54it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=60.525, player_2/loss=344.628, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 492.99it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=50.524, player_2/loss=354.425, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 481.51it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=40.801, player_2/loss=284.395, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 486.51it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=76.397, player_2/loss=233.957, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 493.63it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=82.706, player_2/loss=195.735, rew=-13.89]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 490.66it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=65.205, player_2/loss=148.521, rew=-8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 497.46it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=60.992, player_2/loss=119.371, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 492.70it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=81.453, player_2/loss=132.796, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 493.25it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=107.641, player_2/loss=138.634, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 485.11it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=97.801, player_2/loss=105.400, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 492.91it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=100.855, player_2/loss=110.612, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 493.41it/s, env_step=12288, len=17, n/ep=3, n/st=64, player_1/loss=92.206, player_2/loss=64.750, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 491.26it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=121.639, player_2/loss=29.920, rew=-8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 493.75it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=170.268, player_2/loss=34.413, rew=-8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 495.24it/s, env_step=15360, len=21, n/ep=3, n/st=64, player_1/loss=160.595, player_2/loss=40.210, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 493.66it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=146.416, player_2/loss=109.401, rew=-8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 482.75it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=123.955, player_2/loss=111.117, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 489.50it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=167.712, player_2/loss=87.826, rew=0.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 495.16it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=158.082, player_2/loss=129.154, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 490.87it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=47.871, player_2/loss=133.301, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 493.33it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=86.714, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 488.96it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=81.999, player_2/loss=114.587, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 489.28it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=92.954, player_2/loss=170.795, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 476.75it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=165.173, player_2/loss=238.275, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 490.21it/s, env_step=6144, len=7, n/ep=10, n/st=64, player_1/loss=132.146, player_2/loss=367.247, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 492.16it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=95.322, player_2/loss=445.147, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 489.19it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=98.194, player_2/loss=428.400, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 487.56it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=66.450, player_2/loss=428.667, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 492.59it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=17.695, player_2/loss=338.865, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 490.22it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=81.882, player_2/loss=292.823, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 479.56it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=100.457, player_2/loss=271.533, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 490.18it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=61.234, player_2/loss=282.965, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 487.85it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=43.258, player_2/loss=307.095, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 493.57it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=93.238, player_2/loss=287.394, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 489.16it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=113.171, player_2/loss=313.107, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 484.62it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=72.338, player_2/loss=336.931, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 479.61it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=52.333, player_2/loss=374.488, rew=18.75]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 491.77it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=10.680, player_2/loss=389.212, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 489.48it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=12.691, player_2/loss=310.045, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 489.14it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=143.704, player_2/loss=174.099, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 491.01it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=359.455, player_2/loss=83.403, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 490.67it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=588.391, player_2/loss=69.173, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 493.45it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=566.744, player_2/loss=89.976, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 477.67it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=518.651, player_2/loss=67.618, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 489.27it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=591.546, player_2/loss=43.919, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 491.11it/s, env_step=8192, len=8, n/ep=9, n/st=64, player_1/loss=591.231, player_2/loss=50.911, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 491.04it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=590.799, player_2/loss=27.522, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 491.11it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=619.209, player_2/loss=11.425, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 491.40it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=616.200, player_2/loss=41.507, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 491.15it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_2/loss=42.192, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 480.24it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=725.708, player_2/loss=10.735, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 471.68it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=683.018, player_2/loss=5.705, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 489.17it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=617.355, player_2/loss=46.938, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 495.39it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=496.591, player_2/loss=53.480, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 492.53it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=432.754, player_2/loss=16.722, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 487.86it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=501.527, player_2/loss=23.005, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 493.88it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=683.136, player_2/loss=18.503, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 470.63it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=369.551, player_2/loss=228.565, rew=18.75]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 490.19it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_2/loss=472.474, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 491.54it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=74.467, player_2/loss=512.101, rew=18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 490.21it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=129.315, player_2/loss=521.302, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 490.89it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=112.717, player_2/loss=581.400, rew=18.75]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 495.56it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=48.396, player_2/loss=753.078, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 490.36it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=79.921, player_2/loss=770.914, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 484.89it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=135.003, player_2/loss=634.245, rew=-3.57]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 486.48it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=158.304, player_2/loss=491.672, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 491.11it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=108.031, player_2/loss=471.765, rew=13.89]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 489.49it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=59.126, player_2/loss=680.067, rew=6.25]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 492.74it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=65.283, player_2/loss=665.398, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 488.80it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=80.437, player_2/loss=748.072, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 490.09it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=52.677, player_2/loss=652.655, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 480.49it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=40.862, player_2/loss=718.737, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 491.30it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=57.997, player_2/loss=751.564, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 490.54it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=53.122, player_2/loss=692.783, rew=2.78]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 491.69it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=52.013, player_2/loss=624.338, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 486.96it/s, env_step=19456, len=8, n/ep=9, n/st=64, player_1/loss=41.716, player_2/loss=638.006, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 492.76it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=463.750, player_2/loss=579.292, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 485.33it/s, env_step=2048, len=12, n/ep=6, n/st=64, player_1/loss=487.206, player_2/loss=451.325, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 492.17it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=454.666, player_2/loss=328.439, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 493.21it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=445.279, player_2/loss=249.623, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 498.06it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=518.911, player_2/loss=269.208, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 495.80it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=528.062, player_2/loss=207.709, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 495.14it/s, env_step=7168, len=28, n/ep=2, n/st=64, player_1/loss=479.812, player_2/loss=180.919, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 493.04it/s, env_step=8192, len=24, n/ep=3, n/st=64, player_1/loss=297.113, player_2/loss=109.290, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 486.01it/s, env_step=9216, len=21, n/ep=3, n/st=64, player_1/loss=85.794, player_2/loss=82.037, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 491.60it/s, env_step=10240, len=24, n/ep=3, n/st=64, player_1/loss=166.289, player_2/loss=107.615, rew=8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 492.79it/s, env_step=11264, len=23, n/ep=3, n/st=64, player_1/loss=221.755, player_2/loss=97.819, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 496.85it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=246.996, player_2/loss=81.337, rew=-8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 494.64it/s, env_step=13312, len=22, n/ep=3, n/st=64, player_1/loss=260.468, player_2/loss=99.100, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 492.34it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=180.607, player_2/loss=112.365, rew=-15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 491.25it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=230.877, player_2/loss=134.020, rew=12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 495.09it/s, env_step=16384, len=25, n/ep=2, n/st=64, player_1/loss=402.100, player_2/loss=154.405, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 480.53it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=313.730, player_2/loss=117.536, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 490.75it/s, env_step=18432, len=23, n/ep=3, n/st=64, player_1/loss=184.005, player_2/loss=85.503, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 494.65it/s, env_step=19456, len=19, n/ep=3, n/st=64, player_1/loss=168.502, player_2/loss=80.032, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 487.07it/s, env_step=1024, len=24, n/ep=2, n/st=64, player_1/loss=166.080, player_2/loss=105.252, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 490.39it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=117.283, player_2/loss=122.072, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 490.40it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=133.584, player_2/loss=116.562, rew=-12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 489.71it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=124.316, player_2/loss=102.832, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 477.88it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=131.865, player_2/loss=130.246, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 492.81it/s, env_step=6144, len=16, n/ep=3, n/st=64, player_1/loss=155.530, player_2/loss=163.069, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 491.58it/s, env_step=7168, len=38, n/ep=2, n/st=64, player_1/loss=177.597, player_2/loss=125.093, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 494.42it/s, env_step=8192, len=23, n/ep=3, n/st=64, player_1/loss=140.845, player_2/loss=87.985, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 488.11it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=116.923, player_2/loss=90.135, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 487.68it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=123.042, player_2/loss=91.208, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 492.71it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=148.037, player_2/loss=122.350, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 462.16it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=119.439, player_2/loss=125.332, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 414.34it/s, env_step=13312, len=17, n/ep=3, n/st=64, player_1/loss=95.139, player_2/loss=101.038, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 428.63it/s, env_step=14336, len=30, n/ep=3, n/st=64, player_1/loss=124.623, player_2/loss=107.289, rew=-8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 414.35it/s, env_step=15360, len=22, n/ep=3, n/st=64, player_1/loss=135.188, player_2/loss=130.203, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 430.29it/s, env_step=16384, len=22, n/ep=3, n/st=64, player_1/loss=124.674, player_2/loss=150.532, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 444.41it/s, env_step=17408, len=19, n/ep=3, n/st=64, player_1/loss=94.623, player_2/loss=135.064, rew=-8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 461.10it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=86.851, player_2/loss=91.609, rew=0.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 458.40it/s, env_step=19456, len=19, n/ep=3, n/st=64, player_1/loss=147.153, player_2/loss=101.437, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 477.85it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=158.267, player_2/loss=98.747, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 478.39it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=165.265, player_2/loss=136.084, rew=17.86]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 439.44it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=179.932, player_2/loss=161.458, rew=18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 480.42it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=208.775, player_2/loss=98.563, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 475.18it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=209.830, player_2/loss=48.123, rew=17.86]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 481.80it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=214.939, player_2/loss=51.235, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 478.50it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=201.166, player_2/loss=78.525, rew=17.86]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 451.57it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=174.792, player_2/loss=103.677, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 468.71it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=186.796, player_2/loss=93.787, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 477.60it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=194.298, player_2/loss=108.258, rew=18.75]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 479.13it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=200.095, player_2/loss=101.613, rew=3.57]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 468.80it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=209.507, player_2/loss=68.132, rew=6.25]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 479.23it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=182.492, player_2/loss=86.704, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 465.07it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=166.636, player_2/loss=71.686, rew=19.44]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 484.30it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=230.514, player_2/loss=104.219, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 483.82it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=240.112, player_2/loss=67.036, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 481.06it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=210.621, player_2/loss=86.834, rew=18.75]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 481.45it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=213.741, player_2/loss=98.280, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 475.65it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=180.789, player_2/loss=46.385, rew=10.71]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 469.91it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=145.848, player_2/loss=113.910, rew=16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 421.50it/s, env_step=2048, len=9, n/ep=8, n/st=64, player_1/loss=179.421, player_2/loss=389.844, rew=-12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 397.02it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=137.039, player_2/loss=598.751, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 456.71it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=83.281, player_2/loss=448.240, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 454.48it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=62.510, player_2/loss=576.290, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 465.12it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=43.105, player_2/loss=716.588, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 457.43it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=66.568, player_2/loss=600.083, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 447.01it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=50.323, player_2/loss=559.107, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 466.16it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=23.840, player_2/loss=684.147, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 424.65it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=22.008, player_2/loss=636.050, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 452.85it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=41.864, player_2/loss=563.314, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 428.05it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=45.062, player_2/loss=647.056, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 453.38it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=36.645, player_2/loss=699.573, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 439.77it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=61.335, player_2/loss=491.180, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 475.03it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=52.341, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 480.86it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=34.117, player_2/loss=347.483, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 486.03it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=34.548, player_2/loss=329.953, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 483.28it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=30.056, player_2/loss=353.836, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 483.06it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=11.262, player_2/loss=395.080, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 465.35it/s, env_step=1024, len=12, n/ep=4, n/st=64, player_1/loss=42.724, player_2/loss=245.751, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 467.55it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=41.220, player_2/loss=223.143, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 461.23it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=124.231, player_2/loss=174.639, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 446.04it/s, env_step=4096, len=22, n/ep=2, n/st=64, player_1/loss=155.404, player_2/loss=132.817, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 454.24it/s, env_step=5120, len=25, n/ep=2, n/st=64, player_1/loss=127.339, player_2/loss=96.586, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 471.32it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=112.207, player_2/loss=95.020, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 474.13it/s, env_step=7168, len=26, n/ep=2, n/st=64, player_1/loss=179.950, player_2/loss=70.248, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 467.83it/s, env_step=8192, len=27, n/ep=2, n/st=64, player_1/loss=182.701, player_2/loss=87.860, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 479.67it/s, env_step=9216, len=20, n/ep=4, n/st=64, player_1/loss=105.042, rew=-25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 475.03it/s, env_step=10240, len=21, n/ep=3, n/st=64, player_1/loss=124.273, player_2/loss=62.057, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 479.56it/s, env_step=11264, len=21, n/ep=3, n/st=64, player_1/loss=141.733, player_2/loss=88.511, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 473.63it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=167.420, player_2/loss=103.368, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 454.17it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=163.064, player_2/loss=125.989, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 462.75it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=253.984, player_2/loss=121.773, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 434.87it/s, env_step=15360, len=18, n/ep=4, n/st=64, player_1/loss=294.264, player_2/loss=97.911, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 460.78it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=240.030, player_2/loss=77.424, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 450.94it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=346.166, player_2/loss=45.433, rew=8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 471.34it/s, env_step=18432, len=21, n/ep=3, n/st=64, player_1/loss=274.227, player_2/loss=56.066, rew=8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 452.13it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=119.356, player_2/loss=88.438, rew=-12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 449.53it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=57.582, player_2/loss=110.233, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 442.40it/s, env_step=2048, len=12, n/ep=4, n/st=64, player_1/loss=58.492, player_2/loss=162.699, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 457.09it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=44.319, player_2/loss=216.214, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 450.44it/s, env_step=4096, len=11, n/ep=4, n/st=64, player_1/loss=30.626, player_2/loss=191.835, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 458.03it/s, env_step=5120, len=12, n/ep=6, n/st=64, player_1/loss=51.165, player_2/loss=177.833, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 437.44it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=41.718, player_2/loss=180.400, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 456.77it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=41.247, player_2/loss=202.873, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 449.03it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=48.976, player_2/loss=213.531, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 437.01it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=43.054, player_2/loss=219.120, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 441.99it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=52.435, player_2/loss=221.529, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 459.34it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=29.391, player_2/loss=175.532, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 451.75it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=48.327, player_2/loss=166.737, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 454.34it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=67.685, player_2/loss=193.592, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 440.08it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=31.695, player_2/loss=212.770, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 442.97it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=27.294, player_2/loss=230.355, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 414.71it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=24.934, player_2/loss=209.468, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 428.85it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=30.070, player_2/loss=182.838, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 441.43it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=63.717, player_2/loss=167.786, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 444.20it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=47.948, player_2/loss=151.093, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 450.41it/s, env_step=1024, len=20, n/ep=4, n/st=64, player_1/loss=130.172, player_2/loss=174.592, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 458.01it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=171.078, player_2/loss=153.712, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 431.30it/s, env_step=3072, len=26, n/ep=3, n/st=64, player_1/loss=158.155, player_2/loss=143.032, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 413.73it/s, env_step=4096, len=24, n/ep=3, n/st=64, player_1/loss=128.224, player_2/loss=110.946, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 446.03it/s, env_step=5120, len=27, n/ep=2, n/st=64, player_1/loss=142.476, player_2/loss=62.578, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 463.84it/s, env_step=6144, len=26, n/ep=1, n/st=64, player_1/loss=126.688, player_2/loss=42.293, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 463.68it/s, env_step=7168, len=25, n/ep=3, n/st=64, player_1/loss=136.805, player_2/loss=43.167, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 432.71it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=144.745, player_2/loss=68.696, rew=-8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 451.12it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=100.077, player_2/loss=45.755, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 442.23it/s, env_step=10240, len=28, n/ep=3, n/st=64, player_1/loss=88.278, player_2/loss=12.950, rew=16.67]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 443.67it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=185.174, player_2/loss=41.438, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 459.85it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=359.339, player_2/loss=71.787, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 454.42it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=426.779, player_2/loss=54.947, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 451.72it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=350.633, player_2/loss=60.337, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 439.90it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=303.043, player_2/loss=58.849, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 418.79it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=288.007, player_2/loss=52.816, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 470.56it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=267.909, player_2/loss=76.802, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 472.29it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=283.999, player_2/loss=54.781, rew=-12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 462.84it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=372.327, player_2/loss=26.660, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 445.54it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=240.833, player_2/loss=155.452, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 442.79it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=143.922, player_2/loss=267.677, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 445.65it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=41.732, player_2/loss=250.775, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 452.38it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=80.332, player_2/loss=251.055, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 465.40it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=65.026, player_2/loss=293.473, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 454.01it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=29.882, player_2/loss=246.152, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 482.60it/s, env_step=7168, len=17, n/ep=3, n/st=64, player_1/loss=55.508, player_2/loss=244.244, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 459.38it/s, env_step=8192, len=30, n/ep=2, n/st=64, player_1/loss=71.533, player_2/loss=174.382, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 474.09it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=106.946, player_2/loss=167.475, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 432.54it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=113.014, player_2/loss=174.240, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 455.39it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=80.712, player_2/loss=223.078, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 450.16it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=40.152, player_2/loss=210.288, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 468.03it/s, env_step=13312, len=19, n/ep=3, n/st=64, player_1/loss=58.056, player_2/loss=143.873, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 451.49it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=53.115, player_2/loss=214.920, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 454.18it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=21.361, player_2/loss=321.029, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 457.19it/s, env_step=16384, len=9, n/ep=6, n/st=64, player_1/loss=36.910, player_2/loss=352.045, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 421.27it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=31.616, player_2/loss=381.833, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 443.37it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=8.089, player_2/loss=541.524, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 459.25it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=14.531, player_2/loss=539.519, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 449.92it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=24.381, player_2/loss=356.848, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 447.03it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=29.977, player_2/loss=323.910, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 458.05it/s, env_step=3072, len=8, n/ep=7, n/st=64, player_1/loss=36.975, player_2/loss=252.055, rew=-17.86]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 470.84it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=121.756, player_2/loss=175.407, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 470.20it/s, env_step=5120, len=24, n/ep=3, n/st=64, player_1/loss=194.558, player_2/loss=110.116, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 478.30it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=155.748, player_2/loss=67.835, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 483.88it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=174.827, player_2/loss=109.129, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 470.48it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=186.239, player_2/loss=103.596, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 467.74it/s, env_step=9216, len=16, n/ep=5, n/st=64, player_1/loss=204.724, player_2/loss=82.093, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 475.42it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=149.426, player_2/loss=74.984, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 469.42it/s, env_step=11264, len=15, n/ep=5, n/st=64, player_1/loss=165.096, player_2/loss=20.335, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 453.54it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=211.808, player_2/loss=13.116, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 469.73it/s, env_step=13312, len=17, n/ep=3, n/st=64, player_1/loss=200.738, player_2/loss=91.151, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 458.01it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=191.534, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 452.96it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=172.165, player_2/loss=31.256, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 465.25it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=169.125, player_2/loss=65.401, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 487.05it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=136.903, player_2/loss=92.791, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 478.82it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=109.342, player_2/loss=54.008, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 480.76it/s, env_step=19456, len=16, n/ep=5, n/st=64, player_1/loss=156.146, player_2/loss=79.585, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 484.18it/s, env_step=1024, len=26, n/ep=3, n/st=64, player_1/loss=117.959, player_2/loss=88.232, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 486.85it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=85.483, player_2/loss=86.128, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 489.03it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=65.954, player_2/loss=109.597, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 460.06it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=61.854, player_2/loss=147.296, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 473.67it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=79.522, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 454.67it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=95.220, player_2/loss=343.488, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 455.30it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=79.241, player_2/loss=363.294, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 463.76it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=53.177, player_2/loss=369.283, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 461.08it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=64.861, player_2/loss=404.139, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 432.51it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=60.856, player_2/loss=384.024, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 453.12it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=20.708, player_2/loss=365.666, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 393.36it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=9.712, player_2/loss=339.820, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 322.09it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=95.133, player_2/loss=325.214, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 457.12it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=99.921, player_2/loss=281.790, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 446.82it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=19.638, player_2/loss=314.857, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 393.77it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=16.688, player_2/loss=332.608, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 325.54it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=16.867, player_2/loss=349.729, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 377.34it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=33.160, player_2/loss=339.375, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 384.72it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=15.783, player_2/loss=353.631, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 377.36it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=9.372, player_2/loss=274.005, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 426.47it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=17.692, player_2/loss=214.250, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 463.58it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=18.710, player_2/loss=155.659, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 470.52it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=16.038, player_2/loss=131.309, rew=-5.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 446.63it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=21.966, player_2/loss=130.511, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 440.88it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=30.951, player_2/loss=115.526, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 470.44it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_2/loss=103.603, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 450.94it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=16.839, player_2/loss=75.462, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 447.09it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=10.180, player_2/loss=45.754, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 461.03it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=14.219, player_2/loss=48.465, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #11: 1025it [00:02, 403.00it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=94.786, player_2/loss=151.238, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #12: 1025it [00:02, 380.46it/s, env_step=12288, len=19, n/ep=4, n/st=64, player_1/loss=132.776, player_2/loss=201.529, rew=12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #13: 1025it [00:02, 368.23it/s, env_step=13312, len=25, n/ep=2, n/st=64, player_1/loss=83.511, player_2/loss=155.143, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #14: 1025it [00:02, 368.51it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=112.802, player_2/loss=187.382, rew=-12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #15: 1025it [00:02, 403.71it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=165.232, player_2/loss=210.428, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #16: 1025it [00:02, 401.47it/s, env_step=16384, len=24, n/ep=3, n/st=64, player_1/loss=141.851, player_2/loss=140.688, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #17: 1025it [00:02, 420.65it/s, env_step=17408, len=23, n/ep=3, n/st=64, player_1/loss=124.579, player_2/loss=101.027, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #18: 1025it [00:02, 426.33it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=150.382, player_2/loss=91.826, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #19: 1025it [00:03, 299.32it/s, env_step=19456, len=25, n/ep=3, n/st=64, player_1/loss=190.703, player_2/loss=125.741, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #1: 1025it [00:02, 393.50it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=177.264, player_2/loss=110.521, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 430.16it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=145.676, player_2/loss=139.625, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 424.77it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=144.475, player_2/loss=208.931, rew=13.89]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 444.87it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=96.006, player_2/loss=247.749, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 436.27it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=85.850, player_2/loss=322.841, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 431.84it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=106.339, rew=13.89]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 435.77it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=78.398, player_2/loss=319.602, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 437.11it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=103.219, player_2/loss=363.377, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 438.18it/s, env_step=9216, len=7, n/ep=10, n/st=64, player_1/loss=135.741, player_2/loss=371.554, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 425.54it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=116.444, player_2/loss=341.042, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 437.58it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=81.688, player_2/loss=306.789, rew=6.25]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 446.77it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=59.307, player_2/loss=310.786, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 446.55it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=88.046, player_2/loss=324.735, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 436.77it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=73.782, player_2/loss=367.166, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 442.24it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=81.394, player_2/loss=367.938, rew=6.25]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 426.62it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=51.952, player_2/loss=361.019, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 445.02it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=39.537, player_2/loss=355.575, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 438.80it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=64.827, player_2/loss=305.958, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 367.58it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=52.065, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 392.02it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=36.809, player_2/loss=238.824, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 459.24it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=29.396, player_2/loss=209.361, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 449.39it/s, env_step=3072, len=7, n/ep=10, n/st=64, player_1/loss=22.476, player_2/loss=121.636, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 425.44it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=24.206, player_2/loss=84.376, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 395.11it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=20.340, player_2/loss=84.802, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 412.24it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=19.460, player_2/loss=60.578, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 348.91it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=21.340, player_2/loss=55.639, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 437.48it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=21.535, player_2/loss=42.650, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 435.70it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=34.685, player_2/loss=30.656, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 449.56it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=95.570, player_2/loss=68.895, rew=-15.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 447.79it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=190.656, player_2/loss=182.334, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 433.68it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_2/loss=193.552, rew=-12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #13: 1025it [00:02, 382.06it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=292.250, player_2/loss=199.934, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #14: 1025it [00:02, 351.70it/s, env_step=14336, len=13, n/ep=4, n/st=64, player_1/loss=277.599, player_2/loss=184.113, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #15: 1025it [00:02, 479.61it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=217.544, player_2/loss=185.238, rew=-15.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #16: 1025it [00:02, 454.13it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=256.219, player_2/loss=125.429, rew=-5.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #17: 1025it [00:02, 490.12it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=238.994, player_2/loss=121.704, rew=0.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #18: 1025it [00:02, 432.20it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=186.141, player_2/loss=97.724, rew=-12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #19: 1025it [00:03, 337.92it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=252.227, player_2/loss=121.698, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #1: 1025it [00:02, 344.62it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=313.963, player_2/loss=94.782, rew=16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 432.45it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=221.893, player_2/loss=109.238, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 401.76it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=139.349, player_2/loss=144.321, rew=5.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 430.16it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=111.612, player_2/loss=131.966, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 436.95it/s, env_step=5120, len=23, n/ep=4, n/st=64, player_1/loss=70.449, player_2/loss=119.159, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 463.91it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=78.548, player_2/loss=175.142, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 453.42it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=72.616, player_2/loss=203.446, rew=5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 364.70it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=114.734, player_2/loss=170.091, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 440.88it/s, env_step=9216, len=8, n/ep=7, n/st=64, player_1/loss=158.852, player_2/loss=164.874, rew=-17.86]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 462.77it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=112.598, player_2/loss=139.761, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 456.59it/s, env_step=11264, len=11, n/ep=4, n/st=64, player_1/loss=51.070, player_2/loss=143.615, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 439.92it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=34.539, player_2/loss=118.956, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 473.46it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=46.383, player_2/loss=101.064, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 449.74it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=46.500, player_2/loss=105.946, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 453.71it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=77.979, player_2/loss=122.773, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 445.29it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=71.553, player_2/loss=134.670, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 446.83it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=34.304, player_2/loss=135.591, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 483.86it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=23.038, player_2/loss=115.044, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 459.26it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=23.049, player_2/loss=94.135, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 394.94it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=49.875, player_2/loss=131.106, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 403.75it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=60.411, player_2/loss=110.845, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 404.55it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=35.052, player_2/loss=74.580, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 417.69it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=32.799, player_2/loss=72.041, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 436.87it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=92.136, player_2/loss=84.392, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 383.43it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=140.945, player_2/loss=105.376, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 435.05it/s, env_step=7168, len=23, n/ep=2, n/st=64, player_1/loss=180.110, player_2/loss=101.989, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 445.78it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=199.417, player_2/loss=96.796, rew=-12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 389.01it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=211.128, player_2/loss=110.294, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 401.20it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=216.759, player_2/loss=111.458, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 361.59it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=183.350, player_2/loss=81.114, rew=8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 444.88it/s, env_step=12288, len=17, n/ep=3, n/st=64, player_1/loss=142.568, player_2/loss=68.518, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 453.51it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=116.142, player_2/loss=77.879, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 449.48it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=127.666, player_2/loss=136.585, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 445.33it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=126.696, player_2/loss=169.720, rew=12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 422.40it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=159.491, player_2/loss=105.085, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 379.19it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=152.463, player_2/loss=138.498, rew=-19.44]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 390.68it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=196.427, player_2/loss=180.392, rew=15.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 392.97it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=282.108, player_2/loss=135.028, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 357.71it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=121.522, player_2/loss=147.026, rew=-5.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 410.39it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=162.651, player_2/loss=178.451, rew=5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 394.64it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=162.027, player_2/loss=228.388, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 392.18it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=148.702, player_2/loss=238.959, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 373.97it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=130.169, player_2/loss=227.631, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 377.53it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=93.496, player_2/loss=254.435, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 452.29it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=57.536, player_2/loss=307.981, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 442.97it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=87.342, player_2/loss=314.331, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 407.40it/s, env_step=9216, len=13, n/ep=4, n/st=64, player_1/loss=99.891, player_2/loss=258.058, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 451.61it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=44.017, player_2/loss=254.960, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 419.87it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=21.503, player_2/loss=298.305, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 386.89it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=32.939, player_2/loss=261.502, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 386.33it/s, env_step=13312, len=15, n/ep=5, n/st=64, player_1/loss=79.036, player_2/loss=253.061, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 389.21it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=72.198, player_2/loss=229.657, rew=15.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 432.32it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=121.683, player_2/loss=197.427, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 421.43it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=152.927, player_2/loss=200.916, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 424.67it/s, env_step=17408, len=13, n/ep=4, n/st=64, player_1/loss=106.077, player_2/loss=204.205, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 442.55it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=52.067, player_2/loss=181.925, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 447.11it/s, env_step=19456, len=10, n/ep=5, n/st=64, player_1/loss=15.902, player_2/loss=181.666, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 440.03it/s, env_step=1024, len=25, n/ep=2, n/st=64, player_1/loss=24.962, player_2/loss=157.910, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 449.78it/s, env_step=2048, len=12, n/ep=6, n/st=64, player_1/loss=21.063, player_2/loss=136.743, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 442.04it/s, env_step=3072, len=20, n/ep=2, n/st=64, player_1/loss=61.151, player_2/loss=135.071, rew=0.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 439.61it/s, env_step=4096, len=20, n/ep=4, n/st=64, player_1/loss=139.215, player_2/loss=117.898, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 429.51it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=130.035, player_2/loss=88.626, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 461.31it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=127.226, player_2/loss=97.020, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 438.22it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=122.815, player_2/loss=109.108, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 447.01it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=124.228, player_2/loss=105.783, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 439.13it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=143.321, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 429.87it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=118.681, player_2/loss=65.445, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 457.80it/s, env_step=11264, len=23, n/ep=3, n/st=64, player_1/loss=110.856, player_2/loss=76.039, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 475.20it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=109.704, player_2/loss=81.011, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 453.99it/s, env_step=13312, len=17, n/ep=3, n/st=64, player_1/loss=106.219, player_2/loss=52.957, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 464.20it/s, env_step=14336, len=17, n/ep=3, n/st=64, player_1/loss=112.744, player_2/loss=79.373, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 450.39it/s, env_step=15360, len=24, n/ep=2, n/st=64, player_1/loss=78.736, player_2/loss=76.758, rew=0.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 461.25it/s, env_step=16384, len=23, n/ep=3, n/st=64, player_1/loss=98.388, player_2/loss=43.972, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 438.43it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=88.177, player_2/loss=27.330, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 449.78it/s, env_step=18432, len=24, n/ep=3, n/st=64, player_1/loss=82.174, player_2/loss=30.292, rew=8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 448.33it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=83.717, player_2/loss=39.350, rew=0.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 443.52it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=63.135, player_2/loss=35.757, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 453.18it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=51.663, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 472.54it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=70.609, player_2/loss=31.340, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 471.48it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=95.391, player_2/loss=40.595, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 451.72it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_2/loss=48.906, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 465.04it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=84.079, player_2/loss=65.285, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 455.37it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=34.624, player_2/loss=56.658, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 463.67it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=22.215, player_2/loss=75.817, rew=5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 427.40it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=52.596, player_2/loss=97.997, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 480.17it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=37.403, player_2/loss=97.241, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 438.19it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=19.362, player_2/loss=48.163, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 448.31it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=21.564, player_2/loss=62.394, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 451.78it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=7.807, player_2/loss=67.516, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 462.93it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=46.527, player_2/loss=90.861, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 463.33it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=48.378, player_2/loss=78.844, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 460.54it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=9.666, player_2/loss=55.729, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 461.20it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=50.347, player_2/loss=92.251, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 457.63it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=62.535, player_2/loss=120.948, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 423.52it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=16.185, player_2/loss=73.582, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 434.62it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=32.068, player_2/loss=30.070, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 429.61it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=20.547, player_2/loss=28.392, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 430.51it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=90.991, player_2/loss=88.821, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 426.53it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=147.531, player_2/loss=146.600, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 461.79it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=74.881, player_2/loss=98.094, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 444.41it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=119.483, player_2/loss=73.127, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 422.36it/s, env_step=7168, len=23, n/ep=2, n/st=64, player_1/loss=160.372, player_2/loss=85.054, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 446.58it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=129.838, player_2/loss=72.679, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 453.73it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=142.418, player_2/loss=128.862, rew=-25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 451.95it/s, env_step=10240, len=13, n/ep=3, n/st=64, player_1/loss=92.854, player_2/loss=131.216, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 458.42it/s, env_step=11264, len=20, n/ep=4, n/st=64, player_1/loss=136.172, player_2/loss=139.613, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 453.34it/s, env_step=12288, len=17, n/ep=3, n/st=64, player_1/loss=223.215, player_2/loss=125.645, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 444.03it/s, env_step=13312, len=9, n/ep=6, n/st=64, player_1/loss=256.232, player_2/loss=107.409, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 451.18it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=282.021, player_2/loss=63.342, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 442.40it/s, env_step=15360, len=13, n/ep=6, n/st=64, player_1/loss=390.882, player_2/loss=80.411, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 461.46it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=391.675, player_2/loss=72.131, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 453.82it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=334.875, player_2/loss=70.110, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 448.95it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=338.586, player_2/loss=50.633, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 471.55it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=326.348, player_2/loss=62.828, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 451.15it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=267.313, player_2/loss=126.679, rew=3.57]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 457.82it/s, env_step=2048, len=9, n/ep=6, n/st=64, player_1/loss=221.673, player_2/loss=188.358, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 464.43it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=96.720, player_2/loss=313.337, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 459.42it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=65.150, player_2/loss=374.224, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 473.50it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=55.193, player_2/loss=438.981, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 454.69it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=94.926, player_2/loss=458.419, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 458.03it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=88.352, player_2/loss=426.036, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 469.77it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=104.135, player_2/loss=421.184, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 447.90it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=119.612, player_2/loss=444.843, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 422.51it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=20.315, player_2/loss=462.187, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 419.22it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=5.939, player_2/loss=511.366, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 444.92it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=7.083, player_2/loss=461.792, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 460.04it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=38.520, player_2/loss=483.688, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 462.87it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=39.438, player_2/loss=506.559, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 469.68it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=9.560, player_2/loss=475.179, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 439.06it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=17.308, player_2/loss=472.241, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 447.98it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=19.016, player_2/loss=472.311, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 479.19it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=8.579, player_2/loss=519.285, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 475.24it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=6.077, player_2/loss=502.339, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 440.47it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=50.654, player_2/loss=327.118, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 453.84it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=31.288, player_2/loss=255.634, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 454.40it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=6.866, rew=-18.75]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 449.00it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=68.327, player_2/loss=202.080, rew=-17.86]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 439.81it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=163.499, player_2/loss=178.652, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 459.49it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=225.166, player_2/loss=105.548, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 445.60it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=350.238, player_2/loss=84.276, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 433.88it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=371.888, player_2/loss=61.016, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 450.98it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=362.816, player_2/loss=33.708, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 453.38it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=385.130, player_2/loss=23.741, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 429.99it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=333.418, player_2/loss=75.953, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 430.92it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=322.934, player_2/loss=75.496, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 445.71it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=320.456, player_2/loss=51.491, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 448.06it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=297.562, player_2/loss=49.146, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 461.32it/s, env_step=15360, len=13, n/ep=4, n/st=64, player_1/loss=319.389, player_2/loss=44.608, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 458.66it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=317.072, player_2/loss=7.194, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 467.19it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=423.168, player_2/loss=24.004, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 444.26it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=486.685, player_2/loss=23.486, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 463.42it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=459.581, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 472.19it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=289.201, player_2/loss=5.743, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 402.27it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=225.052, player_2/loss=5.496, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 390.72it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=172.273, player_2/loss=9.609, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 381.43it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=146.090, player_2/loss=23.393, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 409.10it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=112.515, player_2/loss=64.168, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 387.26it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=120.005, player_2/loss=73.981, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 384.81it/s, env_step=7168, len=26, n/ep=3, n/st=64, player_1/loss=139.018, player_2/loss=85.209, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 433.60it/s, env_step=8192, len=25, n/ep=2, n/st=64, player_1/loss=115.293, player_2/loss=130.248, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 455.05it/s, env_step=9216, len=22, n/ep=2, n/st=64, player_1/loss=112.913, player_2/loss=104.689, rew=0.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 482.89it/s, env_step=10240, len=24, n/ep=2, n/st=64, player_1/loss=121.316, player_2/loss=134.403, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 458.99it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=155.700, player_2/loss=144.592, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 450.96it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=182.562, player_2/loss=306.275, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 455.74it/s, env_step=13312, len=10, n/ep=7, n/st=64, player_1/loss=112.167, player_2/loss=443.004, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 477.86it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=78.686, player_2/loss=389.837, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 467.43it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=46.195, player_2/loss=478.428, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 468.43it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=47.849, player_2/loss=537.770, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 452.75it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=38.318, player_2/loss=544.300, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 435.91it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=25.850, player_2/loss=580.413, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 422.29it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=25.429, player_2/loss=521.386, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 434.59it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=90.689, player_2/loss=381.753, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 445.28it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=228.885, player_2/loss=261.418, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 465.05it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=275.731, player_2/loss=153.551, rew=18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 451.34it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=254.517, player_2/loss=75.952, rew=3.57]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 467.82it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=281.331, player_2/loss=63.458, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 472.80it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=278.017, player_2/loss=137.771, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 472.78it/s, env_step=7168, len=8, n/ep=6, n/st=64, player_1/loss=244.557, player_2/loss=102.662, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 471.24it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=308.663, player_2/loss=107.466, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 478.06it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=304.447, player_2/loss=115.865, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 478.05it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=240.286, player_2/loss=154.661, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 476.52it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=250.192, player_2/loss=119.587, rew=10.71]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 452.67it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=261.496, player_2/loss=43.742, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 399.52it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=226.446, player_2/loss=38.332, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 390.88it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=245.065, player_2/loss=29.271, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 392.25it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=222.870, player_2/loss=42.956, rew=18.75]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 394.29it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_2/loss=18.982, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 423.89it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=298.931, player_2/loss=17.700, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 440.48it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=307.657, player_2/loss=70.265, rew=18.75]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 469.51it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=293.994, player_2/loss=83.128, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 472.45it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=196.216, player_2/loss=512.726, rew=19.44]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 451.49it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=113.954, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 454.90it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=93.628, player_2/loss=684.639, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 465.13it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=127.786, player_2/loss=707.774, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 462.49it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=79.725, player_2/loss=675.787, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 472.85it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=67.676, player_2/loss=548.669, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 477.05it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=89.842, player_2/loss=724.275, rew=19.44]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 477.84it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=71.241, player_2/loss=734.998, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 476.28it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=70.655, player_2/loss=652.422, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 483.53it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=54.530, player_2/loss=699.335, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 485.12it/s, env_step=11264, len=8, n/ep=6, n/st=64, player_1/loss=31.703, player_2/loss=754.192, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 482.78it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=31.209, player_2/loss=652.797, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 479.97it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=15.442, player_2/loss=526.128, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 488.52it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=41.829, player_2/loss=564.073, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 468.41it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=55.408, player_2/loss=672.621, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 486.96it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=18.865, player_2/loss=723.078, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 486.24it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=37.491, player_2/loss=662.624, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 475.45it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=40.323, player_2/loss=669.482, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 482.91it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=30.635, player_2/loss=548.745, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 473.60it/s, env_step=1024, len=9, n/ep=6, n/st=64, player_1/loss=29.191, player_2/loss=333.267, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 462.90it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=24.662, player_2/loss=310.974, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 467.61it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=60.899, player_2/loss=241.757, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 468.89it/s, env_step=4096, len=15, n/ep=5, n/st=64, player_1/loss=61.387, player_2/loss=191.627, rew=-5.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 467.61it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=42.158, player_2/loss=187.094, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 476.06it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=36.791, player_2/loss=124.849, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 436.59it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=13.632, player_2/loss=96.458, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 401.09it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=19.325, player_2/loss=75.442, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 422.29it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=51.677, player_2/loss=88.270, rew=-25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #10: 1025it [00:02, 480.47it/s, env_step=10240, len=31, n/ep=2, n/st=64, player_1/loss=84.631, player_2/loss=101.309, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #11: 1025it [00:02, 484.29it/s, env_step=11264, len=22, n/ep=3, n/st=64, player_1/loss=116.366, player_2/loss=72.867, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #12: 1025it [00:02, 481.03it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=128.443, player_2/loss=73.868, rew=-12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #13: 1025it [00:02, 482.57it/s, env_step=13312, len=23, n/ep=3, n/st=64, player_1/loss=122.636, player_2/loss=73.802, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #14: 1025it [00:02, 485.58it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=110.218, player_2/loss=87.377, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #15: 1025it [00:02, 475.97it/s, env_step=15360, len=18, n/ep=4, n/st=64, player_1/loss=109.193, player_2/loss=58.922, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #16: 1025it [00:02, 473.04it/s, env_step=16384, len=24, n/ep=3, n/st=64, player_1/loss=77.895, player_2/loss=43.546, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #17: 1025it [00:02, 486.99it/s, env_step=17408, len=19, n/ep=4, n/st=64, player_1/loss=123.574, player_2/loss=97.489, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #18: 1025it [00:02, 481.88it/s, env_step=18432, len=23, n/ep=3, n/st=64, player_1/loss=139.497, player_2/loss=103.727, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #19: 1025it [00:02, 482.57it/s, env_step=19456, len=24, n/ep=2, n/st=64, player_1/loss=153.999, player_2/loss=56.066, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #1: 1025it [00:02, 482.33it/s, env_step=1024, len=23, n/ep=3, n/st=64, player_1/loss=95.462, player_2/loss=76.926, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 484.38it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=83.305, player_2/loss=64.406, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 476.17it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=94.292, player_2/loss=97.913, rew=-15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 478.81it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=143.983, player_2/loss=131.593, rew=5.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 473.05it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=119.573, player_2/loss=118.131, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 469.13it/s, env_step=6144, len=21, n/ep=4, n/st=64, player_1/loss=36.787, player_2/loss=103.633, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 465.65it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=37.428, player_2/loss=99.776, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 469.79it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=59.648, player_2/loss=180.219, rew=-18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 473.57it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=75.883, player_2/loss=214.846, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 476.33it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=44.739, player_2/loss=213.165, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 479.53it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=16.816, player_2/loss=212.303, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 479.33it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=10.254, player_2/loss=188.577, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 485.13it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=13.154, player_2/loss=178.853, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 480.32it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=21.174, player_2/loss=156.226, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 484.15it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=17.133, player_2/loss=146.665, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 486.24it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=17.078, player_2/loss=150.730, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 476.06it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=17.983, player_2/loss=156.421, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 474.34it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=9.587, player_2/loss=172.523, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 467.57it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=6.602, player_2/loss=185.441, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 477.35it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=7.772, player_2/loss=189.574, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 486.30it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=13.876, player_2/loss=138.925, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 485.77it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=14.191, player_2/loss=125.058, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 486.25it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=9.061, player_2/loss=109.569, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 472.83it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=5.902, player_2/loss=89.298, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 481.92it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=16.672, player_2/loss=82.938, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 484.84it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=37.516, player_2/loss=104.859, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 481.14it/s, env_step=8192, len=29, n/ep=3, n/st=64, player_1/loss=39.094, player_2/loss=98.097, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 489.28it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=70.675, player_2/loss=74.051, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 485.28it/s, env_step=10240, len=24, n/ep=3, n/st=64, player_1/loss=102.123, player_2/loss=83.915, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 486.95it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=117.705, player_2/loss=108.480, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 475.45it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=110.398, player_2/loss=87.918, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 477.88it/s, env_step=13312, len=35, n/ep=2, n/st=64, player_1/loss=93.445, player_2/loss=85.870, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #14: 1025it [00:02, 489.27it/s, env_step=14336, len=18, n/ep=5, n/st=64, player_1/loss=90.272, player_2/loss=81.773, rew=-15.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #15: 1025it [00:02, 484.14it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=206.570, player_2/loss=117.801, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #16: 1025it [00:02, 484.13it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=460.869, player_2/loss=132.280, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #17: 1025it [00:02, 482.00it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=769.715, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #18: 1025it [00:02, 474.95it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=856.128, player_2/loss=59.800, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #19: 1025it [00:02, 458.56it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=686.735, player_2/loss=64.118, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #1: 1025it [00:02, 468.55it/s, env_step=1024, len=8, n/ep=7, n/st=64, player_1/loss=458.306, player_2/loss=174.220, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 469.66it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=359.023, player_2/loss=110.701, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 466.94it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=234.031, player_2/loss=81.156, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 483.50it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=209.587, player_2/loss=77.402, rew=-17.86]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 486.93it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=206.855, player_2/loss=45.785, rew=-18.75]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 480.24it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=196.510, player_2/loss=74.365, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 469.07it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=220.174, player_2/loss=86.062, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 481.60it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=171.959, player_2/loss=66.377, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 481.73it/s, env_step=9216, len=10, n/ep=7, n/st=64, player_1/loss=126.072, player_2/loss=77.308, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 479.80it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=234.908, player_2/loss=277.742, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 473.46it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=173.391, player_2/loss=497.717, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 479.54it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=135.573, player_2/loss=583.922, rew=19.44]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 471.64it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=159.744, player_2/loss=532.390, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 457.13it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=150.592, player_2/loss=528.093, rew=19.44]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 466.05it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=49.130, player_2/loss=505.530, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 465.48it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=79.982, player_2/loss=564.982, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 465.18it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=76.750, player_2/loss=611.541, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 476.16it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=91.634, player_2/loss=593.422, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 482.15it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=96.094, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 471.60it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=29.073, player_2/loss=424.827, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 464.43it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=30.990, player_2/loss=324.994, rew=-13.89]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 468.86it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=60.342, player_2/loss=221.617, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 481.99it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=58.618, player_2/loss=194.408, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 484.17it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=52.776, player_2/loss=224.536, rew=-13.89]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 483.52it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=80.824, player_2/loss=211.307, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 473.80it/s, env_step=7168, len=31, n/ep=2, n/st=64, player_1/loss=75.002, player_2/loss=152.141, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 471.20it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=79.679, player_2/loss=79.337, rew=-12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 406.49it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=109.035, player_2/loss=128.065, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 436.17it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=142.944, player_2/loss=118.112, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 469.11it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=145.996, player_2/loss=62.397, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 470.50it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=141.630, player_2/loss=44.049, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 473.47it/s, env_step=13312, len=20, n/ep=4, n/st=64, player_1/loss=126.305, player_2/loss=63.552, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 472.83it/s, env_step=14336, len=18, n/ep=4, n/st=64, player_1/loss=154.651, player_2/loss=42.515, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 477.40it/s, env_step=15360, len=18, n/ep=4, n/st=64, player_1/loss=143.589, player_2/loss=27.855, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 471.07it/s, env_step=16384, len=23, n/ep=3, n/st=64, player_1/loss=123.004, player_2/loss=21.275, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 417.38it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=139.243, player_2/loss=13.480, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 404.96it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=143.685, player_2/loss=24.516, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 429.55it/s, env_step=19456, len=22, n/ep=3, n/st=64, player_1/loss=155.275, player_2/loss=55.693, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 449.41it/s, env_step=1024, len=22, n/ep=2, n/st=64, player_1/loss=146.006, player_2/loss=45.843, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 488.66it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=125.024, player_2/loss=153.839, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 480.74it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=93.894, player_2/loss=172.555, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 487.42it/s, env_step=4096, len=17, n/ep=3, n/st=64, player_1/loss=84.015, player_2/loss=156.347, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 488.32it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=85.103, player_2/loss=152.516, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 478.95it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=70.565, player_2/loss=165.254, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 485.46it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=72.554, player_2/loss=188.834, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 477.38it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=53.420, player_2/loss=237.520, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 495.33it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=41.063, player_2/loss=206.851, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 477.43it/s, env_step=10240, len=17, n/ep=3, n/st=64, player_1/loss=96.101, player_2/loss=157.664, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 488.01it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=119.509, player_2/loss=204.509, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 483.64it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=72.805, player_2/loss=230.248, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 484.30it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=82.311, player_2/loss=222.692, rew=15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 485.19it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=93.173, player_2/loss=190.388, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 480.11it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=66.103, player_2/loss=174.368, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 424.75it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=28.550, player_2/loss=170.541, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 408.26it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=49.372, player_2/loss=164.933, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 461.28it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=49.883, player_2/loss=189.742, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 472.85it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=56.097, player_2/loss=211.689, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 472.78it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=50.927, player_2/loss=155.067, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 476.38it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=45.217, player_2/loss=149.361, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 474.76it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=53.617, player_2/loss=121.699, rew=-15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 454.40it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=106.594, player_2/loss=150.059, rew=8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 437.84it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=143.585, player_2/loss=149.903, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 460.90it/s, env_step=6144, len=13, n/ep=4, n/st=64, player_1/loss=112.994, player_2/loss=112.006, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 455.37it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=89.513, player_2/loss=91.945, rew=-5.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 465.38it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=99.622, player_2/loss=95.588, rew=-15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 460.89it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=80.511, player_2/loss=106.768, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:02, 466.57it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=74.577, player_2/loss=94.629, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 458.22it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=77.934, player_2/loss=59.405, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 461.33it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=47.422, player_2/loss=77.318, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 468.34it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=53.032, player_2/loss=80.778, rew=-15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 467.12it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=124.112, player_2/loss=103.246, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 466.97it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=163.681, player_2/loss=112.503, rew=-25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:02, 462.20it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=112.330, player_2/loss=79.102, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 464.38it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=77.502, player_2/loss=71.013, rew=-15.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:02, 466.36it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=112.493, player_2/loss=87.459, rew=-5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:02, 471.69it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=154.664, player_2/loss=140.460, rew=10.71]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 463.19it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=241.653, player_2/loss=152.034, rew=19.44]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 471.92it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=203.068, player_2/loss=217.313, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 476.98it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=110.356, player_2/loss=248.577, rew=13.89]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 461.72it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=84.053, player_2/loss=261.258, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 469.87it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=105.371, player_2/loss=299.995, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 460.65it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=99.675, player_2/loss=273.763, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 474.32it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=69.703, player_2/loss=250.900, rew=2.78]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 470.07it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=74.988, player_2/loss=248.387, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 472.09it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=96.429, player_2/loss=242.154, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 471.41it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=78.245, player_2/loss=240.203, rew=6.25]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 466.85it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=69.055, player_2/loss=219.321, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 476.02it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=48.947, player_2/loss=215.508, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 467.44it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=51.392, player_2/loss=213.159, rew=13.89]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 469.48it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=98.284, player_2/loss=263.798, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 471.27it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=59.425, player_2/loss=291.356, rew=19.44]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 479.23it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=101.125, player_2/loss=267.437, rew=13.89]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 473.44it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=80.909, player_2/loss=202.703, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 473.64it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=35.991, player_2/loss=243.167, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 464.65it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=55.754, player_2/loss=258.136, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 459.51it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=67.931, player_2/loss=231.074, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 470.63it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=68.587, player_2/loss=197.726, rew=-13.89]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 452.82it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=52.319, player_2/loss=155.642, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 436.73it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=88.125, player_2/loss=201.795, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 448.82it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=142.802, player_2/loss=182.712, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 431.87it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=186.577, player_2/loss=106.686, rew=-15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 425.79it/s, env_step=7168, len=17, n/ep=3, n/st=64, player_1/loss=192.423, player_2/loss=98.717, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 417.83it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=117.931, player_2/loss=90.329, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 428.91it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=79.978, player_2/loss=98.285, rew=-15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #10: 1025it [00:02, 423.52it/s, env_step=10240, len=13, n/ep=4, n/st=64, player_1/loss=108.706, player_2/loss=83.215, rew=-12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #11: 1025it [00:02, 476.62it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=166.544, player_2/loss=65.572, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #12: 1025it [00:02, 473.64it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=156.514, player_2/loss=85.686, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #13: 1025it [00:02, 461.83it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_1/loss=128.411, player_2/loss=78.962, rew=0.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #14: 1025it [00:02, 461.34it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=110.986, player_2/loss=63.948, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #15: 1025it [00:02, 456.19it/s, env_step=15360, len=20, n/ep=3, n/st=64, player_1/loss=105.896, player_2/loss=55.566, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #16: 1025it [00:02, 479.10it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=118.232, player_2/loss=77.320, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #17: 1025it [00:02, 478.29it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=137.113, player_2/loss=74.704, rew=-12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #18: 1025it [00:02, 466.43it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=184.148, player_2/loss=82.942, rew=-13.89]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #19: 1025it [00:02, 469.11it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=201.817, player_2/loss=115.541, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #1: 1025it [00:02, 456.59it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=170.642, player_2/loss=69.692, rew=15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 442.41it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=106.895, player_2/loss=54.954, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 432.88it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=81.760, player_2/loss=33.008, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 389.56it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_2/loss=125.072, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 394.43it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=93.704, player_2/loss=183.225, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 406.23it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=41.368, player_2/loss=183.269, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 463.38it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=45.872, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 462.78it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=38.095, player_2/loss=146.204, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 473.67it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=11.795, player_2/loss=196.052, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 438.69it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=10.449, player_2/loss=189.337, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 447.83it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=54.291, player_2/loss=141.255, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 451.54it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=41.048, player_2/loss=148.289, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 439.93it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=19.458, player_2/loss=133.235, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 450.93it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=20.446, player_2/loss=148.424, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 455.15it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=21.798, player_2/loss=140.430, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 469.55it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=6.194, player_2/loss=127.676, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 479.39it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=4.504, player_2/loss=145.387, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 466.87it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=5.651, player_2/loss=132.964, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 451.35it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=4.408, player_2/loss=149.804, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 428.33it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=7.807, player_2/loss=116.166, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 483.22it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=18.500, player_2/loss=80.334, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 492.33it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=78.750, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 492.77it/s, env_step=4096, len=30, n/ep=2, n/st=64, player_1/loss=120.424, player_2/loss=117.086, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 479.22it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=111.243, player_2/loss=161.620, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 495.37it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=110.294, player_2/loss=110.097, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 493.54it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=112.375, player_2/loss=88.971, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 496.81it/s, env_step=8192, len=24, n/ep=3, n/st=64, player_1/loss=105.168, player_2/loss=129.808, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 494.70it/s, env_step=9216, len=24, n/ep=3, n/st=64, player_1/loss=113.033, player_2/loss=157.338, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 493.09it/s, env_step=10240, len=27, n/ep=3, n/st=64, player_1/loss=108.619, player_2/loss=113.052, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 492.08it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=110.313, player_2/loss=100.130, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 473.78it/s, env_step=12288, len=26, n/ep=2, n/st=64, player_1/loss=120.491, player_2/loss=102.003, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 487.38it/s, env_step=13312, len=27, n/ep=2, n/st=64, player_1/loss=120.906, player_2/loss=100.518, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 487.61it/s, env_step=14336, len=23, n/ep=2, n/st=64, player_1/loss=148.048, player_2/loss=56.639, rew=0.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 494.46it/s, env_step=15360, len=22, n/ep=3, n/st=64, player_1/loss=157.139, player_2/loss=66.632, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 489.05it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=117.416, player_2/loss=74.314, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 487.23it/s, env_step=17408, len=25, n/ep=3, n/st=64, player_1/loss=120.404, player_2/loss=62.491, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 493.22it/s, env_step=18432, len=24, n/ep=2, n/st=64, player_1/loss=211.707, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 383.73it/s, env_step=19456, len=23, n/ep=3, n/st=64, player_1/loss=209.027, player_2/loss=44.272, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 350.33it/s, env_step=1024, len=26, n/ep=2, n/st=64, player_1/loss=126.257, player_2/loss=133.283, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 342.01it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=104.973, player_2/loss=121.426, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 463.59it/s, env_step=3072, len=22, n/ep=3, n/st=64, player_1/loss=86.531, player_2/loss=139.558, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 422.47it/s, env_step=4096, len=25, n/ep=2, n/st=64, player_2/loss=97.089, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 432.36it/s, env_step=5120, len=26, n/ep=3, n/st=64, player_1/loss=97.850, player_2/loss=40.895, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 436.48it/s, env_step=6144, len=25, n/ep=3, n/st=64, player_1/loss=110.925, player_2/loss=52.225, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 422.15it/s, env_step=7168, len=25, n/ep=2, n/st=64, player_1/loss=99.756, player_2/loss=139.044, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 449.48it/s, env_step=8192, len=25, n/ep=2, n/st=64, player_1/loss=58.893, player_2/loss=120.665, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 425.04it/s, env_step=9216, len=23, n/ep=3, n/st=64, player_1/loss=78.663, player_2/loss=83.699, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 450.60it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=90.694, player_2/loss=117.010, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 406.20it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=155.852, player_2/loss=151.785, rew=-12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 375.70it/s, env_step=12288, len=16, n/ep=3, n/st=64, player_1/loss=142.997, player_2/loss=177.487, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 372.80it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=143.997, player_2/loss=257.252, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 456.96it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=93.291, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 466.36it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=30.611, player_2/loss=357.687, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 461.27it/s, env_step=16384, len=9, n/ep=6, n/st=64, player_2/loss=308.494, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 471.80it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=21.681, player_2/loss=320.332, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 466.85it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=40.262, player_2/loss=314.134, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 467.18it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=40.279, player_2/loss=283.264, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 454.56it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=26.963, player_2/loss=285.306, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 468.66it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=46.409, player_2/loss=243.623, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 471.07it/s, env_step=3072, len=9, n/ep=6, n/st=64, player_1/loss=58.401, player_2/loss=185.993, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 455.54it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=21.620, player_2/loss=146.004, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 457.11it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=17.895, player_2/loss=123.504, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 460.38it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=19.129, player_2/loss=113.796, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 455.99it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=23.633, player_2/loss=98.691, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 457.46it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=19.910, player_2/loss=79.391, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 473.03it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=22.501, player_2/loss=52.548, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 472.78it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=26.865, player_2/loss=19.443, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 466.40it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=62.571, player_2/loss=92.493, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 470.36it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=112.298, player_2/loss=175.639, rew=-13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #13: 1025it [00:02, 469.16it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=127.117, player_2/loss=197.445, rew=-16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #14: 1025it [00:02, 468.19it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=87.127, player_2/loss=173.197, rew=-18.75]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #15: 1025it [00:02, 456.97it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=51.952, player_2/loss=130.637, rew=-10.71]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #16: 1025it [00:02, 470.17it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=70.356, player_2/loss=58.568, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #17: 1025it [00:02, 470.79it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=109.298, player_2/loss=83.625, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #18: 1025it [00:02, 465.65it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=110.754, player_2/loss=71.339, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #19: 1025it [00:02, 471.85it/s, env_step=19456, len=19, n/ep=3, n/st=64, player_1/loss=66.366, player_2/loss=47.358, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #1: 1025it [00:02, 473.75it/s, env_step=1024, len=25, n/ep=3, n/st=64, player_1/loss=140.766, player_2/loss=322.176, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 462.74it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=105.359, player_2/loss=301.568, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 472.34it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=71.470, player_2/loss=247.184, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 474.99it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=100.386, player_2/loss=158.851, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 473.44it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_2/loss=153.156, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 467.12it/s, env_step=6144, len=14, n/ep=5, n/st=64, player_1/loss=103.482, player_2/loss=128.810, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 467.22it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=90.822, player_2/loss=180.730, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 472.14it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=16.868, player_2/loss=251.328, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 469.74it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=15.902, player_2/loss=220.300, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 464.67it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=16.738, player_2/loss=217.618, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 471.76it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=54.892, player_2/loss=208.252, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 471.48it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=60.407, player_2/loss=173.254, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 471.25it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=17.792, player_2/loss=181.050, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 469.48it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=9.012, player_2/loss=218.151, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 477.36it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=10.518, player_2/loss=226.082, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 472.69it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=10.570, player_2/loss=225.324, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 469.67it/s, env_step=17408, len=15, n/ep=5, n/st=64, player_1/loss=11.547, player_2/loss=225.922, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 472.66it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=13.209, player_2/loss=255.079, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 472.60it/s, env_step=19456, len=15, n/ep=5, n/st=64, player_1/loss=12.406, player_2/loss=233.907, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 469.70it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=7.745, player_2/loss=162.036, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 469.99it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=9.910, player_2/loss=153.265, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 472.37it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=16.113, player_2/loss=127.959, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 467.02it/s, env_step=4096, len=16, n/ep=5, n/st=64, player_1/loss=15.598, player_2/loss=98.223, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 464.92it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=9.862, player_2/loss=47.417, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 466.95it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=8.729, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 473.81it/s, env_step=7168, len=19, n/ep=4, n/st=64, player_1/loss=14.125, player_2/loss=73.524, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 474.14it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_1/loss=43.927, player_2/loss=97.760, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 471.74it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=42.443, player_2/loss=73.875, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 473.04it/s, env_step=10240, len=15, n/ep=5, n/st=64, player_1/loss=74.073, player_2/loss=72.878, rew=-15.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 464.76it/s, env_step=11264, len=18, n/ep=5, n/st=64, player_1/loss=93.703, player_2/loss=58.603, rew=-5.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #12: 1025it [00:02, 473.06it/s, env_step=12288, len=8, n/ep=9, n/st=64, player_1/loss=178.603, player_2/loss=117.481, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #13: 1025it [00:02, 473.39it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=453.317, player_2/loss=150.898, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #14: 1025it [00:02, 468.93it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=494.927, player_2/loss=214.480, rew=16.67]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #15: 1025it [00:02, 467.90it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=369.212, player_2/loss=204.334, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #16: 1025it [00:02, 471.20it/s, env_step=16384, len=10, n/ep=8, n/st=64, player_1/loss=308.301, player_2/loss=116.147, rew=6.25]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #17: 1025it [00:02, 472.70it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=382.465, player_2/loss=103.724, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #18: 1025it [00:02, 466.99it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=409.624, player_2/loss=63.540, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #19: 1025it [00:02, 464.44it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=443.988, player_2/loss=75.892, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #1: 1025it [00:02, 470.43it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=46.602, player_2/loss=269.771, rew=16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 457.93it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=116.064, player_2/loss=299.144, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 468.71it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=99.609, player_2/loss=285.442, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 470.19it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=143.988, player_2/loss=432.726, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 466.52it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=156.514, player_2/loss=421.437, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 458.00it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=44.237, player_2/loss=277.072, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 473.23it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=51.506, player_2/loss=301.896, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 469.27it/s, env_step=8192, len=8, n/ep=9, n/st=64, player_1/loss=82.917, player_2/loss=497.529, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 466.33it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=110.177, player_2/loss=656.156, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 471.89it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=129.693, player_2/loss=597.382, rew=13.89]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 469.16it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=109.769, player_2/loss=568.695, rew=10.71]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 477.15it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=119.071, player_2/loss=631.217, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 464.55it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=110.082, player_2/loss=684.493, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 469.66it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=57.908, player_2/loss=683.780, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 467.70it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=46.600, player_2/loss=732.689, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 463.39it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=46.834, player_2/loss=675.978, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 467.36it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=66.151, player_2/loss=615.121, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 466.96it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=57.440, player_2/loss=496.280, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 466.96it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=29.218, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 452.84it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=115.034, player_2/loss=510.781, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 459.22it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=92.494, player_2/loss=419.473, rew=-13.89]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 470.31it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=142.901, player_2/loss=324.893, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 471.61it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=96.560, player_2/loss=278.162, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 470.72it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=27.104, player_2/loss=206.660, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 468.80it/s, env_step=6144, len=23, n/ep=3, n/st=64, player_1/loss=31.750, player_2/loss=165.165, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 479.49it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=103.975, player_2/loss=103.148, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 466.99it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=123.185, player_2/loss=49.453, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 474.74it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=126.056, player_2/loss=31.261, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 476.56it/s, env_step=10240, len=23, n/ep=3, n/st=64, player_1/loss=129.062, player_2/loss=48.057, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 472.33it/s, env_step=11264, len=22, n/ep=3, n/st=64, player_1/loss=110.733, player_2/loss=51.131, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 471.81it/s, env_step=12288, len=22, n/ep=3, n/st=64, player_1/loss=133.859, player_2/loss=34.790, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 493.66it/s, env_step=13312, len=24, n/ep=2, n/st=64, player_1/loss=113.236, player_2/loss=21.870, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 475.11it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=126.960, player_2/loss=34.056, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 466.58it/s, env_step=15360, len=21, n/ep=3, n/st=64, player_1/loss=140.662, player_2/loss=31.453, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 478.02it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=179.039, player_2/loss=46.908, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 473.26it/s, env_step=17408, len=19, n/ep=3, n/st=64, player_1/loss=170.608, player_2/loss=55.100, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 471.04it/s, env_step=18432, len=25, n/ep=3, n/st=64, player_1/loss=131.101, player_2/loss=108.020, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 470.97it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=128.164, player_2/loss=157.590, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 471.53it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=108.451, player_2/loss=70.423, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 471.27it/s, env_step=2048, len=11, n/ep=4, n/st=64, player_1/loss=86.159, player_2/loss=85.132, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 464.41it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=69.384, player_2/loss=112.385, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 472.59it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=39.663, player_2/loss=113.117, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 470.34it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=44.536, player_2/loss=83.486, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 474.25it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=49.884, player_2/loss=81.394, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 476.33it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=44.525, player_2/loss=98.551, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 466.96it/s, env_step=8192, len=19, n/ep=4, n/st=64, player_1/loss=36.461, player_2/loss=88.106, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 472.52it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=51.171, player_2/loss=91.848, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 460.26it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=71.577, player_2/loss=89.801, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 469.33it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=63.383, player_2/loss=134.065, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 470.43it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=39.243, player_2/loss=180.758, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 471.06it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=51.786, player_2/loss=168.397, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 471.60it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=35.593, player_2/loss=159.296, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 472.80it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=9.090, player_2/loss=171.867, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 473.76it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=65.166, player_2/loss=223.896, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 459.83it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=54.698, player_2/loss=222.141, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 473.04it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=28.067, player_2/loss=184.834, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 468.78it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=18.007, player_2/loss=181.161, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 467.25it/s, env_step=1024, len=11, n/ep=7, n/st=64, player_1/loss=64.709, player_2/loss=206.776, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 461.48it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=260.449, player_2/loss=220.104, rew=-19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 474.34it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=518.891, player_2/loss=174.707, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 475.08it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=558.685, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 462.37it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=551.329, player_2/loss=91.911, rew=5.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 472.18it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=347.450, player_2/loss=137.156, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 469.56it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=312.381, player_2/loss=130.947, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 471.18it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=325.247, player_2/loss=86.199, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 460.55it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=288.561, player_2/loss=115.279, rew=-15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 472.55it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_2/loss=130.437, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 473.47it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=280.382, player_2/loss=105.129, rew=5.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 459.97it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=326.697, player_2/loss=119.436, rew=-15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 465.53it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=327.454, player_2/loss=60.165, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 469.37it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=342.634, player_2/loss=41.366, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 476.73it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_2/loss=51.569, rew=15.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 476.05it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=328.774, player_2/loss=28.167, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 468.27it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_2/loss=54.588, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 470.38it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=393.981, player_2/loss=58.437, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 463.45it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=358.497, player_2/loss=15.695, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 472.15it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=164.405, player_2/loss=388.363, rew=16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 466.58it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=92.016, player_2/loss=349.487, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 475.17it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=110.535, player_2/loss=364.154, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 469.69it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=164.344, player_2/loss=322.931, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 464.43it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=124.796, player_2/loss=278.761, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 471.63it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=57.699, player_2/loss=422.252, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 462.37it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=55.985, player_2/loss=532.087, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 442.61it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=75.220, player_2/loss=502.460, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 461.12it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=59.381, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 487.08it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=19.328, player_2/loss=604.546, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 471.80it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=16.408, player_2/loss=633.737, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 467.14it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=6.889, player_2/loss=652.982, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 463.79it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=28.920, player_2/loss=585.236, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 470.00it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=46.391, player_2/loss=451.484, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 469.57it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=57.996, player_2/loss=525.630, rew=16.67]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 470.17it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=81.991, player_2/loss=542.437, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 469.02it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=48.315, player_2/loss=459.140, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 471.51it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=37.468, player_2/loss=400.663, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 469.31it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=85.166, player_2/loss=351.841, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 470.10it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=62.891, player_2/loss=406.965, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 460.12it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=49.237, player_2/loss=344.150, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 470.00it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=162.136, player_2/loss=257.092, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 471.25it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=383.554, player_2/loss=135.108, rew=18.75]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 471.70it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=379.183, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 470.17it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=345.842, player_2/loss=78.229, rew=10.71]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 465.51it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=432.185, player_2/loss=88.951, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 473.23it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=462.476, player_2/loss=82.271, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 467.38it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=385.863, player_2/loss=93.052, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 470.82it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=373.443, player_2/loss=64.737, rew=6.25]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 463.94it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=367.199, player_2/loss=19.515, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 471.98it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=307.335, player_2/loss=100.303, rew=12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 469.94it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=297.890, player_2/loss=166.069, rew=10.71]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 473.85it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=356.035, player_2/loss=97.003, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 471.11it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=430.105, player_2/loss=119.602, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 465.91it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=436.950, player_2/loss=69.116, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 474.00it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=390.787, player_2/loss=67.983, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 470.45it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=433.087, player_2/loss=101.918, rew=18.75]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 465.42it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=458.143, player_2/loss=69.687, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 467.83it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=268.837, player_2/loss=116.399, rew=19.44]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 476.77it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=194.383, player_2/loss=242.748, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 475.90it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=70.901, player_2/loss=374.047, rew=13.89]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 460.19it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=48.715, player_2/loss=378.239, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 470.84it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=44.785, player_2/loss=429.188, rew=19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 471.53it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=47.537, player_2/loss=402.784, rew=6.25]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 470.61it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=58.457, player_2/loss=369.727, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 469.96it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=62.211, player_2/loss=354.987, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 468.13it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=94.735, player_2/loss=372.680, rew=2.78]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 445.54it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=138.985, player_2/loss=388.648, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 429.22it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=118.900, player_2/loss=367.774, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 466.76it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=52.071, player_2/loss=361.225, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 467.60it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=20.982, player_2/loss=355.393, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 470.48it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=28.512, player_2/loss=367.573, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 470.10it/s, env_step=15360, len=8, n/ep=9, n/st=64, player_1/loss=27.678, player_2/loss=358.330, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 472.37it/s, env_step=16384, len=7, n/ep=10, n/st=64, player_1/loss=19.157, player_2/loss=393.483, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 468.17it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=12.472, player_2/loss=360.650, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 461.95it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=17.638, player_2/loss=347.624, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 470.00it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=13.423, player_2/loss=360.327, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 471.38it/s, env_step=1024, len=23, n/ep=2, n/st=64, player_1/loss=49.175, player_2/loss=198.154, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 468.30it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=146.695, player_2/loss=152.743, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 469.11it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=219.816, player_2/loss=138.602, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 471.95it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=211.739, player_2/loss=150.211, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 473.61it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=192.968, player_2/loss=94.927, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 457.78it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=295.612, player_2/loss=104.933, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 482.78it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=399.519, player_2/loss=63.472, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 472.98it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=308.113, player_2/loss=30.052, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 469.75it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=240.798, player_2/loss=16.677, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 471.54it/s, env_step=10240, len=19, n/ep=4, n/st=64, player_1/loss=302.767, player_2/loss=49.523, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 460.24it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=394.699, player_2/loss=66.166, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 387.55it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=306.610, player_2/loss=26.351, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 459.89it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=256.016, player_2/loss=36.215, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 475.05it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=202.045, player_2/loss=41.442, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 470.63it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=289.442, player_2/loss=19.142, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 461.63it/s, env_step=16384, len=17, n/ep=3, n/st=64, player_1/loss=313.910, player_2/loss=29.993, rew=8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 475.21it/s, env_step=17408, len=20, n/ep=3, n/st=64, player_1/loss=325.104, player_2/loss=38.417, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 472.56it/s, env_step=18432, len=20, n/ep=3, n/st=64, player_2/loss=129.298, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 475.43it/s, env_step=19456, len=20, n/ep=4, n/st=64, player_1/loss=117.801, player_2/loss=165.655, rew=12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 461.51it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=24.270, player_2/loss=304.052, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 475.01it/s, env_step=2048, len=19, n/ep=4, n/st=64, player_1/loss=51.202, player_2/loss=211.166, rew=12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 471.25it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=36.349, player_2/loss=145.552, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 454.75it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=33.187, player_2/loss=89.769, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 470.18it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=32.644, player_2/loss=120.667, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 478.54it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=43.483, player_2/loss=140.931, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 474.20it/s, env_step=7168, len=26, n/ep=3, n/st=64, player_1/loss=126.076, player_2/loss=108.300, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 465.08it/s, env_step=8192, len=30, n/ep=3, n/st=64, player_1/loss=138.010, player_2/loss=115.920, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 478.55it/s, env_step=9216, len=16, n/ep=5, n/st=64, player_1/loss=76.563, player_2/loss=87.767, rew=-5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 476.34it/s, env_step=10240, len=26, n/ep=2, n/st=64, player_1/loss=46.519, player_2/loss=97.878, rew=0.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 461.74it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=93.033, player_2/loss=134.972, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 472.52it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=91.770, player_2/loss=257.233, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 477.30it/s, env_step=13312, len=7, n/ep=10, n/st=64, player_1/loss=65.945, player_2/loss=356.294, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 472.60it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=78.974, player_2/loss=367.361, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 462.35it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=40.610, player_2/loss=341.751, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 470.39it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=21.597, player_2/loss=348.705, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 466.12it/s, env_step=17408, len=7, n/ep=6, n/st=64, player_1/loss=52.153, player_2/loss=368.636, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 471.03it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=56.647, player_2/loss=360.579, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 472.72it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=18.040, player_2/loss=367.974, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 468.46it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=24.039, player_2/loss=323.166, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 477.33it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=13.765, player_2/loss=291.563, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 451.44it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=14.113, player_2/loss=258.798, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 470.19it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=52.672, player_2/loss=190.625, rew=-13.89]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 479.94it/s, env_step=5120, len=11, n/ep=8, n/st=64, player_1/loss=52.973, player_2/loss=166.814, rew=-18.75]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 475.31it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=30.850, player_2/loss=143.952, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 468.50it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=21.027, player_2/loss=114.136, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 473.30it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=18.175, rew=-13.89]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 472.07it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=32.339, player_2/loss=108.370, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 457.62it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=28.583, player_2/loss=86.877, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 472.59it/s, env_step=11264, len=8, n/ep=9, n/st=64, player_1/loss=44.177, player_2/loss=61.144, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 471.20it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=27.114, player_2/loss=72.660, rew=-13.89]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 463.99it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=45.416, player_2/loss=49.773, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 467.92it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=54.619, player_2/loss=60.029, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 470.85it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=64.085, player_2/loss=116.032, rew=-19.44]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 471.25it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=30.401, player_2/loss=112.713, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 471.06it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=65.799, player_2/loss=86.906, rew=-18.75]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 463.41it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=61.117, player_2/loss=66.483, rew=-12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #18


Epoch #19: 1025it [00:02, 470.20it/s, env_step=19456, len=33, n/ep=2, n/st=64, player_1/loss=62.603, player_2/loss=120.287, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #18


Epoch #1: 1025it [00:02, 467.97it/s, env_step=1024, len=24, n/ep=2, n/st=64, player_1/loss=148.288, player_2/loss=113.783, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 474.16it/s, env_step=2048, len=31, n/ep=2, n/st=64, player_1/loss=102.201, player_2/loss=105.710, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 475.08it/s, env_step=3072, len=26, n/ep=2, n/st=64, player_1/loss=86.808, player_2/loss=101.775, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 486.95it/s, env_step=4096, len=25, n/ep=3, n/st=64, player_1/loss=78.996, player_2/loss=104.412, rew=8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 464.04it/s, env_step=5120, len=23, n/ep=3, n/st=64, player_1/loss=82.145, player_2/loss=79.324, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 469.56it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=93.159, player_2/loss=111.279, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 473.77it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=75.519, player_2/loss=116.179, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 467.85it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=58.483, player_2/loss=104.330, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 472.87it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=113.382, player_2/loss=184.644, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 469.08it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=113.570, player_2/loss=227.551, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 467.94it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=79.705, player_2/loss=149.138, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 464.94it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=45.299, player_2/loss=123.057, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 472.74it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=35.477, player_2/loss=108.725, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 469.99it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=42.061, player_2/loss=118.644, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 471.28it/s, env_step=15360, len=15, n/ep=5, n/st=64, player_1/loss=26.780, player_2/loss=98.254, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 468.04it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=51.033, player_2/loss=117.882, rew=12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 475.88it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=85.836, player_2/loss=123.113, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 465.21it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=61.512, player_2/loss=115.419, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 461.47it/s, env_step=19456, len=12, n/ep=4, n/st=64, player_1/loss=24.987, player_2/loss=127.231, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 467.23it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=9.833, player_2/loss=93.891, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 472.98it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=17.012, player_2/loss=86.163, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 471.48it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=26.636, player_2/loss=61.344, rew=-16.67]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 473.25it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=69.876, player_2/loss=64.281, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 469.95it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=48.629, player_2/loss=66.810, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 475.64it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=10.417, player_2/loss=40.016, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 464.33it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=22.802, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 469.73it/s, env_step=8192, len=10, n/ep=8, n/st=64, player_1/loss=23.898, player_2/loss=47.004, rew=-18.75]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 471.92it/s, env_step=9216, len=10, n/ep=5, n/st=64, player_2/loss=35.271, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 472.07it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=87.473, player_2/loss=98.068, rew=-10.71]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 471.08it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=162.024, player_2/loss=155.473, rew=-19.44]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 472.33it/s, env_step=12288, len=9, n/ep=8, n/st=64, player_1/loss=156.442, player_2/loss=196.983, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 474.11it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=110.223, player_2/loss=206.870, rew=-18.75]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 465.31it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=144.187, player_2/loss=122.129, rew=-12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 466.10it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=175.757, player_2/loss=88.724, rew=-12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 468.91it/s, env_step=16384, len=9, n/ep=8, n/st=64, player_1/loss=161.998, player_2/loss=81.376, rew=-12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 470.51it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=136.419, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 471.52it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=134.734, player_2/loss=57.551, rew=-10.71]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 474.27it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=154.960, player_2/loss=63.406, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 465.64it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=37.999, player_2/loss=178.992, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 463.79it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=63.943, player_2/loss=147.806, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 455.50it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=38.447, player_2/loss=117.989, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 468.67it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=58.844, player_2/loss=105.677, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 465.40it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=99.141, player_2/loss=131.397, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 471.07it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=82.797, player_2/loss=129.896, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 466.79it/s, env_step=7168, len=8, n/ep=6, n/st=64, player_1/loss=53.226, player_2/loss=119.081, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 465.49it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=63.203, player_2/loss=122.638, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 468.90it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=81.245, player_2/loss=114.355, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 454.07it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=40.070, player_2/loss=139.161, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 468.55it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=13.158, player_2/loss=123.303, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 462.34it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=22.797, player_2/loss=134.066, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 469.01it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=28.037, player_2/loss=153.155, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 464.00it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=41.037, player_2/loss=193.064, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 463.39it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=29.339, player_2/loss=157.947, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 465.57it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=10.244, player_2/loss=156.801, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 458.69it/s, env_step=17408, len=8, n/ep=9, n/st=64, player_1/loss=30.933, player_2/loss=144.230, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 468.94it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=29.114, player_2/loss=146.585, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 465.28it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=38.387, player_2/loss=148.971, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 473.85it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=48.675, player_2/loss=138.743, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 473.26it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=57.612, player_2/loss=206.724, rew=-15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 471.64it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=241.787, player_2/loss=202.490, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 459.06it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=334.099, rew=-18.75]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 472.36it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=231.482, player_2/loss=120.451, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 472.14it/s, env_step=6144, len=10, n/ep=7, n/st=64, player_1/loss=237.099, player_2/loss=91.489, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 469.58it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=279.401, player_2/loss=46.721, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 453.73it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=403.766, player_2/loss=68.608, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 380.65it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=487.735, player_2/loss=67.416, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 457.82it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=567.223, player_2/loss=40.895, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 450.73it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=588.104, player_2/loss=33.398, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 390.34it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=562.242, player_2/loss=29.721, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 462.95it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_2/loss=20.760, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 462.38it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=526.736, player_2/loss=20.932, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 463.62it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=492.864, player_2/loss=19.342, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 457.22it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=586.501, player_2/loss=51.003, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 462.92it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=677.224, player_2/loss=49.164, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 449.50it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=653.028, player_2/loss=14.673, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 461.53it/s, env_step=19456, len=10, n/ep=5, n/st=64, player_1/loss=517.176, player_2/loss=40.496, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 441.24it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=201.595, player_2/loss=16.491, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 441.45it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=182.534, player_2/loss=32.814, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 452.43it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=158.050, player_2/loss=73.690, rew=-5.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 458.72it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=157.251, player_2/loss=223.124, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 469.17it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=118.452, player_2/loss=281.295, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 455.48it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=133.501, player_2/loss=163.080, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 458.18it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=132.920, player_2/loss=341.197, rew=6.25]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 461.90it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=147.077, player_2/loss=531.767, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 467.23it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=112.157, player_2/loss=725.909, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 445.65it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=37.264, player_2/loss=692.223, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 437.82it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=24.107, player_2/loss=607.783, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 460.77it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=25.531, player_2/loss=628.809, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 442.14it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=9.926, player_2/loss=651.145, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 461.26it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=75.365, player_2/loss=611.613, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 455.33it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=93.646, player_2/loss=669.452, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 432.69it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=41.486, player_2/loss=635.135, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 409.76it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=41.025, player_2/loss=745.981, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 457.01it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=24.434, player_2/loss=678.575, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 475.01it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=28.523, player_2/loss=616.332, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 472.99it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=13.011, player_2/loss=462.436, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 484.55it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=49.667, player_2/loss=365.083, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 482.82it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=73.270, player_2/loss=258.496, rew=-19.44]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 442.42it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_2/loss=219.254, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 432.09it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=62.321, player_2/loss=229.590, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 464.21it/s, env_step=6144, len=13, n/ep=4, n/st=64, player_1/loss=40.850, player_2/loss=185.169, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 465.29it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=37.290, player_2/loss=157.505, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 439.58it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=79.953, player_2/loss=152.609, rew=-12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 442.19it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=116.748, player_2/loss=130.520, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:02, 469.87it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=82.343, player_2/loss=124.209, rew=0.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 471.13it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=40.506, player_2/loss=74.062, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 453.54it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=60.256, player_2/loss=79.113, rew=-12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 456.36it/s, env_step=13312, len=15, n/ep=5, n/st=64, player_1/loss=101.143, player_2/loss=119.356, rew=-15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 472.32it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=63.536, player_2/loss=122.846, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 464.88it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=72.004, player_2/loss=84.996, rew=0.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:02, 473.16it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=113.195, player_2/loss=53.065, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 470.00it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=85.830, player_2/loss=56.375, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:02, 473.67it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=41.409, player_2/loss=40.406, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:02, 471.61it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=35.404, player_2/loss=23.844, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 471.09it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=23.614, player_2/loss=37.782, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 474.52it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=113.837, player_2/loss=31.439, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 471.50it/s, env_step=3072, len=22, n/ep=2, n/st=64, player_1/loss=132.759, player_2/loss=64.622, rew=0.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 471.25it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=64.046, player_2/loss=109.342, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 465.97it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=43.955, player_2/loss=93.352, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 466.91it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=61.413, player_2/loss=99.153, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 475.19it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=64.743, player_2/loss=126.625, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 451.13it/s, env_step=8192, len=9, n/ep=8, n/st=64, player_1/loss=38.473, player_2/loss=182.456, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 469.34it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=37.684, player_2/loss=172.720, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 466.34it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=59.724, player_2/loss=192.886, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 454.38it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=111.831, player_2/loss=193.256, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 458.03it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=109.562, player_2/loss=201.913, rew=3.57]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 469.31it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=111.358, player_2/loss=202.732, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 467.93it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=106.147, player_2/loss=226.809, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 415.43it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=62.531, player_2/loss=200.861, rew=19.44]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 453.93it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=76.706, player_2/loss=194.347, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 483.47it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=106.622, player_2/loss=160.310, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 442.62it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=110.002, player_2/loss=179.805, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 421.48it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=118.659, player_2/loss=162.107, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 467.80it/s, env_step=1024, len=19, n/ep=4, n/st=64, player_1/loss=68.347, player_2/loss=37.912, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 488.76it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=65.698, player_2/loss=90.558, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 473.46it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=88.105, player_2/loss=142.753, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 465.29it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=98.214, player_2/loss=192.117, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 469.45it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=64.414, player_2/loss=125.506, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 470.44it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=41.886, player_2/loss=94.817, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 469.76it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=34.455, player_2/loss=107.339, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 466.44it/s, env_step=8192, len=25, n/ep=2, n/st=64, player_1/loss=26.725, player_2/loss=57.226, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 466.12it/s, env_step=9216, len=25, n/ep=3, n/st=64, player_1/loss=65.502, player_2/loss=54.320, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 474.06it/s, env_step=10240, len=23, n/ep=3, n/st=64, player_1/loss=73.227, player_2/loss=95.909, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 481.07it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=114.344, player_2/loss=139.949, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 486.24it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=116.788, player_2/loss=92.591, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 465.10it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=66.175, player_2/loss=68.370, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 461.81it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=43.933, player_2/loss=60.403, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 460.29it/s, env_step=15360, len=18, n/ep=4, n/st=64, player_1/loss=59.976, player_2/loss=117.308, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 475.79it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=72.037, player_2/loss=162.975, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 490.02it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=47.289, player_2/loss=117.365, rew=-8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 474.45it/s, env_step=18432, len=22, n/ep=4, n/st=64, player_1/loss=72.459, player_2/loss=122.877, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 485.00it/s, env_step=19456, len=26, n/ep=2, n/st=64, player_1/loss=97.140, player_2/loss=123.367, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 458.34it/s, env_step=1024, len=28, n/ep=2, n/st=64, player_1/loss=135.629, player_2/loss=74.710, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 467.93it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=98.915, player_2/loss=71.023, rew=-8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 478.59it/s, env_step=3072, len=22, n/ep=3, n/st=64, player_1/loss=169.683, player_2/loss=80.449, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 478.56it/s, env_step=4096, len=23, n/ep=3, n/st=64, player_1/loss=157.565, rew=8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 458.49it/s, env_step=5120, len=24, n/ep=3, n/st=64, player_1/loss=123.088, player_2/loss=110.125, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 475.93it/s, env_step=6144, len=20, n/ep=4, n/st=64, player_1/loss=123.416, player_2/loss=104.044, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 467.58it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=76.500, player_2/loss=115.423, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 458.32it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=59.203, player_2/loss=115.905, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 459.81it/s, env_step=9216, len=20, n/ep=4, n/st=64, player_1/loss=81.759, player_2/loss=116.989, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 463.40it/s, env_step=10240, len=27, n/ep=2, n/st=64, player_1/loss=67.678, player_2/loss=115.329, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 471.06it/s, env_step=11264, len=23, n/ep=3, n/st=64, player_1/loss=86.265, player_2/loss=99.472, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 452.11it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=101.472, player_2/loss=105.344, rew=12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 466.36it/s, env_step=13312, len=19, n/ep=4, n/st=64, player_1/loss=67.909, player_2/loss=101.694, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 469.66it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=24.273, player_2/loss=120.348, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 457.79it/s, env_step=15360, len=26, n/ep=3, n/st=64, player_1/loss=102.019, player_2/loss=152.330, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 464.66it/s, env_step=16384, len=25, n/ep=2, n/st=64, player_1/loss=129.989, player_2/loss=114.388, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 468.26it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=82.560, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 456.26it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=43.284, player_2/loss=124.704, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 457.77it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=59.475, player_2/loss=140.152, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 451.95it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=31.592, player_2/loss=136.741, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 456.35it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_2/loss=96.681, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 481.65it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=35.706, player_2/loss=77.191, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 482.15it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_2/loss=146.427, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 481.94it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=77.847, player_2/loss=150.146, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 424.92it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=73.540, player_2/loss=114.556, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 474.79it/s, env_step=7168, len=22, n/ep=3, n/st=64, player_1/loss=89.420, player_2/loss=105.349, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 435.08it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=148.028, player_2/loss=77.898, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 456.46it/s, env_step=9216, len=25, n/ep=2, n/st=64, player_1/loss=112.964, player_2/loss=77.028, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 454.18it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=70.778, player_2/loss=83.606, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 448.65it/s, env_step=11264, len=24, n/ep=3, n/st=64, player_1/loss=60.789, player_2/loss=68.425, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 477.39it/s, env_step=12288, len=16, n/ep=3, n/st=64, player_1/loss=112.589, player_2/loss=90.811, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 456.22it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=211.597, player_2/loss=194.374, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 475.10it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=288.048, player_2/loss=127.548, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 474.69it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=332.512, player_2/loss=71.224, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 462.37it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=294.014, player_2/loss=49.353, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 429.43it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=271.017, player_2/loss=17.824, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 429.49it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=260.636, player_2/loss=17.677, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 453.57it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=293.262, player_2/loss=34.373, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 421.68it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=310.980, player_2/loss=462.394, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 424.60it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=199.985, player_2/loss=525.542, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 447.70it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=113.709, player_2/loss=674.363, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 449.91it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=113.035, player_2/loss=648.062, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 443.63it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=117.031, player_2/loss=533.002, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 462.78it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=85.931, player_2/loss=475.729, rew=2.78]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 460.85it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=69.805, player_2/loss=625.739, rew=13.89]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 458.86it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=47.295, player_2/loss=568.939, rew=13.89]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 435.70it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=82.231, player_2/loss=461.839, rew=13.89]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 442.93it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=122.190, player_2/loss=556.284, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 457.46it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_2/loss=508.428, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 447.63it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=152.751, player_2/loss=468.892, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 443.29it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=185.009, player_2/loss=379.731, rew=19.44]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 425.41it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=128.027, player_2/loss=494.589, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 432.40it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=91.273, player_2/loss=569.443, rew=19.44]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 441.67it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=109.229, player_2/loss=587.885, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 441.59it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=63.343, player_2/loss=624.027, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 448.28it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=73.635, player_2/loss=525.231, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 453.52it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=84.246, player_2/loss=530.582, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 465.06it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=93.184, player_2/loss=490.546, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 467.59it/s, env_step=2048, len=7, n/ep=10, n/st=64, player_1/loss=48.177, player_2/loss=434.569, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 469.48it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=83.579, player_2/loss=373.410, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 469.26it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=57.550, player_2/loss=302.937, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 475.85it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=32.290, player_2/loss=259.525, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 469.57it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=109.342, player_2/loss=267.084, rew=-18.75]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 471.24it/s, env_step=7168, len=10, n/ep=7, n/st=64, player_1/loss=304.078, player_2/loss=228.967, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 458.19it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=385.926, player_2/loss=100.537, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 467.21it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=323.287, player_2/loss=48.647, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 471.53it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=281.521, player_2/loss=57.570, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 471.85it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=263.686, player_2/loss=36.309, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 465.02it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=242.403, player_2/loss=23.748, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 463.57it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=209.114, player_2/loss=15.183, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 370.15it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=264.369, player_2/loss=17.435, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 438.04it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=309.152, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 433.62it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=310.084, player_2/loss=25.743, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 435.71it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=282.401, player_2/loss=26.673, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 443.36it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=299.013, player_2/loss=20.963, rew=5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 416.72it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=275.119, player_2/loss=6.191, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 443.32it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=188.817, player_2/loss=4.385, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 442.73it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=169.444, player_2/loss=57.227, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 449.57it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=122.651, player_2/loss=192.436, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 470.43it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=113.253, player_2/loss=185.289, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 430.75it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=121.930, player_2/loss=139.769, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 453.41it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=123.944, player_2/loss=257.141, rew=-5.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 464.84it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=131.861, player_2/loss=262.784, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 454.77it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=106.238, player_2/loss=282.791, rew=16.67]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 452.91it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=69.302, player_2/loss=439.817, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 462.31it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=116.652, player_2/loss=472.714, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 466.98it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=164.991, player_2/loss=410.872, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 463.80it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=137.834, player_2/loss=366.236, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 455.29it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=91.144, player_2/loss=337.596, rew=15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 459.06it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=98.760, player_2/loss=313.923, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 458.47it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=59.991, player_2/loss=360.734, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 452.35it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=102.653, player_2/loss=339.329, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 464.41it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=151.260, player_2/loss=281.480, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 477.63it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=78.866, player_2/loss=295.941, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 476.75it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=65.223, player_2/loss=338.335, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 453.18it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=47.659, player_2/loss=421.216, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 463.74it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=36.682, player_2/loss=328.720, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 455.38it/s, env_step=3072, len=24, n/ep=3, n/st=64, player_1/loss=68.193, player_2/loss=198.666, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 462.33it/s, env_step=4096, len=19, n/ep=2, n/st=64, player_1/loss=87.544, player_2/loss=136.641, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 462.78it/s, env_step=5120, len=15, n/ep=3, n/st=64, player_1/loss=93.942, player_2/loss=127.387, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 487.41it/s, env_step=6144, len=26, n/ep=2, n/st=64, player_1/loss=125.847, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 469.12it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=147.300, player_2/loss=115.164, rew=-18.75]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 460.36it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=126.529, player_2/loss=160.179, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 471.29it/s, env_step=9216, len=25, n/ep=2, n/st=64, player_1/loss=171.391, player_2/loss=169.280, rew=-25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 482.92it/s, env_step=10240, len=17, n/ep=3, n/st=64, player_1/loss=165.491, player_2/loss=105.881, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 480.85it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=115.275, player_2/loss=76.251, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 480.08it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=113.202, player_2/loss=71.441, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 477.69it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=88.080, player_2/loss=74.000, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 489.51it/s, env_step=14336, len=21, n/ep=3, n/st=64, player_1/loss=84.245, player_2/loss=97.250, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 476.13it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=89.876, player_2/loss=108.152, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 473.74it/s, env_step=16384, len=19, n/ep=4, n/st=64, player_1/loss=93.520, player_2/loss=94.232, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 474.42it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=108.474, player_2/loss=104.315, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 459.92it/s, env_step=18432, len=21, n/ep=3, n/st=64, player_1/loss=118.915, player_2/loss=92.343, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 451.69it/s, env_step=19456, len=16, n/ep=3, n/st=64, player_1/loss=121.031, player_2/loss=111.975, rew=-8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 460.08it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=140.881, player_2/loss=154.836, rew=15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 459.13it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=105.155, player_2/loss=148.662, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 442.97it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=94.540, player_2/loss=160.186, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 483.91it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=90.286, player_2/loss=180.360, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 467.87it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=59.253, player_2/loss=192.353, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 452.44it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=52.701, player_2/loss=177.709, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 463.79it/s, env_step=7168, len=13, n/ep=4, n/st=64, player_1/loss=44.161, player_2/loss=131.045, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 466.56it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=53.973, player_2/loss=136.286, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 460.27it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=113.846, player_2/loss=194.754, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 477.88it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=139.181, player_2/loss=280.303, rew=13.89]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 456.60it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=126.119, player_2/loss=269.607, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 460.21it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=100.211, player_2/loss=276.434, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 448.78it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=122.256, player_2/loss=319.230, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 439.57it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=72.791, player_2/loss=258.699, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 444.69it/s, env_step=15360, len=9, n/ep=8, n/st=64, player_1/loss=75.446, player_2/loss=265.414, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 447.08it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=104.917, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 446.66it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=60.195, player_2/loss=256.538, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 454.14it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=40.539, player_2/loss=232.581, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 438.02it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=44.210, player_2/loss=230.940, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 409.98it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=68.632, player_2/loss=154.035, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 478.30it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=80.738, player_2/loss=139.555, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 477.30it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=82.924, player_2/loss=118.968, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 408.31it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=68.470, player_2/loss=72.479, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 449.17it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=48.859, player_2/loss=103.013, rew=-19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 465.49it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=56.887, player_2/loss=132.295, rew=-17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 475.81it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=37.597, player_2/loss=93.214, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 486.04it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=9.579, player_2/loss=71.151, rew=-13.89]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 470.81it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=38.408, player_2/loss=88.056, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 465.94it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=38.128, player_2/loss=85.317, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 468.69it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=39.941, player_2/loss=36.389, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 466.50it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=137.264, player_2/loss=151.762, rew=-6.25]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 483.92it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=208.022, player_2/loss=198.448, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 468.08it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=277.311, player_2/loss=178.410, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 486.02it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=326.497, player_2/loss=201.922, rew=-10.71]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 486.00it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=363.780, player_2/loss=129.693, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 488.50it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=295.736, player_2/loss=77.243, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 485.60it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=312.665, player_2/loss=34.909, rew=15.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 488.75it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=406.006, player_2/loss=61.198, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 474.31it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=199.060, player_2/loss=135.521, rew=15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 478.55it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=169.821, player_2/loss=199.217, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 482.38it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=96.536, player_2/loss=264.328, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 474.42it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=49.783, player_2/loss=222.121, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 471.47it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=45.099, player_2/loss=200.650, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 454.86it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=69.289, player_2/loss=176.827, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 476.28it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=65.419, player_2/loss=189.640, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 464.79it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=63.986, player_2/loss=203.263, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 455.56it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=34.756, player_2/loss=224.518, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 446.96it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=39.453, player_2/loss=262.681, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 445.69it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=42.786, player_2/loss=259.569, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 457.21it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=31.233, player_2/loss=226.718, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 439.08it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=30.369, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 414.53it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=6.457, player_2/loss=241.177, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 465.62it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=37.608, player_2/loss=234.003, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 455.49it/s, env_step=16384, len=12, n/ep=4, n/st=64, player_1/loss=43.303, player_2/loss=199.276, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 459.66it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=13.800, player_2/loss=208.556, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 442.95it/s, env_step=18432, len=22, n/ep=3, n/st=64, player_1/loss=11.170, player_2/loss=191.692, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 452.97it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=11.859, player_2/loss=175.906, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 485.77it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=34.807, player_2/loss=200.265, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 456.57it/s, env_step=2048, len=15, n/ep=5, n/st=64, player_1/loss=102.382, player_2/loss=171.473, rew=5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 417.22it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=230.677, player_2/loss=123.465, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 429.96it/s, env_step=4096, len=17, n/ep=3, n/st=64, player_1/loss=214.036, player_2/loss=115.043, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 465.01it/s, env_step=5120, len=19, n/ep=4, n/st=64, player_1/loss=155.316, player_2/loss=84.483, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 449.35it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=162.876, player_2/loss=80.373, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 461.97it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=147.074, player_2/loss=69.917, rew=0.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 437.79it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=170.255, player_2/loss=60.076, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 388.42it/s, env_step=9216, len=24, n/ep=3, n/st=64, player_1/loss=178.568, player_2/loss=83.347, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 472.37it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=125.999, player_2/loss=113.530, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 484.70it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=160.896, player_2/loss=99.399, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 484.64it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=177.907, player_2/loss=90.419, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 479.27it/s, env_step=13312, len=23, n/ep=3, n/st=64, player_1/loss=171.530, player_2/loss=83.319, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 483.21it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=152.566, player_2/loss=90.667, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 476.87it/s, env_step=15360, len=26, n/ep=3, n/st=64, player_1/loss=155.368, player_2/loss=149.262, rew=8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 480.16it/s, env_step=16384, len=25, n/ep=3, n/st=64, player_1/loss=142.201, player_2/loss=137.403, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 486.58it/s, env_step=17408, len=20, n/ep=3, n/st=64, player_1/loss=125.062, player_2/loss=64.678, rew=-8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 480.02it/s, env_step=18432, len=26, n/ep=2, n/st=64, player_1/loss=98.247, player_2/loss=45.408, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 481.65it/s, env_step=19456, len=22, n/ep=3, n/st=64, player_1/loss=86.889, player_2/loss=71.708, rew=-8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 472.39it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=184.183, player_2/loss=86.913, rew=12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 468.65it/s, env_step=2048, len=24, n/ep=2, n/st=64, player_1/loss=170.429, player_2/loss=72.115, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 469.80it/s, env_step=3072, len=26, n/ep=2, n/st=64, player_1/loss=176.599, player_2/loss=55.237, rew=0.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 472.56it/s, env_step=4096, len=15, n/ep=5, n/st=64, player_1/loss=149.794, player_2/loss=86.711, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 470.84it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=62.334, player_2/loss=118.859, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 470.82it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=34.883, player_2/loss=117.358, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 474.77it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=47.437, player_2/loss=145.702, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 471.72it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=30.346, player_2/loss=133.552, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 472.64it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=43.864, player_2/loss=162.926, rew=15.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 478.66it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=59.855, player_2/loss=162.033, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 473.79it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=57.218, player_2/loss=163.353, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 472.42it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=8.877, player_2/loss=168.487, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 472.79it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=38.473, player_2/loss=166.485, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 483.94it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=44.773, player_2/loss=176.655, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 480.51it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=12.844, player_2/loss=142.329, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 477.79it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=14.311, player_2/loss=129.509, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 472.05it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=14.606, player_2/loss=152.434, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 481.69it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=33.322, player_2/loss=145.252, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 475.61it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=30.271, player_2/loss=166.264, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 483.79it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=23.311, player_2/loss=193.732, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 487.96it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=17.777, player_2/loss=157.181, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 489.13it/s, env_step=3072, len=26, n/ep=2, n/st=64, player_1/loss=58.694, player_2/loss=127.456, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 488.09it/s, env_step=4096, len=15, n/ep=3, n/st=64, player_1/loss=127.459, player_2/loss=109.967, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 486.60it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=159.788, player_2/loss=89.514, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 470.92it/s, env_step=6144, len=21, n/ep=4, n/st=64, player_1/loss=174.178, player_2/loss=74.684, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 484.39it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=174.822, player_2/loss=116.951, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 488.26it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_1/loss=134.071, player_2/loss=154.745, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 492.79it/s, env_step=9216, len=29, n/ep=2, n/st=64, player_1/loss=150.502, player_2/loss=102.991, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 486.74it/s, env_step=10240, len=25, n/ep=2, n/st=64, player_1/loss=131.249, player_2/loss=117.785, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 491.32it/s, env_step=11264, len=21, n/ep=3, n/st=64, player_1/loss=107.388, player_2/loss=134.308, rew=8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 484.44it/s, env_step=12288, len=32, n/ep=2, n/st=64, player_1/loss=140.534, player_2/loss=139.238, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 487.85it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=141.153, player_2/loss=100.711, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 478.34it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=104.407, player_2/loss=63.485, rew=8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 484.10it/s, env_step=15360, len=22, n/ep=3, n/st=64, player_1/loss=106.159, player_2/loss=124.266, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 486.73it/s, env_step=16384, len=34, n/ep=2, n/st=64, player_1/loss=129.703, player_2/loss=147.717, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 492.73it/s, env_step=17408, len=25, n/ep=3, n/st=64, player_1/loss=176.591, player_2/loss=88.173, rew=8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 486.35it/s, env_step=18432, len=23, n/ep=3, n/st=64, player_1/loss=187.842, player_2/loss=103.273, rew=8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 484.95it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=190.243, player_2/loss=129.304, rew=0.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 474.98it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=90.866, player_2/loss=140.674, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 477.04it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=87.730, player_2/loss=136.959, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 473.59it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=100.282, player_2/loss=147.487, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 473.83it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=96.210, player_2/loss=172.774, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 467.31it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=58.087, player_2/loss=216.423, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 473.05it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=73.884, player_2/loss=242.294, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 471.11it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=82.502, player_2/loss=234.904, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 470.30it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=33.817, player_2/loss=249.206, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 469.75it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=11.762, player_2/loss=258.110, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 462.42it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=41.686, player_2/loss=280.140, rew=6.25]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 467.61it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=42.707, player_2/loss=254.213, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 470.17it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=98.726, player_2/loss=222.257, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 475.18it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=119.495, player_2/loss=229.254, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 433.93it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=67.910, player_2/loss=217.992, rew=18.75]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 447.20it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=87.566, player_2/loss=224.218, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 467.84it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=93.914, player_2/loss=243.166, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 467.33it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=67.943, player_2/loss=253.669, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 383.31it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=83.211, player_2/loss=273.518, rew=10.71]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 417.05it/s, env_step=19456, len=7, n/ep=10, n/st=64, player_1/loss=110.461, player_2/loss=278.702, rew=20.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 325.58it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=86.729, player_2/loss=272.830, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 391.62it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=84.959, player_2/loss=265.630, rew=-13.89]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 480.97it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=90.379, player_2/loss=218.841, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 487.64it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=72.582, player_2/loss=202.414, rew=-13.89]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 486.51it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=89.911, player_2/loss=174.717, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 490.36it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=51.354, player_2/loss=144.386, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 486.03it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=122.644, player_2/loss=159.061, rew=-19.44]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 471.02it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=142.188, player_2/loss=144.898, rew=-8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 460.67it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=119.861, player_2/loss=165.787, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 372.13it/s, env_step=10240, len=13, n/ep=4, n/st=64, player_1/loss=127.547, player_2/loss=142.209, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 476.70it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=128.223, player_2/loss=172.853, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 488.34it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=97.411, player_2/loss=141.885, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 493.22it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=69.977, player_2/loss=104.993, rew=-12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 489.47it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=126.287, player_2/loss=128.137, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 491.87it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=149.174, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 496.13it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=152.622, player_2/loss=156.790, rew=-25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 494.27it/s, env_step=17408, len=20, n/ep=4, n/st=64, player_1/loss=134.821, player_2/loss=115.055, rew=-12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 491.08it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=101.638, player_2/loss=123.648, rew=-8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 494.69it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=108.088, player_2/loss=142.416, rew=-12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 490.44it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=396.899, player_2/loss=418.654, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 486.24it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=233.464, player_2/loss=278.496, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 488.64it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=78.751, player_2/loss=107.291, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 432.34it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=104.255, player_2/loss=110.854, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 409.56it/s, env_step=5120, len=16, n/ep=3, n/st=64, player_1/loss=117.957, player_2/loss=160.337, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 452.55it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=140.522, player_2/loss=158.703, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 446.21it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=126.112, player_2/loss=114.161, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 454.54it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=121.025, player_2/loss=138.118, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 449.86it/s, env_step=9216, len=27, n/ep=2, n/st=64, player_1/loss=274.909, player_2/loss=239.356, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 437.51it/s, env_step=10240, len=10, n/ep=5, n/st=64, player_1/loss=90.144, player_2/loss=202.221, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 450.33it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=103.416, player_2/loss=143.560, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 437.63it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=94.819, player_2/loss=216.516, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 470.19it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=75.678, player_2/loss=225.610, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 436.81it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=40.703, player_2/loss=189.952, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 462.42it/s, env_step=15360, len=13, n/ep=4, n/st=64, player_1/loss=81.889, player_2/loss=153.634, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 463.29it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=108.415, player_2/loss=128.950, rew=15.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 455.41it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=75.788, player_2/loss=133.610, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 457.95it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=111.028, player_2/loss=153.552, rew=-5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 451.84it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=119.681, player_2/loss=149.915, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 462.81it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=63.043, player_2/loss=92.290, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 451.47it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=53.661, player_2/loss=92.310, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 446.05it/s, env_step=3072, len=13, n/ep=4, n/st=64, player_1/loss=34.939, player_2/loss=103.109, rew=0.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 457.03it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_1/loss=55.195, player_2/loss=114.101, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 456.79it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=84.484, player_2/loss=102.967, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 437.09it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=154.987, player_2/loss=89.010, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 452.94it/s, env_step=7168, len=25, n/ep=3, n/st=64, player_1/loss=163.624, player_2/loss=98.099, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 460.50it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=182.221, player_2/loss=134.599, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 407.88it/s, env_step=9216, len=8, n/ep=7, n/st=64, player_1/loss=383.757, player_2/loss=112.733, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 445.84it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=517.159, player_2/loss=71.266, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 432.50it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=501.987, player_2/loss=57.007, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 459.03it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=471.653, player_2/loss=47.054, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 453.97it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=520.175, player_2/loss=42.019, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 441.59it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=436.874, player_2/loss=65.363, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 431.59it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=347.461, player_2/loss=60.656, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 454.36it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=377.791, player_2/loss=30.651, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 461.29it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=460.517, player_2/loss=68.429, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 457.77it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=418.506, player_2/loss=77.570, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 446.61it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=453.378, player_2/loss=53.575, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 448.43it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=298.102, player_2/loss=299.932, rew=13.89]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 461.04it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=223.835, player_2/loss=430.183, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 447.01it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=124.329, player_2/loss=608.662, rew=13.89]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 447.93it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=53.847, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 471.36it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=54.857, player_2/loss=672.619, rew=19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 421.80it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=40.677, player_2/loss=650.650, rew=13.89]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 453.17it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=35.284, player_2/loss=713.155, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 461.57it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=157.904, player_2/loss=680.869, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 446.95it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=154.700, player_2/loss=604.857, rew=2.78]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 459.44it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=176.717, player_2/loss=515.914, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 473.02it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=104.231, player_2/loss=591.914, rew=13.89]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 470.59it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=63.501, player_2/loss=562.980, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 474.77it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=115.512, player_2/loss=539.478, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 477.33it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=105.975, player_2/loss=492.619, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 467.39it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=33.390, player_2/loss=539.496, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 454.73it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=70.644, player_2/loss=588.613, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 444.01it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=83.620, player_2/loss=457.502, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 446.77it/s, env_step=18432, len=7, n/ep=7, n/st=64, player_1/loss=71.787, player_2/loss=517.674, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 459.46it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=83.969, player_2/loss=526.197, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 456.19it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=47.755, player_2/loss=477.396, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 398.74it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=73.900, player_2/loss=303.799, rew=-18.75]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 362.00it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=103.749, player_2/loss=108.592, rew=-18.75]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 431.39it/s, env_step=4096, len=23, n/ep=2, n/st=64, player_1/loss=109.597, player_2/loss=113.238, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 469.38it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=140.566, player_2/loss=111.370, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 463.76it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=114.760, player_2/loss=149.423, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 452.56it/s, env_step=7168, len=17, n/ep=3, n/st=64, player_1/loss=110.307, player_2/loss=186.399, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 465.71it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=147.250, player_2/loss=163.289, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 472.34it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=113.664, player_2/loss=114.520, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 460.99it/s, env_step=10240, len=26, n/ep=2, n/st=64, player_1/loss=93.091, player_2/loss=94.752, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 465.19it/s, env_step=11264, len=28, n/ep=2, n/st=64, player_1/loss=56.591, player_2/loss=67.955, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 453.48it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=90.389, player_2/loss=70.371, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 473.60it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=103.772, player_2/loss=79.902, rew=-12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 456.99it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=88.304, player_2/loss=119.537, rew=-12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 470.63it/s, env_step=15360, len=21, n/ep=3, n/st=64, player_2/loss=87.801, rew=-8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 480.94it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=135.131, player_2/loss=75.771, rew=-25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 474.31it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=133.292, player_2/loss=82.572, rew=-25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 469.13it/s, env_step=18432, len=25, n/ep=3, n/st=64, player_1/loss=111.085, player_2/loss=73.318, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 461.41it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=145.359, player_2/loss=130.536, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 459.88it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=150.278, player_2/loss=190.845, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 463.02it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=129.626, player_2/loss=154.471, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 453.83it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=110.809, player_2/loss=138.189, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 384.35it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=100.946, player_2/loss=156.825, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 454.00it/s, env_step=5120, len=22, n/ep=3, n/st=64, player_1/loss=102.819, player_2/loss=163.440, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 472.34it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=126.662, player_2/loss=115.969, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 478.29it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_2/loss=123.664, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 466.40it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=72.067, player_2/loss=148.967, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 442.61it/s, env_step=9216, len=26, n/ep=2, n/st=64, player_1/loss=92.773, player_2/loss=158.790, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 411.69it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=88.114, player_2/loss=116.595, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 430.06it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=99.925, player_2/loss=89.592, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 446.83it/s, env_step=12288, len=22, n/ep=3, n/st=64, player_1/loss=92.552, player_2/loss=87.635, rew=-8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 454.78it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=96.034, player_2/loss=115.246, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 450.55it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=284.998, player_2/loss=257.971, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 448.19it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=273.190, player_2/loss=307.221, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 449.21it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=52.875, player_2/loss=207.800, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 458.54it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=52.588, player_2/loss=182.190, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 455.15it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=53.644, player_2/loss=154.024, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 462.35it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=40.576, player_2/loss=143.856, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 473.04it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=84.311, player_2/loss=138.339, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 457.52it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=180.931, player_2/loss=148.374, rew=5.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 460.88it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=191.740, player_2/loss=165.301, rew=-18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 463.11it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=264.783, player_2/loss=203.830, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 426.58it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=358.244, player_2/loss=212.498, rew=-6.25]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 433.25it/s, env_step=6144, len=10, n/ep=7, n/st=64, player_1/loss=278.456, player_2/loss=201.960, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 435.19it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=273.758, player_2/loss=121.474, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 438.06it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=324.289, player_2/loss=121.404, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 460.92it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=328.814, player_2/loss=109.594, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 448.95it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=409.695, player_2/loss=55.132, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 461.19it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=353.021, player_2/loss=65.554, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 463.84it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=335.389, player_2/loss=122.228, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 475.88it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=338.199, player_2/loss=94.075, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 478.78it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=360.760, player_2/loss=56.397, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 459.31it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=385.120, player_2/loss=55.688, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 466.21it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=348.955, player_2/loss=67.003, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 458.53it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=361.346, player_2/loss=65.718, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 447.16it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=416.206, player_2/loss=18.088, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 465.44it/s, env_step=19456, len=10, n/ep=7, n/st=64, player_1/loss=380.950, player_2/loss=10.131, rew=10.71]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 463.86it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=223.114, player_2/loss=83.086, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 459.33it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=183.060, player_2/loss=376.584, rew=10.71]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 444.86it/s, env_step=3072, len=9, n/ep=8, n/st=64, player_1/loss=180.024, player_2/loss=633.165, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 476.29it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=203.873, player_2/loss=681.587, rew=10.71]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 457.42it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=138.338, player_2/loss=767.265, rew=10.71]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 466.49it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=86.727, player_2/loss=750.629, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 486.95it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=51.314, player_2/loss=623.239, rew=17.86]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 472.49it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=38.096, player_2/loss=778.882, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 461.30it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=73.670, player_2/loss=651.654, rew=10.71]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 455.18it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=68.660, player_2/loss=642.222, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 477.58it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=40.013, player_2/loss=615.559, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 479.75it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=47.307, player_2/loss=678.735, rew=17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 481.21it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=36.656, player_2/loss=633.102, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 475.20it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=40.478, player_2/loss=578.366, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 458.81it/s, env_step=15360, len=11, n/ep=8, n/st=64, player_1/loss=57.872, player_2/loss=684.316, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 466.77it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=67.887, player_2/loss=780.390, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 450.06it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=61.292, player_2/loss=598.698, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 469.11it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=102.848, player_2/loss=568.841, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 461.14it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=116.048, player_2/loss=544.348, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 468.48it/s, env_step=1024, len=9, n/ep=6, n/st=64, player_1/loss=28.691, player_2/loss=373.346, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 478.98it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=97.145, player_2/loss=314.415, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 480.93it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=108.243, player_2/loss=236.180, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 488.35it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=65.542, player_2/loss=116.213, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 488.53it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=48.578, player_2/loss=125.670, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 482.08it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=58.395, player_2/loss=132.102, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 485.30it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=53.875, player_2/loss=111.216, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 472.81it/s, env_step=8192, len=23, n/ep=3, n/st=64, player_1/loss=127.225, player_2/loss=83.161, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 491.34it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=225.520, player_2/loss=68.879, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 472.59it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=351.259, player_2/loss=100.901, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 465.61it/s, env_step=11264, len=23, n/ep=3, n/st=64, player_1/loss=281.654, player_2/loss=150.081, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 470.62it/s, env_step=12288, len=25, n/ep=2, n/st=64, player_1/loss=133.168, player_2/loss=160.045, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 481.09it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=68.113, player_2/loss=178.986, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 469.04it/s, env_step=14336, len=28, n/ep=2, n/st=64, player_1/loss=109.499, player_2/loss=156.049, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 460.10it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=112.056, player_2/loss=104.674, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 474.67it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=82.878, player_2/loss=93.715, rew=-15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 460.95it/s, env_step=17408, len=23, n/ep=3, n/st=64, player_1/loss=120.813, player_2/loss=115.337, rew=8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 463.88it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=89.356, player_2/loss=65.625, rew=-8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 461.61it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=68.443, player_2/loss=67.422, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 453.75it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=168.280, player_2/loss=137.935, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 461.09it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=109.950, player_2/loss=173.607, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 450.30it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=60.690, player_2/loss=200.362, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 453.15it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=31.755, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 456.37it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=32.451, player_2/loss=233.895, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 472.64it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=48.601, player_2/loss=220.574, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 473.16it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=31.427, player_2/loss=216.111, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 417.11it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=28.117, player_2/loss=197.800, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 458.64it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=16.764, player_2/loss=197.503, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 457.39it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=41.426, player_2/loss=204.997, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 438.81it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=15.071, player_2/loss=228.654, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 466.75it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=14.368, player_2/loss=220.649, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 468.64it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=29.437, player_2/loss=205.059, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 470.42it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=31.848, player_2/loss=207.011, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 468.75it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=4.878, player_2/loss=209.660, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 464.79it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=43.530, player_2/loss=225.608, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 466.94it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=90.952, player_2/loss=225.394, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 471.70it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=53.101, player_2/loss=231.610, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 460.22it/s, env_step=19456, len=7, n/ep=10, n/st=64, player_1/loss=15.072, player_2/loss=206.625, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 483.27it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=73.643, player_2/loss=216.548, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 478.77it/s, env_step=2048, len=9, n/ep=9, n/st=64, player_1/loss=54.648, player_2/loss=165.457, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 474.31it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=101.415, player_2/loss=154.996, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 474.33it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=210.037, player_2/loss=142.673, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 469.09it/s, env_step=5120, len=12, n/ep=6, n/st=64, player_1/loss=255.636, player_2/loss=134.839, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 466.23it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=244.751, player_2/loss=67.082, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 481.22it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=213.453, player_2/loss=32.422, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 417.52it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=252.764, player_2/loss=43.438, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 460.90it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=247.999, player_2/loss=44.202, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 428.19it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=216.219, player_2/loss=32.698, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 471.68it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=206.263, player_2/loss=35.183, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 450.91it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=230.431, player_2/loss=102.319, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 462.38it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=290.537, player_2/loss=41.293, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 444.05it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=237.829, player_2/loss=41.415, rew=-12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 470.60it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=125.196, player_2/loss=73.100, rew=5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 472.88it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=251.803, player_2/loss=74.536, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 461.83it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=312.103, player_2/loss=68.326, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 466.80it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=242.094, player_2/loss=63.071, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 478.09it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=183.996, player_2/loss=90.277, rew=5.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 477.64it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=183.872, player_2/loss=35.599, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 483.94it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=104.937, player_2/loss=128.252, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 485.37it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=85.920, player_2/loss=155.984, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 461.65it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=143.364, player_2/loss=119.263, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 476.90it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=109.237, player_2/loss=117.198, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 464.78it/s, env_step=6144, len=13, n/ep=4, n/st=64, player_1/loss=60.215, player_2/loss=203.134, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 463.30it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=29.441, player_2/loss=259.265, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 455.37it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=26.435, player_2/loss=251.644, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 475.19it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=26.003, player_2/loss=232.021, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 484.93it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=16.912, player_2/loss=239.634, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 447.14it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=29.473, player_2/loss=282.138, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 475.49it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=46.498, player_2/loss=289.141, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 475.61it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=32.167, player_2/loss=288.572, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 425.49it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=33.541, player_2/loss=295.497, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 493.70it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=26.472, player_2/loss=277.218, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 473.69it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=15.039, player_2/loss=277.731, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 469.54it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=23.405, player_2/loss=260.247, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 471.90it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=31.997, player_2/loss=269.758, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 479.37it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=11.736, player_2/loss=257.970, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 483.94it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=27.194, player_2/loss=264.824, rew=-5.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 488.56it/s, env_step=2048, len=23, n/ep=2, n/st=64, player_1/loss=66.214, player_2/loss=189.021, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 488.34it/s, env_step=3072, len=24, n/ep=3, n/st=64, player_1/loss=74.994, player_2/loss=87.051, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 486.64it/s, env_step=4096, len=28, n/ep=2, n/st=64, player_1/loss=89.444, player_2/loss=54.220, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 426.37it/s, env_step=5120, len=22, n/ep=3, n/st=64, player_1/loss=116.385, player_2/loss=104.736, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 459.64it/s, env_step=6144, len=29, n/ep=2, n/st=64, player_1/loss=106.272, player_2/loss=85.579, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 484.89it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=128.318, player_2/loss=86.398, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 436.93it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=156.490, player_2/loss=91.641, rew=-17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 424.70it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=284.114, player_2/loss=127.863, rew=13.89]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 476.78it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=312.132, player_2/loss=108.924, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 480.08it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=299.897, player_2/loss=73.206, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 469.37it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=322.414, player_2/loss=79.606, rew=6.25]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 460.22it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=305.156, player_2/loss=77.212, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 487.55it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=338.034, player_2/loss=97.764, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 465.67it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=375.799, player_2/loss=101.497, rew=6.25]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 465.01it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=282.866, player_2/loss=54.553, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 459.34it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=258.232, player_2/loss=38.448, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 402.10it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=318.619, player_2/loss=52.633, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 435.27it/s, env_step=19456, len=10, n/ep=7, n/st=64, player_1/loss=308.010, player_2/loss=52.023, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 455.65it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=169.325, player_2/loss=533.377, rew=13.89]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 465.37it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=149.411, player_2/loss=580.001, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 456.59it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=92.553, player_2/loss=710.747, rew=13.89]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 466.33it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=60.776, player_2/loss=714.041, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 470.34it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=82.922, player_2/loss=691.169, rew=13.89]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 458.04it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=69.239, player_2/loss=659.124, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 479.79it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=107.144, player_2/loss=592.996, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 479.73it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=49.269, player_2/loss=669.350, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 479.65it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=75.272, player_2/loss=768.452, rew=13.89]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 461.82it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=61.058, player_2/loss=804.589, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 477.88it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=20.916, player_2/loss=718.203, rew=18.75]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 473.24it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=62.659, player_2/loss=721.862, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 456.18it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=76.590, player_2/loss=703.598, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 445.92it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=49.675, player_2/loss=696.608, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 430.60it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=22.722, player_2/loss=738.024, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 456.06it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=49.012, player_2/loss=864.477, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 464.57it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=58.150, player_2/loss=797.873, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 426.60it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=72.265, player_2/loss=795.717, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 454.58it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=47.322, player_2/loss=729.104, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 460.93it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=110.230, player_2/loss=427.853, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 478.32it/s, env_step=2048, len=23, n/ep=3, n/st=64, player_1/loss=150.370, player_2/loss=272.259, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 449.75it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=197.420, player_2/loss=91.610, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 458.92it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=266.491, player_2/loss=73.495, rew=12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 421.96it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=238.108, player_2/loss=76.513, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 448.63it/s, env_step=6144, len=9, n/ep=6, n/st=64, player_1/loss=228.675, player_2/loss=63.872, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 440.30it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=350.065, player_2/loss=60.264, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 462.33it/s, env_step=8192, len=10, n/ep=4, n/st=64, player_1/loss=447.131, player_2/loss=54.702, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 421.70it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=405.326, player_2/loss=31.053, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 466.00it/s, env_step=10240, len=15, n/ep=5, n/st=64, player_1/loss=254.934, player_2/loss=52.155, rew=-5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 462.96it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=231.560, player_2/loss=56.782, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 469.57it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=275.434, player_2/loss=48.136, rew=5.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 463.18it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_1/loss=225.021, player_2/loss=59.352, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 476.48it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=179.596, player_2/loss=76.343, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 461.74it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=184.117, player_2/loss=37.652, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 452.53it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=215.754, player_2/loss=50.314, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 477.59it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=247.404, player_2/loss=52.353, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 483.25it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=243.079, player_2/loss=46.786, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 479.92it/s, env_step=19456, len=15, n/ep=5, n/st=64, player_1/loss=204.115, player_2/loss=64.305, rew=-5.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 461.27it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=188.900, player_2/loss=77.162, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 471.04it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=135.860, player_2/loss=82.725, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 461.69it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=132.577, player_2/loss=70.548, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 453.16it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=154.904, player_2/loss=94.872, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 477.16it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=170.907, player_2/loss=160.819, rew=16.67]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 463.35it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=142.662, player_2/loss=311.368, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 462.58it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=108.637, player_2/loss=424.767, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 440.24it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=173.819, player_2/loss=418.869, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 475.85it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=143.373, player_2/loss=395.536, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 474.13it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=36.814, player_2/loss=484.622, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 471.25it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=30.171, player_2/loss=538.073, rew=6.25]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 465.70it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=39.126, player_2/loss=572.028, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 463.82it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=51.470, player_2/loss=496.865, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 461.46it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=34.988, player_2/loss=518.791, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 471.52it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=48.229, player_2/loss=510.674, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 452.32it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=44.341, player_2/loss=496.955, rew=16.67]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 458.33it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=76.825, player_2/loss=426.514, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 443.36it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=80.457, player_2/loss=491.794, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 468.04it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=15.730, player_2/loss=538.363, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 477.67it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=40.668, player_2/loss=350.116, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 478.43it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=99.675, player_2/loss=299.365, rew=-19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 470.85it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=172.969, player_2/loss=255.295, rew=-19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 463.65it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=184.940, player_2/loss=183.592, rew=-19.44]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 460.37it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=149.244, player_2/loss=149.419, rew=-18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 465.97it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=198.806, player_2/loss=153.955, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 467.84it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=321.570, player_2/loss=94.929, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 462.42it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=489.335, player_2/loss=40.401, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 465.44it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=560.773, player_2/loss=61.754, rew=5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 460.01it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=437.918, rew=25.00]       


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 466.54it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=274.195, player_2/loss=97.228, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 465.85it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=326.546, player_2/loss=18.400, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 482.33it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=410.719, player_2/loss=29.153, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 473.52it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=343.671, player_2/loss=28.136, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 476.10it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=378.676, rew=15.00]       


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 468.33it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=391.893, player_2/loss=84.624, rew=-15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 471.20it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=244.167, player_2/loss=129.513, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 463.33it/s, env_step=18432, len=26, n/ep=3, n/st=64, player_1/loss=172.780, player_2/loss=107.165, rew=-25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 468.28it/s, env_step=19456, len=27, n/ep=3, n/st=64, player_1/loss=155.658, player_2/loss=81.962, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


In [16]:
####################################################
# EXPERIMENT: VIEWING THE BEST LEARNED POLICY
####################################################

# Get the environment settings
env = get_env()
observation_space = env.observation_space['observation'] if isinstance(env.observation_space, gym.spaces.Dict) else env.observation_space
state_shape = observation_space.shape or observation_space.n
action_shape = env.action_space.shape or env.action_space.n

# Configure the best agent
best_agent1 = cf_custom_dqn_policy(state_shape= state_shape,
                                   action_shape= action_shape)
best_agent1.load_state_dict(torch.load("./saved_variables/paper_notebooks/8/7-20epoch_500loop/looping-iteration-497/best_policy_agent1.pth"))
best_agent1.set_eps(0)


best_agent2 = cf_custom_dqn_policy(state_shape= state_shape,
                                   action_shape= action_shape)
best_agent2.load_state_dict(torch.load("./saved_variables/paper_notebooks/8/7-20epoch_500loop/looping-iteration-498/best_policy_agent2.pth"))
best_agent2.set_eps(0)

# Watch the best agent at work
watch(numer_of_games= 3,
      render_speed= 0.3,
      agent_player1= best_agent1,
      agent_player2= best_agent2)



Average steps of game:  12.0
Final mean reward agent 1: -25.0, std: 0.0
Final mean reward agent 2: 25.0, std: 0.0


In [15]:
####################################################
# EXPERIMENT: VIEWING THE LAST LEARNED POLICY
####################################################

# Configure the final agent
final_agent_player1 = cf_custom_dqn_policy(state_shape= state_shape,
                                           action_shape= action_shape)
final_agent_player1.load_state_dict(torch.load("./saved_variables/paper_notebooks/8/7-20epoch_500loop/looping-iteration-497/final_policy_agent1.pth"))
final_agent_player1.set_eps(0)

final_agent_player2 = cf_custom_dqn_policy(state_shape= state_shape,
                                           action_shape= action_shape)
final_agent_player2.load_state_dict(torch.load("./saved_variables/paper_notebooks/8/7-20epoch_500loop/looping-iteration-498/final_policy_agent2.pth"))
final_agent_player2.set_eps(0)

# Watch the final agents at work
watch(numer_of_games= 3,
      render_speed= 0.3,
      agent_player1= final_agent_player1,
      agent_player2= final_agent_player2)



Average steps of game:  17.0
Final mean reward agent 1: -8.333333333333334, std: 23.570226039551585
Final mean reward agent 2: 8.333333333333334, std: 23.570226039551585
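
The mean and standard deviation reported above can be reproduced by hand. The sketch below is an illustration, not the code inside `watch`: it assumes that over the three games agent 1 won once and lost twice, with rewards of +25 and -25, that agent 2 receives the negated reward, and that the reported spread is a population standard deviation. The order of the games is unknown and does not affect the result.

```python
from statistics import mean, pstdev

# Assumed per-game rewards for agent 1: one win (+25) and two losses (-25).
rewards_agent1 = [25.0, -25.0, -25.0]
# Assumed zero-sum rewards: agent 2 gets the negated reward of agent 1.
rewards_agent2 = [-r for r in rewards_agent1]

mean_agent1 = mean(rewards_agent1)
std_agent1 = pstdev(rewards_agent1)  # population standard deviation (divides by N)
mean_agent2 = mean(rewards_agent2)
std_agent2 = pstdev(rewards_agent2)

print(f"Final mean reward agent 1: {mean_agent1}, std: {std_agent1}")
print(f"Final mean reward agent 2: {mean_agent2}, std: {std_agent2}")
```

With only three games, a single win or loss moves the mean by about 16.7, so these numbers say little about the relative strength of the final policies.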


<hr><hr>

## Discussion

We see that agent 1 always finds another column in which to stack coins so that agent 2 doesn't interfere. Likewise, when agent 2 trains against the frozen agent 1, it just learns to block that single column and stack its own coins in another column. Each agent thus learns to outsmart the current frozen opponent but doesn't learn to play Connect Four in general.
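
One way to put a number on this column-stacking behaviour is to measure how concentrated an agent's moves are in a single column. The sketch below is hypothetical: `column_concentration` is not part of the notebook, and the move sequences are made up for illustration rather than taken from the games above.

```python
from collections import Counter


def column_concentration(moves):
    """Return the fraction of moves that land in the most-played column."""
    if not moves:
        return 0.0
    counts = Counter(moves)
    return max(counts.values()) / len(moves)


# Made-up column indices (0..6) for illustration, not logged data.
stacking_agent_moves = [3, 3, 3, 3]       # always the same column
varied_agent_moves = [3, 2, 4, 3, 1, 5]   # spread over several columns

print(column_concentration(stacking_agent_moves))
print(column_concentration(varied_agent_moves))
```

A value close to 1 over many games would support the observation that an agent wins by stacking a single column instead of reacting to the board.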

In [None]:
####################################################
# CLEAN VARIABLES
####################################################

del action_shape
del agent1
del agent2
del best_agent1
del best_agent2
del env
del final_agent_player1
del final_agent_player2
del observation_space
del off_policy_traininer_results
del state_shape
